AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🌹 Generalizable Neural Performer 🌹

👉General neural framework to synthesize free-viewpoint images of arbitrary human performers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Free-viewpoint synthesis of humans
✅Implicit Geometric Body Embedding
✅Screen-Space Occlusion-Aware Blending
✅GeneBody: 4M frames, multi-view cams

More: https://cutt.ly/SGcnQzn
🚌 Tire-defect inspection 🚌

👉Unsupervised detection of tire defects using neural networks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Impurity, same material as tire
✅Impurity, with different material
✅Damage by temp/pressure
✅Crack or etched material

More: https://bit.ly/37GX1JT
âĪ5👍3ðŸĪĐ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧋#4D Neural Fields🧋

👉4D neural-field visual representations from monocular RGB-D 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅4D scene completion (occlusions)
✅Scene completion in cluttered scenes
✅Novel #AI for contextual point clouds
✅Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ
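The method consumes monocular RGB-D streams; a minimal sketch of the standard preprocessing step such pipelines rely on — unprojecting a depth map into a 3D point cloud — assuming simple pinhole intrinsics (`fx`, `fy`, `cx`, `cy` are illustrative parameters, not from the paper):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Unproject an (H, W) depth map into an (N, 3) point cloud,
    assuming a pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # back-project along the x axis
    y = (v - cy) * z / fy          # back-project along the y axis
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]      # drop invalid (zero-depth) pixels
```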
👔 Largest dataset of human-object interactions 👔

👉BEHAVE by Google: largest dataset of human-object interactions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅8 subjects, 20 objects, 5 envs.
✅321 clips with 4 Kinect RGB-D
✅Masks and segmented point clouds
✅3D SMPL & mesh registration
✅Textured scan reconstructions

More: https://bit.ly/3Lx6NNo
🦴 ENARF-GAN Neural Articulations 🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel efficient neural representation
✅Tri-planes deformation fields for training
✅Novel GAN for articulated representations
✅Controllable 3D from real, unlabeled pics

More: https://bit.ly/3xYqedN
ðŸ–ēïļ HuMMan: 4D human dataset ðŸ–ēïļ

👉HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames ðŸĪŊ

𝐇ðĒð ðĄðĨðĒð ðĄð­ðŽ:
✅RGB, pt-clouds, keypts, SMPL, texture
✅Mobile device in the sensor suite
✅500+ actions to cover movements

More: https://bit.ly/3vTRW8Z
🔥 Neighborhood Attention Transformer 🔥

👉A novel transformer for both image classification and downstream vision tasks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Neighborhood Attention (NA)
✅Neighborhood Attention Transformer, NAT
✅Faster training/inference, good throughput
✅Checkpoints, train, #CUDA kernel available

More: https://bit.ly/3F5aVSo
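The core idea — each token attends only to its local neighborhood rather than the full sequence — can be sketched in a few lines of NumPy. This is a 1D toy version for clarity (NAT actually operates on 2D feature maps; `radius` is an illustrative parameter):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neighborhood_attention_1d(q, k, v, radius=1):
    """Each query attends only to keys within `radius` positions.

    With radius >= len(q), this reduces to full self-attention.
    """
    n, d = q.shape
    out = np.zeros_like(v, dtype=float)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # local attention scores
        out[i] = softmax(scores) @ v[lo:hi]       # weighted sum of neighbors
    return out
```

The point of the restriction is linear (rather than quadratic) cost in sequence length, at a fixed neighborhood size.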
🔥🔥 FANs: Fully Attentional Networks 🔥🔥

👉#Nvidia unveils the fully attentional networks (FANs)

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Efficient fully attentional design
✅Semantic seg. & object detection
✅Model/source code soon available!

More: https://bit.ly/3vtpITs
👨‍🎨 Open-Source DALL·E 2 is out 👨‍🎨

👉#Pytorch implementation of DALL-E 2, #OpenAI's latest text-to-image neural net.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅SOTA for text-to-image generation
✅Source code/model under MIT License
✅"Medieval painting of wifi not working"

More: https://bit.ly/3vzsff6
⛹ViTPose: Transformer for Pose⛹

👉ViTPose from ViTAE, ViT for human pose

𝐇ðĒð ðĄðĨðĒð ðĄð­ðŽ:
✅Plain/nonhierarchical ViT for pose
✅Deconv-layers after ViT for keypoints
✅Just the baseline is the new SOTA
✅Source code & models available soon!

More: https://bit.ly/3MJ0kz1
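The deconvolution head regresses one heatmap per keypoint; decoding those heatmaps into coordinates is typically a per-channel argmax. A minimal sketch of that decoding step (the shapes and function name are illustrative, not ViTPose's actual API):

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps):
    """Decode (K, H, W) keypoint heatmaps into a (K, 3) array of
    (x, y, confidence) via per-channel argmax."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    idx = flat.argmax(axis=1)        # flat index of each channel's peak
    ys, xs = np.divmod(idx, w)       # recover (row, col) from flat index
    conf = flat.max(axis=1)          # peak value as confidence
    return np.stack([xs, ys, conf], axis=1)
```

Real decoders usually add sub-pixel refinement (e.g. a quarter-pixel shift toward the second-highest neighbor), omitted here.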
🧳 Unsupervised HD Motion Transfer 🧳

👉Novel end-to-end unsupervised motion transfer for image animation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅TPS motion estimation + Dropout
✅Novel E2E unsupervised motion transfer
✅Optical flow + multi-res. occlusion mask
✅Code and models under MIT license

More: https://bit.ly/3MGNPns
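The TPS (thin-plate spline) motion representation warps the source frame through a spline fitted at sparse control points. A self-contained sketch of fitting a 2D TPS to control-point correspondences and warping arbitrary points with it — a generic TPS fit for illustration, not the paper's learned estimator:

```python
import numpy as np

def tps_kernel(r2):
    # U(r) = r^2 log(r^2), with U(0) = 0
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def fit_tps(src, dst):
    """Fit a 2D thin-plate spline mapping src -> dst control points.
    Returns a function warping arbitrary (M, 2) points."""
    n = src.shape[0]
    d2 = ((src[:, None] - src[None]) ** 2).sum(-1)
    K = tps_kernel(d2)
    P = np.hstack([np.ones((n, 1)), src])      # affine part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    params = np.linalg.solve(A, b)             # spline + affine coefficients
    w, a = params[:n], params[n:]

    def warp(pts):
        d2 = ((pts[:, None] - src[None]) ** 2).sum(-1)
        return tps_kernel(d2) @ w + np.hstack([np.ones((len(pts), 1)), pts]) @ a
    return warp
```

In the paper the control points come from an unsupervised keypoint detector; here they are given explicitly.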
🚤 Neural Self-Calibration in the wild 🚤

👉Learning algorithm to regress calibration params from in-the-wild clips

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Params purely from self-supervision
✅S.S. depth/pose learning as objective
✅Perspective, fisheye, catadioptric: no changes
✅SOTA results on EuRoC MAV dataset

More: https://bit.ly/3w1n6LB
🦅 ConDor: Self-Supervised Canonicalization 🦅

👉Self-supervised canonicalization for full/partial 3D point clouds

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅RRC + Stanford + KAIST + Brown
✅On top of Tensor Field Networks (TFNs)
✅Unseen 3D -> equivariant canonical
✅Co-segmentation, NO supervision
✅Code and model under MIT license

More: https://bit.ly/3MNDyGa
🦀 Event-aided Direct Sparse Odometry 🦀

👉EDS: direct monocular visual odometry using events/frames

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Mono 6-DOF visual odometry + events
✅Direct photometric bundle adjustment
✅Camera motion tracking by sparse pixels
✅A new dataset with HQ events and frames

More: https://bit.ly/3s9FiBN
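Direct methods like EDS minimize photometric (brightness) error at sparse pixels instead of matching keypoints. A toy version of the residual being minimized — nearest-neighbor pixel lookup for simplicity, whereas real systems reproject via depth and pose and interpolate bilinearly over patches:

```python
import numpy as np

def photometric_residuals(img_ref, img_tgt, uv_ref, uv_tgt):
    """Brightness differences between sparse reference pixels and
    their predicted locations in the target frame; direct odometry
    adjusts pose/depth to drive these residuals toward zero."""
    ref_vals = img_ref[uv_ref[:, 1], uv_ref[:, 0]]  # (y, x) indexing
    tgt_vals = img_tgt[uv_tgt[:, 1], uv_tgt[:, 0]]
    return ref_vals - tgt_vals
```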
🫀 BlobGAN: Blob-Disentangled Scenes 🫀

👉Unsupervised, mid-level (blob-based) generation of scenes

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Spatial, depth-ordered Gaussian blobs
✅Approaches supervised-level quality, and more
✅Source under BSD-2 "Simplified" License

More: https://bit.ly/3kRyGnj
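BlobGAN's layout is a set of depth-ordered, spatially placed blobs. A toy NumPy version of splatting isotropic Gaussian blobs onto a grid and alpha-compositing them back-to-front — the parameterization here (centers in [0,1]², std `scales`, scalar `depths` for ordering) is an illustrative simplification; the paper uses learned, anisotropic blobs feeding a generator:

```python
import numpy as np

def splat_blobs(centers, scales, depths, size=32):
    """Render isotropic Gaussian blobs into per-blob opacity maps,
    then alpha-composite them back-to-front into one scene map."""
    ys, xs = np.mgrid[0:size, 0:size] / (size - 1)
    grid = np.stack([xs, ys], -1)                 # (H, W, 2) coords in [0,1]
    maps = []
    for c, s in zip(centers, scales):
        d2 = ((grid - c) ** 2).sum(-1)
        maps.append(np.exp(-d2 / (2 * s ** 2)))   # per-blob opacity
    order = np.argsort(depths)[::-1]              # farthest blob first
    scene = np.zeros((size, size))
    for i in order:                               # back-to-front "over" op
        scene = scene * (1 - maps[i]) + maps[i]
    return np.stack(maps), scene
```

The depth ordering is what lets nearer blobs occlude farther ones, which is the "depth-ordered" property the highlights mention.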
🦕 E2EVE: editing via a pre-trained artist 🦕

👉E2EVE generates a new version of the source image that resembles the "driver" one

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Blending regions by driver image
✅E2E cond-probability of the edits
✅S.S. augmenting in target domain
✅Implemented as SOTA transformer
✅Code/models available (soon)

More: https://bit.ly/3P9TDYW
ðŸķ Bringing pets in #metaverse ðŸķ

👉ARTEMIS: pipeline for generating articulated neural pets for virtual worlds

𝐇ðĒð ðĄðĨðĒð ðĄð­ðŽ:
✅ARTiculated, appEarance, Mo-synthesIS
✅Motion control, animation & rendering
✅Neural-generated (NGI) animal engine
✅SOTA animal mocap + neural control

More: https://bit.ly/3LZSLDU
âĪ4👍2ðŸĨ°2ðŸĪĐ1
This media is not supported in your browser
VIEW IN TELEGRAM
😍Animated hand in 1972, damn romantic😍

👉Q: is #VR the technology that developed least in the last 30 years? 🤔

More: https://bit.ly/3snxNaq
👍7âĪ3ðŸĪŊ1
This media is not supported in your browser
VIEW IN TELEGRAM
⏏ïļEnsembling models for GAN training⏏ïļ

👉Pretrained vision models to improve the GAN training. FID by 1.5 to 2×!

𝐇ðĒð ðĄðĨðĒð ðĄð­ðŽ:
✅CV models as ensemble of discriminators
✅Improving GAN in limited / large-scale set
✅10k samples matches StyleGAN2 w/ 1.6M
✅Source code / models under MIT license

More: https://bit.ly/3wgUVsr
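The idea is to attach a small discriminator head to each frozen pretrained backbone and combine their judgments during GAN training. A minimal sketch of the combination step, averaging hinge discriminator losses across the ensemble (the hinge loss and plain averaging are assumptions for illustration; the paper also selects which models to ensemble):

```python
import numpy as np

def ensemble_d_loss(logits_list, is_real):
    """Average the hinge discriminator loss over an ensemble of
    discriminator heads (one per pretrained backbone)."""
    losses = []
    for logits in logits_list:
        if is_real:
            # real samples should score above +1
            losses.append(np.maximum(0.0, 1.0 - logits).mean())
        else:
            # fake samples should score below -1
            losses.append(np.maximum(0.0, 1.0 + logits).mean())
    return float(np.mean(losses))
```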
🤯 Cooperative Driving + AUTOCASTSIM 🤯

👉COOPERNAUT: cross-vehicle perception for vision-based cooperative driving

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅UTexas + #Stanford + #Sony #AI
✅LiDAR into compact point-based
✅Network-augmented simulator
✅Source code and models available

More: https://bit.ly/3sr5HLk