AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh updates every day on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🙌 #Oculus' new Hand Tracking 🙌

👉Hands can move as naturally and intuitively in the #metaverse as they do in real life

Highlights:
✅Hands 2.0 powered by CV & ML
✅Tracking hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2
🎗️New SOTA in #3D human avatars🎗️

👉PHORHUM: photorealistic 3D human from mono-RGB

Highlights:
✅Pixel-aligned method for 3D geometry
✅Unshaded surface color + illumination
✅Patch-based rendering losses for visible regions
✅Plausible color estimation for non-visible regions

More: https://bit.ly/3MkvBrA
📟 What's in your hands (#3D)? 📟

👉Reconstructing hand-held objects (from a single RGB image) without knowing their 3D templates🤷‍♂️

Highlights:
✅Hand is highly predictive of object shape
✅Reconstruction conditioned on hand articulation
✅Visual feats. / articulation-aware coords.
✅Code and models available!

More: https://bit.ly/3vuYn2a
🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation learned in simulation from a single demonstration video🤯

Highlights:
✅One-shot IL, model-free 6D pose tracking
✅Demonstration from a single third-person view
✅Manipulation including hi-precision tasks
✅Category-level Behavior Cloning
✅Attention for dynamic coords selection
✅Generalizability to novel unseen obj/env

More: https://bit.ly/3v0V4R4
👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.

Highlights:
✅Hi-Res paired front-view / full-body
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA considering res. & garments

More: https://bit.ly/3xKXSUw
❀3πŸ‘3πŸ”₯1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
Deep Equilibrium for Optical Flow

👉DEQ: converges faster, uses less memory, often more accurate

Highlights:
✅Novel formulation of optical flow method
✅Compatible with prior modeling and data-related improvements
✅Sparse fixed-point correction for stability
✅Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
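
The deep-equilibrium idea in the post above can be illustrated with a toy example: rather than unrolling a fixed number of refinement steps, the update is iterated until its output stops changing, and that fixed point is taken as the result. This is a minimal sketch in plain Python, assuming a hypothetical scalar update `refine` in place of the real flow-update network; the tolerance and iteration cap are also illustrative and it is not the authors' implementation.

```python
import math

def refine(z, x):
    """Hypothetical contractive update standing in for the flow-refinement layer."""
    return 0.5 * math.tanh(z) + x

def solve_fixed_point(f, x, z0=0.0, tol=1e-8, max_iter=100):
    """Iterate z <- f(z, x) until successive iterates differ by less than tol."""
    z = z0
    for step in range(1, max_iter + 1):
        z_next = f(z, x)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_iter

# Solve for the equilibrium state given a fixed input x.
z_star, steps = solve_fixed_point(refine, x=0.3)

# At the fixed point, applying the update again changes (almost) nothing.
residual = abs(refine(z_star, 0.3) - z_star)
```

Because only the converged state is needed, memory does not grow with the number of iterations, which is the property the "less memory" claim refers to.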
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra high-resolution saliency detector with dataset!

Highlights:
✅Ultra Hi-Res Saliency Detection
✅5,920 pics at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT

More: https://bit.ly/3MnU1Rf
❀6πŸ‘3🀯3πŸ”₯2🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🪆StyleGAN-Human for fashion 🪆

👉A novel unconditional human generation model based on StyleGAN is out!

Highlights:
✅200,000+ labeled samples (pose/texture)
✅1024x512 StyleGAN-Human (StyleGAN3)
✅512x256 StyleGAN-Human (StyleGAN1)
✅Face model for downstream: InsetGAN
✅Source code and model available!

More: https://bit.ly/3xMg5B2
❀5πŸ‘4πŸ”₯3🀯1πŸ’©1
This media is not supported in your browser
VIEW IN TELEGRAM
💀 OSSO: Skeletal Shape from Outside 💀

👉Anatomic skeleton of a person from the 3D surface of the body 🦴

Highlights:
✅Max Planck + IMATI-CNR + INRIA
✅DXA images to obtain #3D shape
✅External body to internal skeleton

More: https://bit.ly/3v7Z5TQ
🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework to perform object detection as a language modeling task

Highlights:
✅Obj. detection as a lang-modeling task
✅BBs/labels -> seq. of discrete tokens
✅Encoder-decoder (one token at a time)
✅Code under Apache License 2.0

More: https://bit.ly/3F49PX3
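
The "BBs/labels -> seq. of discrete tokens" step above can be sketched as follows: each box corner is quantized into one of a fixed number of bins, and the class id is appended as one more token. This is a minimal sketch, not the released code; the bin count of 1000, the `[ymin, xmin, ymax, xmax, class]` ordering and the vocabulary layout (class ids shifted past the coordinate bins) are assumptions made for illustration, as are the function names.

```python
def quantize(value, size, num_bins):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(value * num_bins / size)
    return min(max(bin_index, 0), num_bins - 1)

def boxes_to_tokens(boxes, labels, width, height, num_bins=1000):
    """Flatten (xmin, ymin, xmax, ymax) boxes and class ids into one token list.

    Coordinate tokens occupy ids [0, num_bins); class tokens are shifted by
    num_bins so the two ranges never collide (an assumed vocabulary layout).
    """
    tokens = []
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        tokens += [
            quantize(ymin, height, num_bins),
            quantize(xmin, width, num_bins),
            quantize(ymax, height, num_bins),
            quantize(xmax, width, num_bins),
            num_bins + label,
        ]
    return tokens

# One box with class id 3 in a 640x480 image becomes five discrete tokens.
tokens = boxes_to_tokens([(64, 32, 320, 240)], [3], width=640, height=480)
```

A decoder trained on such sequences can then emit detections one token at a time, exactly as a language model emits words.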
🌹 Generalizable Neural Performer 🌹

👉General neural framework to synthesize free-viewpoint images of arbitrary human performers

Highlights:
✅Free-viewpoint synthesis of humans
✅Implicit Geometric Body Embedding
✅Screen-Space Occlusion-Aware Blending
✅GeneBody: 4M frames, multi-view cams

More: https://cutt.ly/SGcnQzn
🚌 Tire-defect inspection 🚌

👉Unsupervised detection of tire defects using neural networks

Highlights:
✅Impurity, same material as tire
✅Impurity, with different material
✅Damage by temp/pressure
✅Crack or etched material

More: https://bit.ly/37GX1JT
❀5πŸ‘3🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🧋#4D Neural Fields🧋

👉4D neural-field visual representations from monocular RGB-D 🤯

Highlights:
✅4D scene completion (occlusions)
✅Scene completion in cluttered scenes
✅Novel #AI for contextual point clouds
✅Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ
👔Largest dataset of human-object interactions 👔

👉BEHAVE by Google: largest dataset of human-object interactions

Highlights:
✅8 subjects, 20 objects, 5 envs.
✅321 clips captured with 4 Kinect RGB-D cameras
✅Masks and segmented point clouds
✅3D SMPL & mesh registration
✅Textured scan reconstructions

More: https://bit.ly/3Lx6NNo
🦴ENARF-GAN Neural Articulations🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

Highlights:
✅Novel efficient neural representation
✅Tri-plane deformation fields for training
✅Novel GAN for articulated representations
✅Controllable 3D from real unlabeled pics

More: https://bit.ly/3xYqedN
🖲️ HuMMan: 4D human dataset 🖲️

👉HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames 🤯

Highlights:
✅RGB, pt-clouds, keypts, SMPL, texture
✅Mobile device in the sensor suite
✅500+ actions to cover movements

More: https://bit.ly/3vTRW8Z
🔥Neighborhood Attention Transformer 🔥

👉A novel transformer for both image classification and downstream vision tasks

Highlights:
✅Neighborhood Attention (NA)
✅Neighborhood Attention Transformer, NAT
✅Faster training/inference, good throughput
✅Checkpoints, training code, #CUDA kernel available

More: https://bit.ly/3F5aVSo
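
The idea behind Neighborhood Attention can be sketched in one dimension: each position computes softmax attention only over its nearest neighbors instead of over the whole sequence. This is a simplified illustration in plain Python, not the released CUDA kernel; it works on a 1D sequence rather than a 2D feature map, omits learned projections, multiple heads and positional biases, and assumes that windows at the borders are shifted inwards so every position keeps the same number of neighbors. The function name and toy inputs are hypothetical.

```python
import math

def neighborhood_attention_1d(queries, keys, values, kernel_size):
    """Each position attends only to the kernel_size nearest positions.

    queries, keys, values: lists of equal length n holding feature vectors
    (lists of floats). Border windows are shifted inwards so every position
    still sees exactly kernel_size neighbors.
    """
    n = len(queries)
    assert kernel_size <= n
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        # Clamp the window start so the window stays inside the sequence.
        start = min(max(i - kernel_size // 2, 0), n - kernel_size)
        window = range(start, start + kernel_size)
        scores = [scale * sum(a * b for a, b in zip(q, keys[j])) for j in window]
        # Numerically stable softmax over the local window only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, j in zip(weights, window):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
out = neighborhood_attention_1d(tokens, tokens, tokens, kernel_size=3)
```

The cost per position depends on `kernel_size` rather than on the sequence length, which is where the speed advantage over global self-attention comes from.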
🔥🔥FANs: Fully Attentional Networks🔥🔥

👉#Nvidia unveils the fully attentional networks (FANs)

Highlights:
✅Efficient fully attentional design
✅Semantic seg. & object detection
✅Model/source code available soon!

More: https://bit.ly/3vtpITs
👨🏼‍🎨 Open-Source DALL·E 2 is out 👨🏼‍🎨

👉#Pytorch implementation of DALL·E 2, #OpenAI's latest text-to-image neural net.

Highlights:
✅SOTA for text-to-image generation
✅Source code/model under MIT License
✅"Medieval painting of wifi not working"

More: https://bit.ly/3vzsff6
⛺ViTPose: Transformer for Pose⛺

👉ViTPose from ViTAE, ViT for human pose

Highlights:
✅Plain/non-hierarchical ViT for pose
✅Deconv-layers after ViT for keypoints
✅Even the plain baseline is the new SOTA
✅Source code & models available soon!

More: https://bit.ly/3MJ0kz1
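
Heatmap-based pose models of this kind typically output one score map per keypoint, and the keypoint location is read off as the position of the peak. The sketch below shows that decoding step in plain Python; it assumes the decoder produces per-keypoint heatmaps at a fixed downsampling factor (`stride`), and the function name, toy heatmaps and keypoint names are hypothetical rather than taken from the ViTPose code.

```python
def decode_keypoints(heatmaps, stride):
    """Return one (x, y, score) triple per heatmap, in input-image pixels.

    heatmaps: list of 2D grids (lists of rows), one grid per keypoint.
    stride: assumed downsampling factor between the image and the heatmap.
    """
    keypoints = []
    for grid in heatmaps:
        best_score, best_x, best_y = float("-inf"), 0, 0
        for y, row in enumerate(grid):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score, best_x, best_y = score, x, y
        # Scale heatmap coordinates back to the input resolution.
        keypoints.append((best_x * stride, best_y * stride, best_score))
    return keypoints

# Two toy 3x3 heatmaps standing in for real decoder outputs.
nose = [[0.0, 0.1, 0.0],
        [0.1, 0.9, 0.2],
        [0.0, 0.2, 0.0]]
wrist = [[0.0, 0.0, 0.7],
         [0.0, 0.1, 0.3],
         [0.0, 0.0, 0.0]]
keypoints = decode_keypoints([nose, wrist], stride=4)
```

Production decoders usually add sub-pixel refinement around the peak; that step is left out here to keep the sketch short.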