AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
14 files
1.31K links
All the AI with papers. Fresh updates every day on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag, a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source code announced, coming soon💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach for learning general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/
🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming soon💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
🌏 JoyHallo: Mandarin Digital Human 🌏

👉JD Health tackles the challenges of audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of high-quality datasets. Impressive results (-> audio ON). Code and models available💙

👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo
🎒 Robo-quadruped Parkour🎒

👉LAAS-CNRS unveils a novel RL approach to perform agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available💙

👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour
🩰 Dressed Humans in the wild 🩰

👉ETH (+ #Microsoft) unveils ReLoo, a novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips. No prior assumptions about the garments. Source code announced, coming soon💙

👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo
🌾 New SOTA Edge Detection 🌾

👉CUP (+ ESPOCH) unveils the new SOTA for edge detection (NBED): consistently superior performance across multiple benchmarks, even compared with models that have huge computational cost and complex training. Source code released💙

👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED
👩‍🦰 SOTA Gaussian Haircut 👩‍🦰

👉ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via dual representation (classic + 3D Gaussian). Code and model announced💙

👉Review https://t.ly/aiOjq
👉Paper arxiv.org/pdf/2409.14778
👉Project https://lnkd.in/dFRm2ycb
👉Repo https://lnkd.in/d5NWNkb5
SPARK: Real-time Face Capture

👉Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA able to handle unseen pose, expression and lighting. Impressive results. Code & model announced💙

👉Review https://t.ly/rZOgp
👉Paper arxiv.org/pdf/2409.07984
👉Project kelianb.github.io/SPARK/
👉Repo github.com/KelianB/SPARK/
🦴 One-Image Object Detection 🦴

👉Delft University (+Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Code released💙

👉Review https://t.ly/-li2G
👉Paper arxiv.org/pdf/2410.00900
👉Code github.com/RobinGerster7/OSSA
🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/
🔥 "Deep Gen-AI" Full Course 🔥

👉A fresh course from Stanford on the probabilistic foundations and algorithms for deep generative models. A novel overview of the evolution of genAI in #computervision, language and more...

👉Review https://t.ly/ylBxq
👉Course https://lnkd.in/dMKH9gNe
👉Lectures https://lnkd.in/d_uwDvT6
❀21πŸ”₯7πŸ‘2πŸ‘1πŸ₯°1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🐏 EFM3D: 3D Ego-Foundation 🐏

πŸ‘‰#META presents EFM3D, the first benchmark for 3D object detection and surface regression on HQ annotated egocentric data of Project Aria. Datasets & Code releasedπŸ’™

πŸ‘‰Review https://t.ly/cDJv6
πŸ‘‰Paper arxiv.org/pdf/2406.10224
πŸ‘‰Project www.projectaria.com/datasets/aeo/
πŸ‘‰Repo github.com/facebookresearch/efm3d
🥦Gaussian Splatting VTON🥦

👉GS-VTON is a novel image-prompted 3D-VTON which, by leveraging 3DGS as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced💙

👉Review https://t.ly/sTPbW
👉Paper arxiv.org/pdf/2410.05259
👉Project yukangcao.github.io/GS-VTON/
👉Repo github.com/yukangcao/GS-VTON
💡Diffusion Models Relighting💡

👉#Netflix unveils DifFRelight, a novel free-viewpoint facial relighting method based on a diffusion model. Precise lighting control and high-fidelity relit facial images from flat-lit inputs.

👉Review https://t.ly/fliXU
👉Paper arxiv.org/pdf/2410.08188
👉Project www.eyelinestudios.com/research/diffrelight.html
🥎POKEFLEX: Soft Object Dataset🥎

👉PokeFlex from ETH is a dataset that includes 3D textured meshes, point clouds, RGB & depth maps of deformable objects. Pretrained models & dataset announced💙

👉Review https://t.ly/GXggP
👉Paper arxiv.org/pdf/2410.07688
👉Project https://lnkd.in/duv-jS7a
👉Repo
🔥 DEPTH ANY VIDEO is out! 🔥

👉DAV is a novel foundation model for image/video depth estimation. The new SOTA for accuracy & consistency, up to 150 FPS!

👉Review https://t.ly/CjSz2
👉Paper arxiv.org/pdf/2410.10815
👉Project depthanyvideo.github.io/
👉Code github.com/Nightmare-n/DepthAnyVideo
🪞Robo-Emulation via Video Imitation🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s
🔥 CoTracker3 by #META is out! 🔥

👉#Meta (+VGG Oxford) unveils CoTracker3, a new tracker that outperforms the previous SoTA by a large margin using only 0.1% of the training data 🤯🤯🤯

👉Review https://t.ly/TcRIv
👉Paper arxiv.org/pdf/2410.11831
👉Project cotracker3.github.io/
👉Code github.com/facebookresearch/co-tracker
❀14πŸ”₯3🀯3🍾2πŸ‘1😱1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
🦠 Neural Metamorphosis 🦠

👉NU Singapore unveils NeuMeta, which transforms neural nets by allowing a single model to adapt on the fly to different sizes, generating the right weights when needed.

👉Review https://t.ly/DJab3
👉Paper arxiv.org/pdf/2410.11878
👉Project adamdad.github.io/neumeta
👉Code github.com/Adamdad/neumeta
❀7πŸ”₯3🀯3😱2⚑1πŸ‘1