AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh updates every day about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale (1M stereo pairs) synthetic training dataset featuring high diversity and photorealism. Code, model & dataset to be released💙 A disparity-to-depth sketch follows the links.

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
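
👉Context, not from the paper's release: stereo networks predict per-pixel disparity, and metric depth then follows from the rectified-stereo relation depth = fx · baseline / disparity. A minimal NumPy sketch with hypothetical calibration values:

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, fx: float, baseline_m: float) -> np.ndarray:
    """Convert a disparity map (pixels) to metric depth (meters).

    Standard rectified-stereo relation: depth = fx * baseline / disparity.
    """
    eps = 1e-6  # guard against zero disparity in invalid/textureless regions
    return fx * baseline_m / np.maximum(disparity, eps)

# Hypothetical calibration: 720 px focal length, 12 cm baseline.
disparity = np.random.uniform(1.0, 64.0, size=(480, 640)).astype(np.float32)
depth = disparity_to_depth(disparity, fx=720.0, baseline_m=0.12)
print(depth.min(), depth.max())
```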
โค6๐Ÿ”ฅ6๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 [SOTA] Long-Video Depth Anything 🔥

👉ByteDance unveils Video Depth Anything: high-quality, temporally consistent depth estimation on SUPER-long videos (several minutes) without sacrificing efficiency. Built on Depth Anything V2 with a novel, efficient spatial-temporal head. Repo available under Apache 2.0💙 A generic sliding-window sketch follows the links.

👉Review https://t.ly/Q4ZZd
👉Paper arxiv.org/pdf/2501.12375
👉Project https://lnkd.in/dKNwJzbM
👉Repo https://lnkd.in/ddfwwpCj
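
👉The paper's exact inference scheme is its own; as generic background, long-video depth models are often run over overlapping temporal windows whose predictions are averaged so neighboring frames stay consistent. A minimal sketch of that pattern, with a hypothetical predict_window model call:

```python
import numpy as np

def depth_over_long_video(frames, predict_window, win=32, overlap=8):
    """Windowed depth over a long video, averaging overlapping predictions.

    frames: list of HxW(x3) images; predict_window: hypothetical callable
    mapping a list of frames to a (T, H, W) depth array.
    """
    T = len(frames)
    H, W = frames[0].shape[:2]
    acc = np.zeros((T, H, W))
    hits = np.zeros(T)
    start = 0
    while True:
        end = min(start + win, T)
        acc[start:end] += predict_window(frames[start:end])
        hits[start:end] += 1.0
        if end == T:
            break
        start += win - overlap  # slide forward, keeping `overlap` shared frames
    return acc / hits[:, None, None]
```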
🧵Time-Aware Pts-Tracking🧵

👉Chrono: a feature backbone specifically designed for point tracking with built-in temporal awareness. Long-term temporal context enables precise prediction even without refinement stages. Code announced💙 A correlation-tracking sketch follows the links.

👉Review https://t.ly/XAL7G
👉Paper arxiv.org/pdf/2501.12218
👉Project cvlab-kaist.github.io/Chrono/
👉Repo github.com/cvlab-kaist/Chrono
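
👉Not Chrono's actual head, just the generic idea behind tracking from a dense feature backbone: correlate the query point's descriptor against every frame's feature map and take the argmax. A minimal sketch (features assumed L2-normalized, from any backbone):

```python
import numpy as np

def track_point(feat_maps: np.ndarray, query_xy, query_frame: int = 0):
    """Track one point by nearest-neighbor feature correlation.

    feat_maps: (T, H, W, C) L2-normalized dense features.
    query_xy: (x, y) location of the point in feat_maps[query_frame].
    Returns (T, 2) tracked (x, y) positions, one per frame.
    """
    T, H, W, C = feat_maps.shape
    qx, qy = query_xy
    q = feat_maps[query_frame, qy, qx]         # (C,) query descriptor
    sims = feat_maps.reshape(T, H * W, C) @ q  # cosine similarity per pixel
    best = sims.argmax(axis=1)                 # best match in each frame
    return np.stack([best % W, best // W], axis=1)
```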
โค5๐Ÿ”ฅ5๐Ÿ‘3๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🎤EMO2: Audio-Driven Avatar🎤

👉Alibaba previews a novel audio-driven talking-head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code 🥺

👉Review https://t.ly/x8slQ
👉Paper arxiv.org/pdf/2501.10687
👉Project humanaigc.github.io/emote-portrait-alive-2/
👉Repo 🥺
๐Ÿคฏ7โค6๐Ÿ‘2๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦠A-Life with Foundation Models🦠

👉A super-team unveils ASAL, a new paradigm for Artificial Life research, spanning a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0💙 A tiny Game of Life sketch follows the links.

👉Review https://t.ly/7SZ8A
👉Paper arxiv.org/pdf/2412.17799
👉Project https://pub.sakana.ai/asal/
👉Repo https://lnkd.in/dP5yxKtw
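
👉For readers new to these substrates: Conway's Game of Life, one of the systems listed above, fits in a few lines and shows the kind of grid dynamics ASAL searches over. A self-contained NumPy sketch, independent of the ASAL codebase:

```python
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life update on a toroidal 0/1 grid."""
    # Count the 8 neighbors by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    # A cell survives with 2-3 neighbors and is born with exactly 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

grid = (np.random.rand(64, 64) < 0.2).astype(np.uint8)
for _ in range(100):
    grid = life_step(grid)
print(grid.sum(), "cells alive after 100 steps")
```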
โค11โšก2๐Ÿคฉ2
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 The code of DynOMo is out 🔥

👉DynOMo is a novel model able to track any point in a dynamic scene over time through 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input.

👉Review https://t.ly/t5pCf
👉Paper https://lnkd.in/dwhzz4_t
👉Repo github.com/dvl-tum/DynOMo
👉Project https://lnkd.in/dMyku2HW
๐Ÿ”ฅ7โค5๐Ÿ˜5๐Ÿ‘2๐Ÿคฉ2๐Ÿพ2โšก1
This media is not supported in your browser
VIEW IN TELEGRAM
🪆SOTA Points Segmentation🪆

👉VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained with long-term point trajectories as a supervisory signal to complement optical flow. New SOTA! A toy trajectory-grouping sketch follows the links.

👉Review https://t.ly/8Bsbt
👉Paper https://arxiv.org/pdf/2501.12392
👉Code https://github.com/karazijal/lrtl
👉Project www.robots.ox.ac.uk/~vgg/research/lrtl/
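
👉The paper's loss itself is more subtle; as a toy analogue only, here is how long-term trajectories can be split into two motion groups (e.g., object vs. background) with plain k-means on their displacements:

```python
import numpy as np

def group_trajectories(tracks: np.ndarray, iters: int = 20) -> np.ndarray:
    """Toy 2-way k-means over trajectory motion. tracks: (N, T, 2) positions."""
    feats = np.diff(tracks, axis=1).reshape(len(tracks), -1)  # per-step motion
    centers = feats[np.random.choice(len(feats), 2, replace=False)].copy()
    for _ in range(iters):
        dists = ((feats[:, None] - centers[None]) ** 2).sum(-1)  # (N, 2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = feats[labels == k].mean(axis=0)
    return labels  # 0/1 motion group per trajectory
```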
๐Ÿ”ฅ3โค2๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🎨MatAnyone: Human Matting🎨

👉MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos even with complex/ambiguous backgrounds. Code & 🤗-Demo announced💙

👉Review https://t.ly/NVXsT
👉Paper arxiv.org/pdf/2501.14677
👉Project pq-yang.github.io/projects/MatAnyone
👉Repo TBA
โค15๐Ÿ‘2๐Ÿคฉ2๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦕[SOTA] Visual Grounding VOS🦕

👉ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to referring video object segmentation (RVOS). Code & models to be released soon💙

👉Review https://t.ly/SDFy9
👉Paper arxiv.org/pdf/2501.14607
👉Project isee-laboratory.github.io/ReferDINO/
👉Repo github.com/iSEE-Laboratory/ReferDINO
๐Ÿคฏ4โค1๐Ÿ”ฅ1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
☀️ Relightable Full-Body Avatars ☀️

👉#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

👉Review https://t.ly/kx9gf
👉Paper arxiv.org/pdf/2501.14726
👉Project neuralbodies.github.io/RFGCA
โค3๐Ÿ‘3๐Ÿ”ฅ3โšก1๐Ÿคฏ1๐Ÿ˜ข1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🌅 Generative Human Mesh Recovery 🌅

👉GenHMR is a novel generative framework that reformulates monocular human mesh recovery (HMR) as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results but no code announced 🥺

👉Review https://t.ly/Rrzpj
👉Paper https://arxiv.org/pdf/2412.14444
👉Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
๐Ÿ”ฅ6๐Ÿ‘2โค1๐Ÿคฏ1๐Ÿพ1
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
🛑 STOP posting about it!
63%
🟩 Keep posting, we want more!
๐Ÿ‘1
💎AI-driven Docs Conversion💎

👉Docling, by IBM, is the all-in-one, open-source solution for documents, parsing several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT💙 A minimal usage sketch follows the links.

👉Review https://t.ly/nSCfT
👉Paper https://lnkd.in/dc5Kpc2F
👉Repo https://lnkd.in/d9gvw9bt
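
👉A minimal usage sketch based on the repo's README (assuming pip install docling; check the repo for the current API):

```python
from docling.document_converter import DocumentConverter

# Parse a document (PDF, DOCX, HTML, ...) into Docling's unified representation.
converter = DocumentConverter()
result = converter.convert("report.pdf")  # hypothetical local file

# Export the structured document, e.g. to Markdown.
print(result.document.export_to_markdown())
```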
โค18๐Ÿ‘8๐Ÿ”ฅ1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🈯 SOTA 0-Shot Multi-View 🈯

👉MVGD by #TOYOTA is the SOTA method for generating images and scale-consistent depth maps from novel viewpoints, given an arbitrary number of posed input views. A novel diffusion-based architecture capable of direct pixel-level generation. Code announced 💙

👉Review https://t.ly/_ecKl
👉Paper arxiv.org/pdf/2501.18804
👉Project mvgd.github.io/
👉Repo TBA
๐Ÿ”ฅ8โค1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ™MambaGlue: SOTA feats. matching๐Ÿ™

👉MambaGlue is a hybrid neural network combining the Mamba and Transformer architectures to match local features. Source code announced, to be released💙 A classical matching-baseline sketch follows the links.

👉Review https://shorturl.at/LxDG1
👉Paper arxiv.org/pdf/2502.00462
👉Repo https://lnkd.in/dAujfGZQ
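
👉MambaGlue's matcher is learned; the classical baseline such matchers improve on is mutual nearest-neighbor matching of local descriptors. A minimal sketch:

```python
import numpy as np

def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Match L2-normalized descriptor sets of shape (Na, C) and (Nb, C).

    Keeps only pairs that are each other's nearest neighbor: the classical
    baseline that learned matchers aim to beat.
    """
    sim = desc_a @ desc_b.T                 # cosine similarities (Na, Nb)
    nn_ab = sim.argmax(axis=1)              # best b for each a
    nn_ba = sim.argmax(axis=0)              # best a for each b
    mutual = nn_ba[nn_ab] == np.arange(len(desc_a))
    return np.stack([np.nonzero(mutual)[0], nn_ab[mutual]], axis=1)  # (M, 2)
```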
๐Ÿคฉ9โค3๐Ÿ”ฅ2๐Ÿ‘2๐Ÿ‘1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🛸Real-Time Differentiable Ray Tracing🛸

👉Radiant Foam is a novel scene representation that leverages a decades-old, efficient volumetric mesh ray-tracing algorithm largely overlooked in recent research. It performs like Gaussian Splatting, without the constraints of rasterization. Code announced💙

👉Review https://shorturl.at/26U06
👉Paper https://arxiv.org/pdf/2502.01157
👉Project https://radfoam.github.io/
👉Repo https://github.com/theialab/radfoam
๐Ÿ”ฅ7โค1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 VideoJAM: #META's Video Model (SOTA) 🔥

👉#META's VideoJAM: the new SOTA (by a large margin) in motion coherence for video generation, much better than SORA! It instills a strong motion prior into any video-gen model. Impressive results, no code announced🥲

👉Review https://shorturl.at/id7Bt
👉Paper https://arxiv.org/pdf/2502.02492
👉Project https://hila-chefer.github.io/videojam-paper.github.io/
๐Ÿ”ฅ9โค4๐Ÿ‘1๐Ÿ‘1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
👗3D Dynamic Garments👗

👉UCLA introduces Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments (with sewing patterns) and humans from an in-the-wild image.

👉Review https://t.ly/qciHV
👉Paper arxiv.org/pdf/2502.03449
👉Project dress-1-to-3.github.io
๐Ÿ”ฅ8โค3๐Ÿ‘3๐Ÿ‘2๐Ÿคฉ1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🤖 META Human-Robot 🤖

👉#META's PARTNR: a novel benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. The largest benchmark of its kind: 100,000+ natural language tasks spanning 60 houses and 5,819 unique objects. Code & Data (🤗) under MIT💙 A download sketch follows the links.

👉Review https://t.ly/zcN0K
👉Paper arxiv.org/pdf/2411.00081
👉Repo github.com/facebookresearch/partnr-planner
🤗Data huggingface.co/datasets/ai-habitat/partnr_episodes
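
👉A minimal sketch for pulling the episodes locally with the standard Hugging Face Hub client (repo id from the link above; the internal file layout is an assumption to verify):

```python
from huggingface_hub import snapshot_download

# Download the PARTNR episodes dataset repo into the local HF cache.
local_dir = snapshot_download(
    repo_id="ai-habitat/partnr_episodes",
    repo_type="dataset",
)
print("Dataset files at:", local_dir)
```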
💃HumanDiT: Long-Form Human Video💃

👉HumanDiT is a novel pose-guided Diffusion Transformer trained on a large, in-the-wild dataset of 14,000 hours of HQ video to produce HD videos with fine-grained body detail. Stunning results but no code announced🥲

👉Review https://t.ly/7rTRr
👉Paper https://arxiv.org/pdf/2502.04847
👉Project https://agnjason.github.io/HumanDiT-page/
โค6๐Ÿ”ฅ3๐Ÿ‘2๐Ÿ‘1๐Ÿคฏ1