AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
13 files
1.31K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
πŸ”₯ Depth Any Camera (SOTA) πŸ”₯

πŸ‘‰DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet)πŸ’™

πŸ‘‰Review https://t.ly/1qz4F
πŸ‘‰Paper arxiv.org/pdf/2501.02464
πŸ‘‰Project yuliangguo.github.io/depth-any-camera/
πŸ‘‰Repo github.com/yuliangguo/depth_any_camera
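πŸ‘‰For intuition on why FoV matters here (standard pinhole geometry, not DAC's code): the focal length in pixels is tied to the horizontal field of view, so a model trained at perspective FoVs sees very different intrinsics on fisheye/360° rigs. A minimal sketch:

```python
import math

def focal_from_fov(width_px: int, fov_deg: float) -> float:
    """Pinhole-camera focal length (in pixels) for a given horizontal FoV."""
    return width_px / (2.0 * math.tan(math.radians(fov_deg) / 2.0))

# A 640-px-wide image with a 90 deg FoV corresponds to a 320-px focal length.
print(round(focal_from_fov(640, 90.0)))  # 320
```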
πŸ‘12πŸ”₯5🀩4❀2😍1
This media is not supported in your browser
VIEW IN TELEGRAM
❀️‍πŸ”₯ Uncommon object in #3D ❀️‍πŸ”₯

πŸ‘‰#META releases uCO3D, a new object-centric dataset for 3D AI. It is the largest publicly available collection of HD object videos with 3D annotations, ensuring full 360° coverage. Code & data under CCA 4.0πŸ’™

πŸ‘‰Review https://t.ly/Z_tvA
πŸ‘‰Paper https://arxiv.org/pdf/2501.07574
πŸ‘‰Project https://uco3d.github.io/
πŸ‘‰Repo github.com/facebookresearch/uco3d
πŸ†Universal Detector-Free MatchπŸ†

πŸ‘‰MatchAnything: novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released πŸ’™

πŸ‘‰Review https://t.ly/sx92L
πŸ‘‰Paper https://lnkd.in/dWwRwGyY
πŸ‘‰Project https://lnkd.in/dCwb2Yte
πŸ‘‰Repo https://lnkd.in/dnUXYzQ5
πŸ†˜ Help: Looking for Outstanding Speakers πŸ†˜

πŸ‘‰Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only β€œhardcore” technical talks, no commercial content at all. Please comment here with name, topic and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐Guaranteed tickets & more for the suggestions that will become invited speakers ;)
πŸ§žβ€β™‚οΈOmni-RGPT: SOTA MLLM UnderstandingπŸ§žβ€β™‚οΈ

πŸ‘‰ #NVIDIA presents Omni-RGPT, MLLM for region-level comprehension for both images & videos. New SOTA on image/video-based commonsense reasoning.

πŸ‘‰Review https://t.ly/KHnQ7
πŸ‘‰Paper arxiv.org/pdf/2501.08326
πŸ‘‰Project miranheo.github.io/omni-rgpt/
πŸ‘‰Repo TBA soon
πŸ”₯ GAGA: Group Any Gaussians πŸ”₯

πŸ‘‰GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updatedπŸ’™

πŸ‘‰Review https://t.ly/Nk_jT
πŸ‘‰Paper www.gaga.gallery/static/pdf/Gaga.pdf
πŸ‘‰Project www.gaga.gallery/
πŸ‘‰Repo github.com/weijielyu/Gaga
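πŸ‘‰As a toy illustration of the core problem (zero-shot segmenters give the same object different mask IDs across views), here is a naive greedy 2D association by IoU; `associate` and its threshold are hypothetical simplifications, not Gaga's actual 3D-aware grouping:

```python
def mask_iou(a: set, b: set) -> float:
    """IoU between two masks given as sets of pixel indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def associate(prev_masks: dict, cur_masks: dict, thr: float = 0.5) -> dict:
    """Greedy relabeling: map each current mask to the best-overlapping
    previous group ID, or mint a new ID when overlap stays below thr."""
    mapping = {}
    next_id = max(prev_masks, default=-1) + 1
    for cid, cmask in cur_masks.items():
        best_id, best_iou = None, thr
        for pid, pmask in prev_masks.items():
            iou = mask_iou(cmask, pmask)
            if iou > best_iou:
                best_id, best_iou = pid, iou
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        mapping[cid] = best_id
    return mapping

prev = {0: {1, 2, 3, 4}}
cur = {7: {2, 3, 4, 5}}      # same object, different zero-shot ID
print(associate(prev, cur))  # {7: 0}
```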
🎁Free Book: LLM Foundations🎁

πŸ‘‰A completely free book, just released on arXiv, outlining the basic concepts of #LLMs and related techniques with a focus on the foundational aspects.

βœ…Chapter 1: basics of pre-training
βœ…Chapter 2: gen-models & LLMs
βœ…Chapter 3: prompting methods
βœ…Chapter 4: alignment methods

πŸ‘‰If you have some background in ML and a working understanding of concepts like Transformers, this book will be a smooth read. Even without that prior knowledge, you will be fine: each chapter is self-contained.

πŸ‘‰Review https://t.ly/9LGCa
πŸ‘‰Book https://lnkd.in/d3VkswZf
πŸ„β€β™€οΈ GSTAR: Gaussian Surface Tracking πŸ„β€β™€οΈ

πŸ‘‰ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/udpMq
πŸ‘‰Paper arxiv.org/pdf/2501.10283
πŸ‘‰Project chengwei-zheng.github.io/GSTAR/
πŸ‘‰Repo TBA
🧽 Diffusion Video Inpainting 🧽

πŸ‘‰#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater details and more coherent structures. Code & weights released under ApacheπŸ’™

πŸ‘‰Review https://t.ly/7rEll
πŸ‘‰Paper arxiv.org/pdf/2501.10018
πŸ‘‰Project lixiaowen-xw.github.io/DiffuEraser-page/
πŸ‘‰Repo github.com/lixiaowen-xw/DiffuEraser
🌈 #Nvidia Foundation ZS-Stereo 🌈

πŸ‘‰Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It comes with a large-scale synthetic training dataset (1M stereo pairs) featuring large diversity and high photorealism. Code, model & dataset to be releasedπŸ’™

πŸ‘‰Review https://t.ly/rfBr5
πŸ‘‰Paper arxiv.org/pdf/2501.09898
πŸ‘‰Project nvlabs.github.io/FoundationStereo/
πŸ‘‰Repo github.com/NVlabs/FoundationStereo/tree/master
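πŸ‘‰Background, not the model itself: in rectified stereo, predicted disparity converts to metric depth via depth = f·B/d. A minimal sketch:

```python
def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> float:
    """Standard rectified-stereo relation: depth = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# f = 700 px, baseline = 0.12 m, disparity = 14 px -> about 6 m
print(round(disparity_to_depth(14.0, 700.0, 0.12), 6))
```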
πŸ”₯ [SOTA] Long-Video Depth Anything πŸ”₯

πŸ‘‰ByteDance unveils Video Depth Anything: HQ, consistent depth estimation in SUPER-long videos (over several minutes) without sacrificing efficiency. Based on Depth Anything V2 with a novel efficient spatial-temporal head. Repo available under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/Q4ZZd
πŸ‘‰Paper arxiv.org/pdf/2501.12375
πŸ‘‰Project https://lnkd.in/dKNwJzbM
πŸ‘‰Repo https://lnkd.in/ddfwwpCj
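πŸ‘‰For intuition only: the simplest possible temporal-consistency baseline is an exponential moving average over per-frame depth maps; the paper's learned spatial-temporal head is far more sophisticated than this sketch:

```python
def smooth_depth(depth_frames, alpha: float = 0.8):
    """Exponential moving average over per-frame depth values: a crude
    stand-in for temporal consistency (NOT the paper's learned head)."""
    smoothed, prev = [], None
    for d in depth_frames:
        prev = d if prev is None else [alpha * p + (1 - alpha) * x
                                       for p, x in zip(prev, d)]
        smoothed.append(prev)
    return smoothed

frames = [[1.0, 2.0], [1.0, 2.0], [5.0, 2.0]]  # first pixel flickers
print(smooth_depth(frames)[-1])  # flicker damped toward previous frames
```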
🧡Time-Aware Pts-Tracking🧡

πŸ‘‰Chrono: a feature backbone designed for point tracking with built-in temporal awareness. It captures long-term temporal context, enabling precise prediction even without refinement steps. Code announcedπŸ’™

πŸ‘‰Review https://t.ly/XAL7G
πŸ‘‰Paper arxiv.org/pdf/2501.12218
πŸ‘‰Project cvlab-kaist.github.io/Chrono/
πŸ‘‰Repo github.com/cvlab-kaist/Chrono
🎀EMO2: Audio-Driven Avatar🎀

πŸ‘‰Alibaba previews a novel audio-driven talking head method capable of simultaneously generating highly expressive facial expressions and hand gestures. Turn your audio ON. Stunning results but no code πŸ₯Ί

πŸ‘‰Review https://t.ly/x8slQ
πŸ‘‰Paper arxiv.org/pdf/2501.10687
πŸ‘‰Project humanaigc.github.io/emote-portrait-alive-2/
πŸ‘‰Repo πŸ₯Ί
🦠A-Life with Foundation Models🦠

πŸ‘‰A super team unveils ASAL, a new paradigm for Artificial Life research. A diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia & Neural Cellular Automata. Code under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/7SZ8A
πŸ‘‰Paper arxiv.org/pdf/2412.17799
πŸ‘‰Project https://pub.sakana.ai/asal/
πŸ‘‰Repo https://lnkd.in/dP5yxKtw
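πŸ‘‰One of the substrates named above, Conway's Game of Life, fits in a few lines of plain Python (a classic reference implementation, unrelated to the ASAL codebase):

```python
from collections import Counter

def life_step(alive: set) -> set:
    """One Game of Life update on an unbounded grid; cells are (x, y)
    tuples in the `alive` set."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in alive
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in alive)}

blinker = {(0, 0), (1, 0), (2, 0)}  # horizontal bar
print(life_step(blinker))           # oscillates to a vertical bar
```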
πŸ”₯ The code of DynOMo is out πŸ”₯

πŸ‘‰DynOMo is a novel model able to track any point in a dynamic scene over time through 3D reconstruction from monocular video: 2D and 3D point tracking from unposed monocular camera input

πŸ‘‰Review https://t.ly/t5pCf
πŸ‘‰Paper https://lnkd.in/dwhzz4_t
πŸ‘‰Repo github.com/dvl-tum/DynOMo
πŸ‘‰Project https://lnkd.in/dMyku2HW
πŸͺ†SOTA Points SegmentationπŸͺ†

πŸ‘‰VGG Oxford unveils a novel loss to segment objects in videos based on their motion and NO other form of supervision! The network is trained using long-term point trajectories as a supervisory signal that complements optical flow. New SOTA!

πŸ‘‰Review https://t.ly/8Bsbt
πŸ‘‰Paper https://arxiv.org/pdf/2501.12392
πŸ‘‰Code https://github.com/karazijal/lrtl
πŸ‘‰Project www.robots.ox.ac.uk/~vgg/research/lrtl/
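πŸ‘‰A toy rendering of the idea that points on one long-term trajectory should share a segment label (illustrative only; the paper's actual loss is defined differently):

```python
def trajectory_consistency_loss(track_probs):
    """Mean squared deviation of each frame's foreground probability from
    its trajectory's mean: 0 when a track is labelled consistently.
    A toy stand-in for trajectory-based supervision, not the paper's loss."""
    total, count = 0.0, 0
    for probs in track_probs:  # one trajectory = probs over its frames
        mean = sum(probs) / len(probs)
        total += sum((p - mean) ** 2 for p in probs)
        count += len(probs)
    return total / count

print(trajectory_consistency_loss([[1.0, 1.0, 1.0]]))  # 0.0 (consistent)
print(trajectory_consistency_loss([[1.0, 0.0]]))       # 0.25 (flip-flops)
```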
🎨MatAnyone: Human Matting🎨

πŸ‘‰MatAnyone is a novel approach for human video matting that supports target assignment. Stable tracking in long videos even with complex/ambiguous BGs. Code & πŸ€—-Demo announcedπŸ’™

πŸ‘‰Review https://t.ly/NVXsT
πŸ‘‰Paper arxiv.org/pdf/2501.14677
πŸ‘‰Project pq-yang.github.io/projects/MatAnyone
πŸ‘‰Repo TBA
πŸ¦•[SOTA] Visual Grounding VOSπŸ¦•

πŸ‘‰ReferDINO is the first end-to-end approach for adapting foundational visual grounding models to RVOS. Code & models to be released soonπŸ’™

πŸ‘‰Review https://t.ly/SDFy9
πŸ‘‰Paper arxiv.org/pdf/2501.14607
πŸ‘‰Project isee-laboratory.github.io/ReferDINO/
πŸ‘‰Repo github.com/iSEE-Laboratory/ReferDINO
β˜€οΈ Relightable Full-Body Avatars β˜€οΈ

πŸ‘‰#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

πŸ‘‰Review https://t.ly/kx9gf
πŸ‘‰Paper arxiv.org/pdf/2501.14726
πŸ‘‰Project neuralbodies.github.io/RFGCA
πŸŒ… Generative Human Mesh Recovery πŸŒ…

πŸ‘‰GenHMR is a novel generative framework that reformulates monocular HMR as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results but no code announced πŸ₯Ί

πŸ‘‰Review https://t.ly/Rrzpj
πŸ‘‰Paper https://arxiv.org/pdf/2412.14444
πŸ‘‰Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
πŸ›‘ STOP posting about!
63%
🟩 Keep posting, we want more!
πŸ‘1