AI with Papers - Artificial Intelligence & Deep Learning

🍄 Open-MLLMs Self-Driving 🍄

👉OpenEMMA is a novel open-source end-to-end framework based on MLLMs (via Chain-of-Thought reasoning). It shows effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0💙

👉Review https://t.ly/waLZI
👉Paper https://arxiv.org/pdf/2412.15208
👉Code https://github.com/taco-group/OpenEMMA

❤12👍5🔥5👏1😍1

9.53K views · 09:11

AI with Papers - Artificial Intelligence & Deep Learning

🔄️ Orient Anything in 3D 🔄️

👉Orient Anything is a novel robust image-based object orientation estimation model. By training on 2M rendered labeled images, it achieves strong zero-shot generalization in the wild. Code released💙

👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz

👍9❤7🔥3⚡1🤩1

8.86K views · 10:20

AI with Papers - Artificial Intelligence & Deep Learning

⭐TOP 10 Papers you loved - 2024⭐

👉Here is the list of my posts you liked the most in 2024. Thank you all 💙

Papers:
⭐"Look Ma, no markers"
⭐T-Rex 2 Detector
⭐Models at Any Resolution

👉The full list with links: https://t.ly/GvQVy

❤12🔥4👍1🤩1😍1

8.71K views · edited 07:57

AI with Papers - Artificial Intelligence & Deep Learning

🌳 HD Video Object Insertion 🌳

👉VideoAnydoor is a novel zero-shot video object insertion #AI with high-fidelity detail preservation and precise motion control. All-in-one: video VTON, face swapping, logo insertion, multi-region editing, etc.

👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA

🔥8❤2💩2👍1🤩1😍1

7.95K views · edited 10:13

AI with Papers - Artificial Intelligence & Deep Learning

⭐ Poll Alert!! ⭐

[EDIT] see below

❤3👍2🔥1

7.48K views · edited 12:09

AI with Papers - Artificial Intelligence & Deep Learning

What is your favorite source for the AI updates?

Final Results

32% Instagram
52% Others (comment here: https://t.ly/chQWq)

👍11🔥2❤1😍1

573 voters · 8.22K views · 12:52

AI with Papers - Artificial Intelligence & Deep Learning

AI with Papers - Artificial Intelligence & Deep Learning pinned «What is your favorite source for the AI updates?»

13:46

AI with Papers - Artificial Intelligence & Deep Learning

🥮 SOTA probabilistic tracking🥮

👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial💙

👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker

❤6🔥5👍2🤩2👏1

7.25K views · edited 09:12

AI with Papers - Artificial Intelligence & Deep Learning

🧤World-Space Ego 3D Hands🧤

👉Imperial College unveils HaWoR, a novel world-space 3D hand motion estimation method for egocentric videos. It sets a new SOTA on both camera pose estimation & hand motion reconstruction. Code under Attribution-NC-ND 4.0 Int.💙

👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR

🔥4😢1🤩1

7.2K views · edited 09:18

AI with Papers - Artificial Intelligence & Deep Learning

🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)

Anonymous Poll

24% 🤲Portable Training Workstation
34% ⚛️Nuclear energy for AI training
33% 🖲️Cheaper inference-only devices
💰Cloud-intensive inference-only

👍4❤1🔥1🤯1🤩1

245 voters · 7.26K views · 13:19

AI with Papers - Artificial Intelligence & Deep Learning

⚽ FIFA 3D Human Pose ⚽

👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙

👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA

🤩7❤6🤯3👏1💩1😍1🍾1

7.95K views · 07:40

AI with Papers - Artificial Intelligence & Deep Learning

🔥 Depth Any Camera (SOTA) 🔥

👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cams with varying FoVs (including large fisheye & 360°). Code announced (not available yet)💙

👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera

👍12🔥5🤩4❤2😍1

8.67K views · 07:33

AI with Papers - Artificial Intelligence & Deep Learning

❤️‍🔥 Uncommon object in #3D ❤️‍🔥

👉#META releases uCO3D, a new object-centric dataset for 3D AI. The largest publicly-available collection of HD videos of objects with 3D annotations that ensures full 360° coverage. Code & data under CCA 4.0💙

👉Review https://t.ly/Z_tvA
👉Paper https://arxiv.org/pdf/2501.07574
👉Project https://uco3d.github.io/
👉Repo github.com/facebookresearch/uco3d

❤11⚡2😍2👍1👏1🤩1🍾1

6.83K views · 09:02

AI with Papers - Artificial Intelligence & Deep Learning

🏆Universal Detector-Free Match🏆

👉MatchAnything: novel detector-free universal matcher across unseen real-world single/cross-modality domains. Same weights for everything. Code announced, to be released 💙

👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5

❤8🤯7🔥4👏3⚡1🤩1😍1🍾1

8K views · edited 12:42

AI with Papers - Artificial Intelligence & Deep Learning

🆘 Help: Looking for Outstanding Speakers 🆘

👉Who would you suggest as a speaker for your ideal conference on AI (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only “hardcore” technical talks, no commercial content at all. Please comment here with name, topic, and affiliation (e.g., Paul Gascoigne, Computer Vision & Football, Scotland Team).

⭐Guaranteed tickets & more for suggestions that become invited speakers ;)

❤5🔥4👍3

6.86K views · 14:58

AI with Papers - Artificial Intelligence & Deep Learning

🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon

🔥10❤3🍾2⚡1👍1👏1

7.66K views · edited 07:55

AI with Papers - Artificial Intelligence & Deep Learning

🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga

🔥11❤3👍2🤩1

7.55K views · 13:09

AI with Papers - Artificial Intelligence & Deep Learning

🎁Free Book: LLM Foundations🎁

👉A completely free book, just released on arXiv, outlines the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

✅Chapter 1: basics of pre-training
✅Chapter 2: gen-models & LLMs
✅Chapter 3: prompting methods
✅Chapter 4: alignment methods

👉If you have some background in ML, along with a certain understanding of topics like Transformers, this book will be a "smooth" read. However, even without this prior knowledge, it is still perfectly fine because the contents of each chapter are self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf

❤17🔥6👏3😍1

7.78K views · edited 08:03

AI with Papers - Artificial Intelligence & Deep Learning

🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️

👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking for dynamic scenes while handling topology changes. Code announced💙

👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA

🔥8🤩3👍2😍2❤1🤯1

6.98K views · edited 13:19

AI with Papers - Artificial Intelligence & Deep Learning

🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on Stable Diffusion, designed to fill masked regions with greater detail and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser

🔥14❤3👍2⚡1👏1

15.1K views · 09:26
