AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
13 files
1.31K links
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🍄 Open-MLLMs Self-Driving 🍄

👉OpenEMMA is a novel open-source end-to-end (e2e) self-driving framework based on MLLMs with Chain-of-Thought reasoning. It shows effectiveness, generalizability, and robustness across a variety of challenging driving scenarios. Code released under Apache 2.0💙

👉Review https://t.ly/waLZI
👉Paper https://arxiv.org/pdf/2412.15208
👉Code https://github.com/taco-group/OpenEMMA
🔄️ Orient Anything in 3D 🔄️

👉Orient Anything is a novel robust image-based object orientation estimation model. By training on 2M rendered labeled images, it achieves strong zero-shot generalization in the wild. Code released💙

👉Review https://t.ly/ro5ep
👉Paper arxiv.org/pdf/2412.18605
👉Project orient-anything.github.io/
👉Code https://lnkd.in/d_3k6Nxz
TOP 10 Papers you loved - 2024

👉Here is the list of my posts you liked most in 2024. Thank you all 💙

Papers:
"Look Ma, no markers"
T-Rex 2 Detector
Models at Any Resolution

👉The full list with links: https://t.ly/GvQVy
🌳 HD Video Object Insertion 🌳

👉VideoAnydoor is a novel zero-shot video object insertion #AI model with high-fidelity detail preservation and precise motion control. All-in-one: video virtual try-on (VTON), face swapping, logo insertion, multi-region editing, etc.

👉Review https://t.ly/hyvRq
👉Paper arxiv.org/pdf/2501.01427
👉Project videoanydoor.github.io/
👉Repo TBA
What is your favorite source for AI updates?
Final Results
32% LinkedIn
4% Instagram
3% Reddit
52% Telegram
🥮 SOTA probabilistic tracking🥮

👉ProTracker is a novel framework for robust and accurate long-term dense tracking of arbitrary points in videos. Code released under CC Attribution-NonCommercial💙

👉Review https://t.ly/YY_PH
👉Paper https://arxiv.org/pdf/2501.03220
👉Project michaelszj.github.io/protracker/
👉Code github.com/Michaelszj/pro-tracker
🧤World-Space Ego 3D Hands🧤

👉Imperial College unveils HaWoR, a novel method for world-space 3D hand motion estimation from egocentric videos. It sets a new SOTA on both camera pose estimation and hand motion reconstruction. Code under CC Attribution-NC-ND 4.0 International💙

👉Review https://t.ly/ozJn7
👉Paper arxiv.org/pdf/2501.02973
👉Project hawor-project.github.io/
👉Code github.com/ThunderVVV/HaWoR
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
24% 🤲Portable Training Workstation
34% ⚛️Nuclear energy for AI training
33% 🖲️Cheaper inference-only devices
9% 💰Cloud-intensive inference-only
FIFA 3D Human Pose

👉#FIFA WorldPose is a novel dataset for multi-person global pose estimation in the wild, featuring footage from the 2022 World Cup. 2.5M+ annotations, released 💙

👉Review https://t.ly/kvGVQ
👉Paper arxiv.org/pdf/2501.02771
👉Project https://lnkd.in/d5hFWpY2
👉Dataset https://lnkd.in/dAphJ9WA
🔥 Depth Any Camera (SOTA) 🔥

👉DAC is a novel and powerful zero-shot metric depth estimation framework that extends a perspective-trained model to effectively handle cameras with varying FoVs (including large fisheye & 360°). Code announced (not available yet)💙

👉Review https://t.ly/1qz4F
👉Paper arxiv.org/pdf/2501.02464
👉Project yuliangguo.github.io/depth-any-camera/
👉Repo github.com/yuliangguo/depth_any_camera
❤️‍🔥 Uncommon Objects in #3D ❤️‍🔥

👉#META releases uCO3D, a new object-centric dataset for 3D AI. It is the largest publicly available collection of HD object videos with 3D annotations, ensuring full 360° coverage. Code & data under CC Attribution 4.0💙

👉Review https://t.ly/Z_tvA
👉Paper https://arxiv.org/pdf/2501.07574
👉Project https://uco3d.github.io/
👉Repo github.com/facebookresearch/uco3d
🏆Universal Detector-Free Match🏆

👉MatchAnything is a novel detector-free universal matcher that works across unseen real-world single- and cross-modality domains, with the same weights for everything. Code announced, to be released 💙

👉Review https://t.ly/sx92L
👉Paper https://lnkd.in/dWwRwGyY
👉Project https://lnkd.in/dCwb2Yte
👉Repo https://lnkd.in/dnUXYzQ5
🆘 Help: Looking for Outstanding Speakers 🆘

👉Who would you suggest as a speaker for your ideal AI conference (CV, LLM, RAG, ML, HW Optimization, AI & Space, etc.)? Only "hardcore" technical talks, no commercial content at all. Please comment here with name, topic and affiliation (e.g.: Paul Gascoigne, Computer Vision & Football, Scotland Team).

Guaranteed tickets & more for the suggestions that will become invited speakers ;)
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🔥 GAGA: Group Any Gaussians 🔥

👉GAGA is a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models. Code available, recently updated💙

👉Review https://t.ly/Nk_jT
👉Paper www.gaga.gallery/static/pdf/Gaga.pdf
👉Project www.gaga.gallery/
👉Repo github.com/weijielyu/Gaga
🎁Free Book: LLM Foundations🎁

👉A completely free book, just released on arXiv, that outlines the basic concepts of #LLMs and related techniques, with a focus on the foundational aspects.

Chapter 1: basics of pre-training
Chapter 2: gen-models & LLMs
Chapter 3: prompting methods
Chapter 4: alignment methods

👉If you have some background in ML and a basic understanding of Transformers, this book will be a smooth read. Even without this prior knowledge, it remains accessible because each chapter is self-contained.

👉Review https://t.ly/9LGCa
👉Book https://lnkd.in/d3VkswZf
🏄‍♀️ GSTAR: Gaussian Surface Tracking 🏄‍♀️

👉ETH Zurich unveils GSTAR, a novel framework for photo-realistic rendering, surface reconstruction, and 3D tracking of dynamic scenes that handles topology changes. Code announced💙

👉Review https://t.ly/udpMq
👉Paper arxiv.org/pdf/2501.10283
👉Project chengwei-zheng.github.io/GSTAR/
👉Repo TBA
🧽 Diffusion Video Inpainting 🧽

👉#Alibaba unveils a technical report about DiffuEraser, a video inpainting model based on stable diffusion, designed to fill masked regions with greater details and more coherent structures. Code & weights released under Apache💙

👉Review https://t.ly/7rEll
👉Paper arxiv.org/pdf/2501.10018
👉Project lixiaowen-xw.github.io/DiffuEraser-page/
👉Repo github.com/lixiaowen-xw/DiffuEraser