AI with Papers - Artificial Intelligence & Deep Learning
15.2K subscribers
135 photos
247 videos
13 files
1.31K links
All the AI, with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🧠 Distractor-Aware SAM2 🧠

👉A novel distractor-aware memory for SAM2 and an introspection-based update strategy for VOT. Code & Dataset released💙

👉Review https://t.ly/RBRpQ
👉Paper arxiv.org/pdf/2411.17576
👉Project jovanavidenovic.github.io/dam-4-sam
👉Repo github.com/jovanavidenovic/DAM4SAM/
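The gated-update idea can be sketched in a few lines of plain Python (a minimal sketch — class name, threshold, and interface are hypothetical, not DAM4SAM's actual API): a frame only enters the tracking memory when an introspection score says its mask is trustworthy, so distractor-dominated frames never pollute the bank.

```python
from collections import deque

class DistractorAwareMemory:
    """Toy introspection-gated memory bank (illustrative only)."""

    def __init__(self, capacity=7, threshold=0.8):
        self.bank = deque(maxlen=capacity)  # most recent trusted frames
        self.threshold = threshold

    def update(self, frame_feat, introspection_score):
        # Commit only confidently-segmented frames; reject the rest.
        if introspection_score >= self.threshold:
            self.bank.append(frame_feat)
            return True
        return False

mem = DistractorAwareMemory(capacity=2, threshold=0.8)
assert mem.update("f0", 0.95) is True   # confident -> stored
assert mem.update("f1", 0.40) is False  # distractor-heavy -> rejected
assert list(mem.bank) == ["f0"]
```

With a bounded `deque`, old entries are evicted automatically once capacity is reached, keeping memory cost constant over long videos.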
🔥Distill-Any-Depth: SOTA MDE🔥

👉Distill-Any-Depth is the new SOTA monocular depth estimation model, trained with a novel knowledge-distillation scheme. Authors: ZJUT, Westlake University, LZU & NTU. Source Code, pre-trained models & HF-demo released💙

👉Review https://t.ly/GBJgi
👉Paper arxiv.org/pdf/2502.19204
👉Repo https://lnkd.in/dPtxNrQh
🤗Demo https://lnkd.in/d2TMPf4b
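A core ingredient of depth distillation is comparing student and teacher predictions up to scale and shift, since monocular depth is only defined up to an affine ambiguity. A minimal NumPy sketch (illustrative, not the paper's exact objective):

```python
import numpy as np

def ssi_distill_loss(student, teacher):
    """Scale-and-shift-invariant distillation loss (toy version):
    align the student's depth to the teacher with a least-squares
    scale/shift, then take the mean L1 gap."""
    s = student.ravel().astype(np.float64)
    t = teacher.ravel().astype(np.float64)
    # Solve min_{a,b} ||a*s + b - t||^2 in closed form.
    A = np.stack([s, np.ones_like(s)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return float(np.mean(np.abs(a * s + b - t)))

student = np.array([[1.0, 2.0], [3.0, 4.0]])
teacher = 2.0 * student + 0.5          # same depth up to scale/shift
assert ssi_distill_loss(student, teacher) < 1e-8
```

Because the affine fit is closed-form, a student whose depth matches the teacher up to any scale and shift incurs (numerically) zero loss.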
🍎FindTrack: text-driven VOS 🍎

👉Yonsei University introduces FindTrack, a novel decoupled framework that separates text-driven target ID from mask propagation. Impressive results (even under severe occlusions), new SOTA. Source Code & models to be released💙

👉Review https://t.ly/2smaF
👉Paper arxiv.org/pdf/2503.03492
👉Repo github.com/suhwan-cho/FindTrack
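The decoupling can be illustrated with a tiny pure-Python sketch (function and data names are hypothetical, not FindTrack's interface): target identification happens once, on the frame whose detection best matches the text query, and propagation then carries that target's mask through the sequence — here stubbed by reusing the anchor mask.

```python
def find_then_track(frame_scores, masks):
    """Toy decoupled pipeline: identify once, then propagate."""
    # Step 1: target ID — pick the most confident anchor frame.
    anchor = max(range(len(frame_scores)), key=frame_scores.__getitem__)
    # Step 2: propagation — stubbed: reuse the anchor's mask everywhere.
    return anchor, [masks[anchor] for _ in frame_scores]

scores = [0.2, 0.9, 0.5]    # text/detection similarity per frame
masks = ["m0", "m1", "m2"]  # per-frame candidate masks
anchor, tracked = find_then_track(scores, masks)
assert anchor == 1
assert tracked == ["m1", "m1", "m1"]
```

The point of the split is that a bad detection in one frame (e.g. under occlusion) no longer corrupts propagation, because identification is anchored to the single most reliable frame.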
📒 Moving-Camera Diffusion 📒

👉Tencent unveils TrajectoryCrafter, a novel approach to redirect camera trajectories for monocular videos. Impressive results, the future of commercial #adv. Code & Demo released💙

👉Review https://t.ly/L-IoR
👉Paper https://arxiv.org/pdf/2503.05638
👉Project https://trajectorycrafter.github.io/
👉Repo github.com/TrajectoryCrafter/TrajectoryCrafter
🤗Demo https://huggingface.co/spaces/Doubiiu/TrajectoryCrafter
💙 Announcing #Py4AI 2025 💙

👉 The second edition of the Py4AI conference is official! An all-day, fully free event for #AI & #Python lovers.

𝐓𝐡𝐞 𝐟𝐢𝐫𝐬𝐭 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🚀Dana Aubakirova | Hugging Face🤗
🚀Yunhao Liu & Ruoya Sheng | ByteDance🔥
🚀Alice Casiraghi | 🌏🌎🌍
🚀Luca Arrotta, PhD | Datapizza🍕
🚀Valeria Zuccoli | Bettini Srl
🚀Mirco Planamente | ARGO Vision
🚀Daniele Zonca | Red Hat

👉 Info & registration: https://t.ly/37wWj
🎯RexSeek: Referring Any Object🎯

👉A novel referring detection model based on a multimodal LLM that precisely locates objects from user-input natural language, with a dedicated specialization on humans. Code released 💙

👉Review https://shorturl.at/CGsT2
👉Paper arxiv.org/pdf/2503.08507
👉Code github.com/IDEA-Research/RexSeek
🐶OVTR: E2E Transformer MOT🐶

👉HUST University proposes OVTR (End-to-End Open-Vocabulary Multiple Object Tracking with TRansformer), the first end-to-end open-vocabulary tracker that models motion, appearance, and category simultaneously. Source Code released under MIT💙

👉Review https://t.ly/K3ASX
👉Paper arxiv.org/pdf/2503.10616
👉Code https://github.com/jinyanglii/OVTR
🫀HyperFast Myocardium Tracking🫀

👉Norwegian institutes unveil MyoTracker, a low-complexity architecture (0.3M params) for point tracking in echocardiography. Built on CoTracker2, it predicts points for the entire sequence in a single step. Code released under a non-commercial license💙

👉Review https://t.ly/6wo8q
👉Paper https://arxiv.org/pdf/2503.10431
👉Code https://github.com/artemcher/myotracker
🍾 6D Tracking & Pose SOTA 🍾

👉ČVUT unveils the new SOTA in RGB 6D pose estimation and tracking. Suitable for ego-clips & 7-axis robo-manipulation. Code under MIT💙

👉Review https://t.ly/pSqFR
👉Paper arxiv.org/pdf/2503.10307
👉Code github.com/ponimatkin/freepose
🖲️ VGG Transformer 🖲️

👉VGGT by VGG & #META (#CVPR2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene within seconds. Code released💙

👉Review https://t.ly/WoWXL
👉Paper https://arxiv.org/pdf/2503.11651
👉Project https://vgg-t.github.io/
👉Code github.com/facebookresearch/vggt
🧸 Occluded 3D Reconstruction 🧸

👉Oxford unveils a novel 3D generative model to reconstruct 3D objects from partial observations. Code (TBR), demo, model on HF💙

👉Review https://t.ly/Lr5D7
👉Paper arxiv.org/pdf/2503.13439
👉Project sm0kywu.github.io/Amodal3R/
🤗huggingface.co/spaces/Sm0kyWu/Amodal3R
🌱 #Py4AI: line-up is official 🌱

👉Last week we announced the first part of our incredible line-up for PY4AI 2025. It's time to disclose the second one and drive you crazy👇

𝐓𝐡𝐞 𝐬𝐞𝐜𝐨𝐧𝐝 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🔥Alfredo Canziani | New York University
🔥Fanny Bouton | OVHcloud
🔥Full list: https://t.ly/JJP8B
🧞 IMPOSSIBLE Videos 🧞

👉IPV-Bench: counterfactual and anti-reality scenes that are impossible in the real world. A novel challenge designed to evaluate and foster progress in video understanding and generation. Code & 🤗-Data 💙

👉Review https://t.ly/D7jhm
👉Paper arxiv.org/pdf/2503.14378
👉Project showlab.github.io/Impossible-Videos/
👉Repo github.com/showlab/Impossible-Videos
🥎LLM Spatial Understanding🥎

👉SpatialLM by Manycore: novel LLM designed to process 3D point cloud data and generate structured 3D scene understanding outputs. Code, model & data 💙

👉Review https://t.ly/ejr1s
👉Project manycore-research.github.io/SpatialLM/
👉Code github.com/manycore-research/SpatialLM
🤗Models https://huggingface.co/manycore-research
🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal info across videos💙

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🔥 Dereflection Any Image 🔥

👉SJTU & #Huawei unveil DAI, a novel diffusion-based framework able to remove a wide range of reflection types. One-step diffusion with deterministic outputs & fast inference. Inference code, pretrained models & training released💙

👉Review https://t.ly/PDA9K
👉Paper https://arxiv.org/pdf/2503.17347
👉Project abuuu122.github.io/DAI.github.io/
👉Repo github.com/Abuuu122/Dereflection-Any-Image
🦎 Scaling Vision to 4K🦎

👉PS3 by #Nvidia (+UC Berkeley) scales CLIP-style vision pre-training up to 4K resolution with *near-constant* cost: it encodes the low-res global image and selectively processes only informative high-res regions. Impressive work. Code/weights & 🤗 announced💙

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
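The selective high-res step can be sketched with NumPy (an illustrative sketch — the saliency scoring and function name are assumptions, not PS3's actual method): rank patches by an informativeness score and send only the top-k through the expensive high-res encoder, so compute stays roughly constant regardless of image resolution.

```python
import numpy as np

def select_hr_patches(saliency, k):
    """Toy patch selection: return the flat indices of the k
    highest-saliency patches, the only ones that get a high-res pass."""
    flat = saliency.ravel()
    idx = np.argsort(flat)[::-1][:k]  # k most informative patches
    return sorted(idx.tolist())

sal = np.array([[0.1, 0.9],
                [0.8, 0.2]])
# Only 2 of 4 patches receive the expensive high-res encoding.
assert select_hr_patches(sal, k=2) == [1, 2]
```

Since `k` is fixed while image area grows, the high-res cost stays bounded — the mechanism behind the "near-constant cost" claim.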
🏓LATTE-MV: #3D Table Tennis🏓

👉UC Berkeley unveils at #CVPR2025 a novel system for reconstructing monocular table-tennis video in 3D, with an uncertainty-aware controller that anticipates opponent actions. Code & Dataset announced, to be released💙

👉Review https://t.ly/qPMOU
👉Paper arxiv.org/pdf/2503.20936
👉Project sastry-group.github.io/LATTE-MV/
👉Repo github.com/sastry-group/LATTE-MV
🌳MVSA Zero-Shot Multi-View🌳

👉Niantic unveils MVSA, a novel Multi-View Stereo Architecture that works anywhere by generalizing across diverse domains & depth ranges. Highly accurate & 3D-consistent depths. Code & models announced💙

👉Review https://t.ly/LvuTh
👉Paper https://arxiv.org/pdf/2503.22430
👉Project https://nianticlabs.github.io/mvsanywhere/
👉Repo https://lnkd.in/ddQz9eps
🐟Segment Any Motion in Video🐟

👉From #CVPR2025, a novel approach for moving object segmentation that combines DINO-based semantic features with SAM2. Code under MIT license💙

👉Review https://t.ly/4aYjJ
👉Paper arxiv.org/pdf/2503.22268
👉Project motion-seg.github.io/
👉Repo github.com/nnanhuang/SegAnyMo
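Combining a motion cue with semantic features can be illustrated with a tiny NumPy sketch (illustrative fusion under assumed inputs — not the paper's formulation): pixels that both move and match the object's semantic prototype score highest and become the candidates handed to SAM2 for mask refinement.

```python
import numpy as np

def motion_object_score(motion_mag, semantic_sim, alpha=0.5):
    """Toy fusion of a motion cue with DINO-style semantic similarity."""
    m = motion_mag / (motion_mag.max() + 1e-8)    # normalize flow magnitude
    return alpha * m + (1 - alpha) * semantic_sim  # weighted combination

motion = np.array([0.0, 4.0, 4.0])  # flow magnitude per pixel
sem    = np.array([0.9, 0.1, 0.9])  # similarity to object prototype
score  = motion_object_score(motion, sem)
# The pixel that both moves and matches semantically scores highest.
assert int(np.argmax(score)) == 2
```

Either cue alone fails here — pixel 0 is semantically similar but static, pixel 1 moves but is off-target — which is the motivation for fusing the two.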