AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
πŸ”₯SOTA Detection w/ DINOv3πŸ”₯

πŸ‘‰DEIMv2 is the evolution of the DEIM framework, now built on DINOv3. It comes in multiple variants, from an ultra-light version up to S, M, L & X, to cover a wide range of scenarios, and achieves SOTA across them. Repo under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/P7jEH
πŸ‘‰Paper arxiv.org/pdf/2509.20787
πŸ‘‰Repo github.com/Intellindust-AI-Lab/DEIMv2
πŸ‘‰Project intellindust-ai-lab.github.io/projects/DEIMv2
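Not the official DEIMv2 API (see the repo for that): a minimal PyTorch sketch of the pattern the post describes, frozen DINOv3-style ViT patch tokens feeding a light DETR-style query decoder. All module names, dimensions and shapes below are illustrative assumptions.

```python
# Illustrative only: a tiny DETR-style decoder consuming frozen ViT patch tokens
# (stand-in for DINOv3 features), the general pattern behind DEIMv2-like detectors.
import torch
import torch.nn as nn

class TinyDetrHead(nn.Module):
    def __init__(self, dim=384, num_queries=100, num_classes=80):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.cls_head = nn.Linear(dim, num_classes)
        self.box_head = nn.Linear(dim, 4)          # normalized cx, cy, w, h

    def forward(self, patch_tokens):               # (B, N_patches, dim)
        q = self.queries.weight.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        hs = self.decoder(q, patch_tokens)          # queries cross-attend to backbone tokens
        return self.cls_head(hs), self.box_head(hs).sigmoid()

# Dummy "DINOv3" patch tokens for a 640x640 image with 16x16 patches.
tokens = torch.randn(1, (640 // 16) ** 2, 384)
logits, boxes = TinyDetrHead()(tokens)
print(logits.shape, boxes.shape)                    # (1, 100, 80) and (1, 100, 4)
```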
πŸ€–Real-time Interactive VideoπŸ€–

πŸ‘‰LONGLIVE by #Nvidia is a frame-level autoregressive framework for real-time & interactive long video generation: it accepts sequential user prompts and generates the corresponding video in real time. Repo under non-commercial licenseπŸ’™

πŸ‘‰Review https://t.ly/jJkdY
πŸ‘‰Paper arxiv.org/pdf/2509.22622
πŸ‘‰Project nvlabs.github.io/LongLive/
πŸ‘‰Repo github.com/NVlabs/LongLive
πŸ€—huggingface.co/Efficient-Large-Model/LongLive-1.3B
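The checkpoint is public on Hugging Face, so fetching it with `snapshot_download` is real Hub usage; the interactive loop below is a hypothetical sketch of the prompt-by-prompt generation the post describes (`load_longlive` and `extend` are placeholder names, not the NVlabs API).

```python
# Real Hugging Face Hub call to fetch the released checkpoint; everything after
# that is a hypothetical sketch of the interactive loop (the actual entry points
# live in github.com/NVlabs/LongLive).
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download("Efficient-Large-Model/LongLive-1.3B")

def interactive_session(load_longlive, prompts):
    # `load_longlive` is a placeholder for whatever loader the repo exposes.
    model, state = load_longlive(ckpt_dir), None
    frames = []
    for prompt in prompts:                              # sequential user prompts
        state, new_frames = model.extend(prompt, state)  # reuse cached context (assumed API)
        frames.extend(new_frames)
    return frames
```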
πŸ‘” Universal Image Restoration πŸ‘”

πŸ‘‰LucidFlux by HKUSTGZ is a universal image restoration framework built on a large-scale diffusion transformer that delivers photorealistic restorations of real-world low-quality (LQ) images, outperforming SOTA diffusion-based models across diverse degradations. Repo under a custom non-commercial licenseπŸ’™

πŸ‘‰Review https://t.ly/Z5cA3
πŸ‘‰Paper https://arxiv.org/pdf/2509.22414
πŸ‘‰Project https://w2genai-lab.github.io/LucidFlux/
πŸ‘‰Repo https://github.com/W2GenAI-Lab/LucidFlux
πŸ‘©β€πŸ¦±Physical-Hair DiffusionπŸ‘©β€πŸ¦±

πŸ‘‰CONTROLHAIR is a novel hybrid framework that integrates a physics simulator with conditional video diffusion to enable controllable dynamic hair rendering. Repo announcedπŸ’™

πŸ‘‰Review https://t.ly/78LHr
πŸ‘‰Paper https://lnkd.in/epm-A9Fq
πŸ‘‰Project https://lnkd.in/evsjz298
πŸ‘‰Repo TBA
πŸ”©Code-Agentic EducationπŸ”©

πŸ‘‰Show Lab unveils Code2Video: an agentic, code-centric framework that generates HQ educational videos from knowledge points, targeting clarity, coherence & reproducibility. Repo under MITπŸ’™

πŸ‘‰Review https://t.ly/Fv4LJ
πŸ‘‰Paper https://arxiv.org/pdf/2510.01174
πŸ‘‰Repo https://github.com/showlab/Code2Video/
πŸ‘‰Project https://showlab.github.io/Code2Video/
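Hedged illustration: "code-centric" educational video usually means emitting scenes for a programmatic renderer such as Manim. Whether Code2Video targets Manim specifically is an assumption; this just shows the flavor of renderable code such an agent could emit.

```python
# A tiny Manim Community scene, the kind of code a "knowledge point -> video"
# agent could generate (illustrative assumption, not Code2Video's actual output).
from manim import Scene, MathTex, Write, FadeOut

class EulerIdentity(Scene):
    def construct(self):
        formula = MathTex(r"e^{i\pi} + 1 = 0")
        self.play(Write(formula))   # animate the knowledge point
        self.wait(2)
        self.play(FadeOut(formula))
```

Rendering a scene like this with `manim -pql file.py EulerIdentity` produces a short clip; an agent would chain many such scenes into a full lesson.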
🎷🎷 Clink! Chop! Thud! 🎷🎷

πŸ‘‰Sounding Object Detection: while an environment may contain many objects, only a few are directly involved in producing sound during an interaction. This model detects the sounding object in a video. Code/Data announced πŸ’™

πŸ‘‰Review https://t.ly/VK_1h
πŸ‘‰Paper https://lnkd.in/depNjVXm
πŸ‘‰Project https://lnkd.in/dF63EZFG
πŸ‘‰Repo TBA
πŸ‘‰ Proof I'm not a bot...

My (short) interview with one of the biggest Italian media outlets: AI in 2016, HPC/Quantum, and how I created my startup: https://www.linkedin.com/posts/visionarynet_ai-itw25-ai-activity-7381215486115643392-t7an

Thanks for the support (and of course a new paper coming in a few hours)
🎺Visual Grounding RVOS🎺

πŸ‘‰ReferDINO is a strong referring video object segmentation (RVOS) model that inherits region-level vision-language alignment from foundational visual grounding models and is further endowed with pixel-level dense perception & cross-modal spatio-temporal reasoning. Code, demo & checkpoints releasedπŸ’™

πŸ‘‰Review https://t.ly/rOdkP
πŸ‘‰Paper https://lnkd.in/efuAFQdE
πŸ‘‰Project https://lnkd.in/dK3wMZqv
πŸ‘‰Repo https://lnkd.in/d3i2PsNF
πŸ’„Pixel-Perfect Depth (SOTA)πŸ’„

πŸ‘‰Pixel-Perfect Depth is a monocular depth estimation model built on pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/75PGo
πŸ‘‰Paper https://lnkd.in/d8wxFpyY
πŸ‘‰Project https://lnkd.in/dV5HhsqH
πŸ‘‰Repo https://lnkd.in/d9JKFBJq
πŸ‘‰Demo https://lnkd.in/d3wBkKJ9
↗️ TrackVLA++ Visual Trackingβ†˜οΈ

πŸ‘‰TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and target identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announcedπŸ’™

πŸ‘‰Review https://t.ly/ruYzc
πŸ‘‰Paper https://arxiv.org/pdf/2510.07134
πŸ‘‰Project pku-epic.github.io/TrackVLA-plus-plus-Web/
πŸ‘‰Repo TBA
🫧 Detect Anything via MLLM 🫧

πŸ‘‰Rex-Omni is a 3B-parameter multimodal model that unifies visual perception tasks (object detection, OCR, pointing, keypointing & visual prompting) into a single next-point-prediction framework. Impressive results. Repo under IDEA License 1.0πŸ’™

πŸ‘‰Review https://t.ly/DCTk_
πŸ‘‰Paper https://lnkd.in/d4VDD-9j
πŸ‘‰Project https://lnkd.in/d6unEyvq
πŸ‘‰Repo https://lnkd.in/dkYJFe-x
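A self-contained sketch of the "next point prediction" idea: the model emits discrete coordinate tokens, and the same decoding turns them into points, keypoints, or (as pairs) boxes. The 1,000-bin quantization and token layout are assumptions for illustration, not Rex-Omni's exact format.

```python
# Illustrative decoding of quantized coordinate tokens back to pixel coordinates.
def bins_to_xy(x_bin, y_bin, width, height, num_bins=1000):
    # Map bin centers in [0, num_bins) back to pixel space (assumed quantization).
    return (x_bin + 0.5) / num_bins * width, (y_bin + 0.5) / num_bins * height

def decode_box(tokens, width, height):
    # A box is just two predicted points: top-left then bottom-right.
    (x0b, y0b), (x1b, y1b) = tokens
    x0, y0 = bins_to_xy(x0b, y0b, width, height)
    x1, y1 = bins_to_xy(x1b, y1b, width, height)
    return [x0, y0, x1, y1]

print(decode_box([(120, 80), (640, 900)], width=1280, height=720))
```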
πŸ«™Universal Feature Up-SamplingπŸ«™

πŸ‘‰AnyUp is a novel method for feature up-sampling that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: an inference-time, feature-agnostic up-sampling architecture that improves up-sampling quality. Repo under CC-4.0πŸ’™

πŸ‘‰Review https://t.ly/HvEw9
πŸ‘‰Paper https://arxiv.org/pdf/2510.12764
πŸ‘‰Project https://wimmerth.github.io/anyup/
πŸ‘‰Repo https://github.com/wimmerth/anyup
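For context, here is the baseline AnyUp improves on: naively interpolating low-resolution ViT patch features up to image resolution. The snippet only shows that baseline and the tensor shapes involved; AnyUp replaces the interpolation with a learned, encoder-agnostic up-sampler.

```python
# Baseline only: bilinear up-sampling of low-res patch features to image resolution.
import torch
import torch.nn.functional as F

feats = torch.randn(1, 384, 16, 16)   # e.g. a 16x16 patch grid of 384-dim features
upsampled = F.interpolate(feats, size=(224, 224), mode="bilinear", align_corners=False)
print(upsampled.shape)                 # torch.Size([1, 384, 224, 224])
```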
πŸ¦„ City-Tour -> Simulation πŸ¦„

πŸ‘‰UrbanVerse is a novel system to convert real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & Data announced πŸ’™

πŸ‘‰Review https://t.ly/UvXNS
πŸ‘‰Paper https://arxiv.org/pdf/2510.15018
πŸ‘‰Project https://urbanverseproject.github.io/
πŸ‘‰Repo TBA
🌡All-in-One Dense Keypoints🌡

πŸ‘‰DeepDetect is a novel all-in-one dense keypoint detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel into a single neural net. DAMN ROMANTIC. Repo under MITπŸ’™

πŸ‘‰Review https://t.ly/VKGct
πŸ‘‰Paper https://arxiv.org/pdf/2510.17422
πŸ‘‰Repo https://github.com/saktx/DeepDetect
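A hedged sketch of how the classical detectors listed above could be fused into one dense supervision target using standard OpenCV calls; whether DeepDetect builds its training signal exactly this way is an assumption.

```python
# Build a soft keypoint heatmap from classical detectors (OpenCV APIs are real;
# the fusion recipe is an illustrative assumption, not necessarily the paper's).
import cv2
import numpy as np

img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
h, w = img.shape
heat = np.zeros((h, w), dtype=np.float32)

# Keypoint detectors named in the post.
detectors = [cv2.SIFT_create(), cv2.ORB_create(), cv2.BRISK_create(),
             cv2.FastFeatureDetector_create(), cv2.AgastFeatureDetector_create()]
for det in detectors:
    for kp in det.detect(img, None):
        x = min(int(round(kp.pt[0])), w - 1)
        y = min(int(round(kp.pt[1])), h - 1)
        heat[y, x] += 1.0

# Shi-Tomasi corners and Canny edges add corner/edge evidence.
corners = cv2.goodFeaturesToTrack(img, maxCorners=500, qualityLevel=0.01, minDistance=5)
if corners is not None:
    for x, y in corners.reshape(-1, 2).astype(int):
        heat[y, x] += 1.0
heat += 0.5 * (cv2.Canny(img, 100, 200) > 0).astype(np.float32)

# Normalize and blur into a soft target a small network could regress.
heat = cv2.GaussianBlur(heat / max(heat.max(), 1e-6), (5, 5), 0)
```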
πŸ”₯ SAM 2++: Track Anything πŸ”₯

πŸ‘‰SAM 2++ is a novel unified model for tracking at any granularity, including masks, boxes, and points. Impressive results, but no code announced😒

πŸ‘‰Review https://t.ly/I392_
πŸ‘‰Paper arxiv.org/pdf/2510.18822
πŸ‘‰Project tracking-any-granularity.github.io/
πŸ‘‰Repo :(
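Since no code is released, this is purely illustrative: one way to represent "tracking at any granularity" behind a single prompt type (mask, box, or points). Names and fields are hypothetical.

```python
# Hypothetical unified prompt for any-granularity tracking (illustration only).
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple
import numpy as np

@dataclass
class TrackPrompt:
    frame_idx: int
    mask: Optional[np.ndarray] = None                         # HxW boolean mask
    box: Optional[Tuple[float, float, float, float]] = None   # x0, y0, x1, y1
    points: Optional[Sequence[Tuple[float, float]]] = None    # (x, y) clicks

    def granularity(self) -> str:
        if self.mask is not None:
            return "mask"
        if self.box is not None:
            return "box"
        return "points"

prompt = TrackPrompt(frame_idx=0, box=(100.0, 50.0, 220.0, 180.0))
print(prompt.granularity())   # "box"
```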
🏜️Omni Driving Models🏜️

πŸ‘‰OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control & facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/ktXvz
πŸ‘‰Paper https://lnkd.in/eFKSZnrc
πŸ‘‰Project https://lnkd.in/eSDfccv8
πŸ‘‰Repo https://lnkd.in/efCSvjtp
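A generic illustration of what an occupancy-based dense reward can look like: a per-step IoU between the predicted and reference occupancy grids. The exact reward OmniNWM uses is defined in the paper; this only shows the shape of the idea.

```python
# Dense reward from 3D occupancy agreement (generic illustration).
import numpy as np

def occupancy_reward(pred_occ: np.ndarray, gt_occ: np.ndarray) -> float:
    pred, gt = pred_occ.astype(bool), gt_occ.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union else 1.0   # identical empty grids -> full reward

pred = np.random.rand(200, 200, 16) > 0.7            # dummy voxel grids (X, Y, Z)
gt   = np.random.rand(200, 200, 16) > 0.7
print(occupancy_reward(pred, gt))
```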
🐠ITTO: Protocol for Dynamic Tracking🐠

πŸ‘‰ITTO by Caltech is a novel benchmark suite for evaluating and diagnosing tracking methods on complex, long-range motions. Repo under CC BY-NC 4.0πŸ’™

πŸ‘‰Review https://t.ly/tN84a
πŸ‘‰Paper https://arxiv.org/pdf/2510.19819
πŸ‘‰Project https://glab-caltech.github.io/ITTO/
πŸ‘‰Repo https://github.com/ilonadem/itto
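A hedged sketch of the kind of metric long-range point-tracking benchmarks report: occlusion-aware position accuracy at a pixel threshold. ITTO's actual evaluation protocol is specified in the paper; this only conveys the flavor.

```python
# Generic point-tracking metric: fraction of visible points within a pixel threshold.
import numpy as np

def position_accuracy(pred, gt, visible, thresh=8.0):
    """pred, gt: (T, N, 2) trajectories; visible: (T, N) bool; thresh in pixels."""
    err = np.linalg.norm(pred - gt, axis=-1)     # per-frame, per-point error
    hits = (err <= thresh) & visible
    return hits.sum() / max(visible.sum(), 1)

T, N = 120, 16
gt = np.cumsum(np.random.randn(T, N, 2), axis=0)  # dummy long-range trajectories
pred = gt + np.random.randn(T, N, 2) * 3.0
visible = np.random.rand(T, N) > 0.2
print(position_accuracy(pred, gt, visible))
```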
πŸ¦—Character Mixing GenerationπŸ¦—

πŸ‘‰MBZUAI unveils the first-ever video-gen system able to preserve character ID, behavior & original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).

πŸ‘‰Review https://t.ly/tN84a
πŸ‘‰Paper https://lnkd.in/dhKMwukv
πŸ‘‰Project https://lnkd.in/dBkJs48h
πŸ‘‰Repo https://lnkd.in/dw_uzgAk