AI with Papers - Artificial Intelligence & Deep Learning
15.6K subscribers
145 photos
258 videos
14 files
1.35K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
💄Pixel-Perfect Depth (SOTA)💄

👉Pixel-Perfect Depth is a monocular depth estimation model built on pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0💙

👉Review https://t.ly/75PGo
👉Paper https://lnkd.in/d8wxFpyY
👉Project https://lnkd.in/dV5HhsqH
👉Repo https://lnkd.in/d9JKFBJq
👉Demo https://lnkd.in/d3wBkKJ9
↗️ TrackVLA++ Visual Tracking↘️

👉TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and target identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announced💙

👉Review https://t.ly/ruYzc
👉Paper https://arxiv.org/pdf/2510.07134
👉Project pku-epic.github.io/TrackVLA-plus-plus-Web/
👉Repo TBA
🫧 Detect Anything via MLLM 🫧

👉Rex-Omni is a 3B multimodal model that unifies visual perception tasks (object detection, OCR, pointing, keypointing & visual prompting) into a single next-point prediction framework. Impressive results. Repo under IDEA License 1.0💙

👉Review https://t.ly/DCTk_
👉Paper https://lnkd.in/d4VDD-9j
👉Project https://lnkd.in/d6unEyvq
👉Repo https://lnkd.in/dkYJFe-x
🫙Universal Feature Up-Sampling🫙

👉AnyUp is a novel feature up-sampling method that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: a feature-agnostic architecture that works at inference time and improves up-sampling quality. Repo CC-4.0💙

👉Review https://t.ly/HvEw9
👉Paper https://arxiv.org/pdf/2510.12764
👉Project https://wimmerth.github.io/anyup/
👉Repo https://github.com/wimmerth/anyup
🦄 City-Tour -> Simulation 🦄

👉UrbanVerse is a novel system to convert real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & Data announced 💙

👉Review https://t.ly/UvXNS
👉Paper https://arxiv.org/pdf/2510.15018
👉Project https://urbanverseproject.github.io/
👉Repo TBA
🌵All-in-One Dense Keypoints🌵

👉DeepDetect is a novel all-in-one, dense keypoints detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel into a neural net. DAMN ROMANTIC. Repo under MIT💙

👉Review https://t.ly/VKGct
👉Paper https://arxiv.org/pdf/2510.17422
👉Repo https://github.com/saktx/DeepDetect
🔥 SAM 2++: Track Anything 🔥

👉SAM 2++ is a novel unified model for tracking at any granularity, including masks, boxes, and points. Impressive results but no code announced😢

👉Review https://t.ly/I392_
👉Paper arxiv.org/pdf/2510.18822
👉Project tracking-any-granularity.github.io/
👉Repo :(
🏜️Omni Driving Models🏜️

👉OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control & facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0💙

👉Review https://t.ly/ktXvz
👉Paper https://lnkd.in/eFKSZnrc
👉Project https://lnkd.in/eSDfccv8
👉Repo https://lnkd.in/efCSvjtp
ITTO: Protocol for Dynamic Tracking

👉ITTO by Caltech is a novel long-range tracking benchmark suite for evaluating and diagnosing tracking methods on complex and long-range motions. Repo under CC BY-NC 4.0💙

👉Review https://t.ly/tN84a
👉Paper https://arxiv.org/pdf/2510.19819
👉Project https://glab-caltech.github.io/ITTO/
👉Repo https://github.com/ilonadem/itto
🦗Character Mixing Generation🦗

👉MBZUAI unveils the first ever video-gen system able to preserve character ID, behavior & original style while generating plausible interactions between characters that have never coexisted - from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon)

👉Review https://t.ly/tN84a
👉Paper https://lnkd.in/dhKMwukv
👉Project https://lnkd.in/dBkJs48h
👉Repo https://lnkd.in/dw_uzgAk
🧷Generative Point Tracking w/ FM🧷

👉Generative Point Tracker (GenPT) is a novel generative framework for modelling multi-modal trajectories, able to capture the multi-modality in point trajectories. Repo under MIT💙

👉Review https://t.ly/MMFrt
👉Paper https://arxiv.org/pdf/2510.20951
👉Project mtesfaldet.net/genpt_projpage/
👉Repo https://github.com/tesfaldet/genpt
🦄Unified Region-Level MLLM🦄

👉PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, Repo & Dataset available💙

👉Review https://t.ly/WH4dQ
👉Paper arxiv.org/pdf/2510.23603
👉Project circleradon.github.io/PixelRefer
👉Repo https://github.com/alibaba-damo-academy/PixelRefer
🌱PlanarTrack: Large Planar Tracking🌱

👉PlanarTrack is a large-scale, high-quality and challenging benchmark for planar tracking: 1,150 sequences with 733K+ frames, including 1,000 short-term & 150 long-term videos. Repo & Dataset available💙

👉Review https://t.ly/mYNi7
👉Paper arxiv.org/pdf/2510.23368
👉Repo https://lnkd.in/edb3GMyT
👉Project https://lnkd.in/eC-hVB-U
👉Data https://lnkd.in/eew2j4tM
👢Generative View Stitching 👢

👉GVS is a novel approach that enables collision-free camera-guided video generation for predefined trajectories; it's a non-autoregressive alternative to video length extrapolation. Full repo under MIT💙

👉Review https://t.ly/TiN_5
👉Paper https://arxiv.org/pdf/2510.24718
👉Project https://andrewsonga.github.io/gvs/
👉Repo github.com/andrewsonga/generative_view_stitching
Greetings from the SMART CITY WORLD CONGRESS in Barcelona. If you are around, ping me ;)
🔪Tracking Object Transformations🔪

👉"Track Any State": tracking objects through transformations while detecting/describing state changes. Repo & Dataset available under MIT💙

👉Review https://t.ly/NPyW4
👉Paper https://lnkd.in/d4pA3bXJ
👉Project https://lnkd.in/dgbNfCuj
👉Repo https://lnkd.in/dtVWq2z7
🔥🔥 Sunday mood 🔥🔥