Pixel-Perfect Depth (SOTA)

Pixel-Perfect Depth is a monocular depth estimation model built on pixel-space diffusion transformers. New SOTA. Repo under Apache 2.0.

Review: https://t.ly/75PGo
Paper: https://lnkd.in/d8wxFpyY
Project: https://lnkd.in/dV5HhsqH
Repo: https://lnkd.in/d9JKFBJq
Demo: https://lnkd.in/d3wBkKJ9
TrackVLA++ Visual Tracking

TrackVLA++ is a novel Vision-Language-Action model that incorporates spatial reasoning and a target-identification memory, enabling SOTA performance in both long-horizon and highly crowded tracking scenarios. Model announced.

Review: https://t.ly/ruYzc
Paper: https://arxiv.org/pdf/2510.07134
Project: https://pku-epic.github.io/TrackVLA-plus-plus-Web/
Repo: TBA
Detect Anything via MLLM

Rex-Omni is a 3B multimodal model that unifies visual perception tasks (object detection, OCR, pointing, keypointing & visual prompting) into a single next-point prediction framework. Impressive results. Repo under IDEA License 1.0.

Review: https://t.ly/DCTk_
Paper: https://lnkd.in/d4VDD-9j
Project: https://lnkd.in/d6unEyvq
Repo: https://lnkd.in/dkYJFe-x
Universal Feature Up-Sampling

AnyUp is a novel method for feature up-sampling that can be applied to ANY vision feature at ANY resolution, without encoder-specific training: an inference-time, feature-agnostic up-sampling architecture that improves up-sampling quality. Repo under CC 4.0.

Review: https://t.ly/HvEw9
Paper: https://arxiv.org/pdf/2510.12764
Project: https://wimmerth.github.io/anyup/
Repo: https://github.com/wimmerth/anyup
City-Tour -> Simulation

UrbanVerse is a novel system that converts real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo & data announced.

Review: https://t.ly/UvXNS
Paper: https://arxiv.org/pdf/2510.15018
Project: https://urbanverseproject.github.io/
Repo: TBA
All-in-One Dense Keypoints

DeepDetect is a novel all-in-one dense keypoint detector that unifies the strengths of SIFT, ORB, BRISK, FAST, AGAST, Harris, Shi-Tomasi, Canny & Sobel into a neural net. DAMN ROMANTIC. Repo under MIT.

Review: https://t.ly/VKGct
Paper: https://arxiv.org/pdf/2510.17422
Repo: https://github.com/saktx/DeepDetect
SAM 2++: Track Anything

SAM 2++ is a novel unified model for tracking at any granularity: masks, boxes, and points. Impressive results, but no code announced.

Review: https://t.ly/I392_
Paper: https://arxiv.org/pdf/2510.18822
Project: https://tracking-any-granularity.github.io/
Repo: :(
AI with Papers - Artificial Intelligence & Deep Learning
Repo (pretty empty) now online: https://github.com/OatmealLiu/UrbanVerse
(Repo tagline: "Scaling Urban Simulation - Infinite Physically-Plausible Urban Simulation = IsaacSim(Physically-Accurate Assets × Real-World City-Tour Layouts)")
Omni Driving Models

OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control and facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0.

Review: https://t.ly/ktXvz
Paper: https://lnkd.in/eFKSZnrc
Project: https://lnkd.in/eSDfccv8
Repo: https://lnkd.in/efCSvjtp
ITTO: Protocol for Dynamic Tracking

ITTO, by Caltech, is a novel long-range tracking benchmark suite for evaluating and diagnosing tracking methods on complex, long-range motions. Repo under CC BY-NC 4.0.

Review: https://t.ly/tN84a
Paper: https://arxiv.org/pdf/2510.19819
Project: https://glab-caltech.github.io/ITTO/
Repo: https://github.com/ilonadem/itto
Character Mixing Generation

MBZUAI unveils the first-ever video-generation system able to preserve character identity, behavior & original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).

Review: https://t.ly/tN84a
Paper: https://lnkd.in/dhKMwukv
Project: https://lnkd.in/dBkJs48h
Repo: https://lnkd.in/dw_uzgAk
Generative Point Tracking w/ FM

Generative Point Tracker (GenPT) is a novel generative framework for modelling point trajectories, able to capture their multi-modality. Repo under MIT.

Review: https://t.ly/MMFrt
Paper: https://arxiv.org/pdf/2510.20951
Project: https://mtesfaldet.net/genpt_projpage/
Repo: https://github.com/tesfaldet/genpt
Unified Region-Level MLLM

PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, repo & dataset available.

Review: https://t.ly/WH4dQ
Paper: https://arxiv.org/pdf/2510.23603
Project: https://circleradon.github.io/PixelRefer
Repo: https://github.com/alibaba-damo-academy/PixelRefer
PlanarTrack: Large Planar Tracking

PlanarTrack is a large-scale, high-quality and challenging benchmark for planar tracking: 1,150 sequences with 733K+ frames, including 1,000 short-term & 150 long-term videos. Repo & dataset available.

Review: https://t.ly/mYNi7
Paper: https://arxiv.org/pdf/2510.23368
Repo: https://lnkd.in/edb3GMyT
Project: https://lnkd.in/eC-hVB-U
Data: https://lnkd.in/eew2j4tM
Generative View Stitching

GVS is a novel approach that enables collision-free, camera-guided video generation for predefined trajectories; a non-autoregressive alternative to video-length extrapolation. Full repo under MIT.

Review: https://t.ly/TiN_5
Paper: https://arxiv.org/pdf/2510.24718
Project: https://andrewsonga.github.io/gvs/
Repo: https://github.com/andrewsonga/generative_view_stitching
Tracking Object Transformations

"Track Any State": tracking objects through transformations while detecting and describing state changes. Repo & dataset available under MIT.

Review: https://t.ly/NPyW4
Paper: https://lnkd.in/d4pA3bXJ
Project: https://lnkd.in/dgbNfCuj
Repo: https://lnkd.in/dtVWq2z7