SAM 2++: Track Anything
SAM 2++ is a novel unified model for tracking at any granularity, including masks, boxes, and points. Impressive results, but no code has been announced.
Review: https://t.ly/I392_
Paper: arxiv.org/pdf/2510.18822
Project: tracking-any-granularity.github.io/
Repo: none announced :(
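No SAM 2++ code is available, so here is a purely illustrative sketch of what "tracking at any granularity" means in practice: a hypothetical unified prompt interface where masks, boxes, and points all go through the same tracking call. Every name (TrackPrompt, track_video) is invented, and the propagation logic is a placeholder.

```python
# Hypothetical sketch of a unified "any granularity" tracking interface.
# Nothing here comes from the SAM 2++ codebase (none is released); all names are invented.
from dataclasses import dataclass
from typing import List, Literal, Optional, Sequence, Tuple

@dataclass
class TrackPrompt:
    """One prompt on the first frame, at one of three granularities."""
    kind: Literal["mask", "box", "point"]
    mask: Optional[Sequence[Sequence[int]]] = None            # HxW binary mask
    box: Optional[Tuple[float, float, float, float]] = None   # x1, y1, x2, y2
    point: Optional[Tuple[float, float]] = None               # x, y

def track_video(frames: list, prompt: TrackPrompt) -> List[TrackPrompt]:
    """Return one prediction per frame, in the same granularity as the prompt.
    A real unified tracker would share one memory/backbone across granularities;
    here we just propagate the first-frame prompt unchanged as a placeholder."""
    return [prompt for _ in frames]

# Usage: the same call handles masks, boxes, or points.
frames = [None] * 8  # stand-in for decoded video frames
box_tracks = track_video(frames, TrackPrompt(kind="box", box=(10, 20, 80, 120)))
pt_tracks = track_video(frames, TrackPrompt(kind="point", point=(42.0, 37.0)))
print(len(box_tracks), pt_tracks[0].kind)
```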
AI with Papers - Artificial Intelligence & Deep Learning
City-Tour -> Simulation: UrbanVerse is a novel system to convert real-world urban scenes from city-tour videos into physics-aware, interactive simulation environments, enabling scalable robot learning in urban spaces with real-world generalization. Repo…
Repo (pretty empty) now online: https://github.com/OatmealLiu/UrbanVerse
Omni Driving Models
OmniNWM is a unified panoramic navigation world model that advances autonomous driving by jointly generating multi-modal states (RGB, semantics, depth, 3D occupancy), enabling precise action control and facilitating closed-loop evaluation through occupancy-based dense rewards. Repo under Apache 2.0.
Review: https://t.ly/ktXvz
Paper: https://lnkd.in/eFKSZnrc
Project: https://lnkd.in/eSDfccv8
Repo: https://lnkd.in/efCSvjtp
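To make "occupancy-based dense rewards" concrete, here is a minimal sketch of one plausible reading: score a planned ego trajectory by how much predicted occupancy it drives through, so every timestep contributes a reward signal. This is not OmniNWM's actual reward; the grid layout, voxel size, and penalty weight are assumptions.

```python
# Hypothetical illustration of an "occupancy-based dense reward": penalize planned ego
# positions that fall into occupied cells of a predicted bird's-eye occupancy grid.
# NOT OmniNWM's actual reward; shapes, resolution, and weights are invented.
import numpy as np

def occupancy_reward(occ: np.ndarray, traj_xy: np.ndarray,
                     voxel_size: float = 0.5, collision_penalty: float = 1.0) -> float:
    """occ: (X, Y) occupancy probabilities in [0, 1];
    traj_xy: (T, 2) planned ego positions in metres, origin at the grid corner."""
    idx = np.clip((traj_xy / voxel_size).astype(int), 0, np.array(occ.shape) - 1)
    occ_along_path = occ[idx[:, 0], idx[:, 1]]          # occupancy under each waypoint
    # Dense reward: every timestep contributes, not just terminal success/failure.
    return float(-(collision_penalty * occ_along_path).mean())

occ = np.zeros((100, 100)); occ[40:45, 40:45] = 1.0      # a fake obstacle
safe = np.stack([np.linspace(1, 40, 20), np.full(20, 5.0)], axis=1)
risky = np.stack([np.linspace(1, 40, 20), np.full(20, 21.0)], axis=1)
print(occupancy_reward(occ, safe), occupancy_reward(occ, risky))
```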
ITTO: Protocol for Dynamic Tracking
ITTO, by Caltech, is a novel benchmark suite for evaluating and diagnosing tracking methods on complex, long-range motions. Repo under CC BY-NC 4.0.
Review: https://t.ly/tN84a
Paper: https://arxiv.org/pdf/2510.19819
Project: https://glab-caltech.github.io/ITTO/
Repo: https://github.com/ilonadem/itto
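For readers new to point-tracking benchmarks, the sketch below shows a generic accuracy metric in the spirit of TAP-Vid's delta-average (fraction of visible points within several pixel thresholds). ITTO's official protocol and metrics may differ, so treat this only as background.

```python
# Generic point-tracking accuracy metric in the spirit of TAP-Vid's delta-average:
# fraction of predicted points within a pixel threshold of ground truth, averaged over
# several thresholds. ITTO's exact protocol may differ; this is only illustrative.
import numpy as np

def delta_avg(pred: np.ndarray, gt: np.ndarray, visible: np.ndarray,
              thresholds=(1, 2, 4, 8, 16)) -> float:
    """pred, gt: (N, T, 2) pixel coordinates; visible: (N, T) boolean mask."""
    err = np.linalg.norm(pred - gt, axis=-1)              # (N, T) per-point error
    scores = [(err[visible] < t).mean() for t in thresholds]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
gt = rng.uniform(0, 256, size=(10, 50, 2))
pred = gt + rng.normal(scale=3.0, size=gt.shape)           # noisy tracker output
vis = np.ones((10, 50), dtype=bool)
print(f"delta_avg = {delta_avg(pred, gt, vis):.3f}")
```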
Character Mixing Generation
MBZUAI unveils the first video-generation system able to preserve character identity, behavior, and original style while generating plausible interactions between characters that have never coexisted, from cartoons (We Bare Bears, Tom & Jerry) to realistic humans (Mr. Bean, Young Sheldon).
Review: https://t.ly/tN84a
Paper: https://lnkd.in/dhKMwukv
Project: https://lnkd.in/dBkJs48h
Repo: https://lnkd.in/dw_uzgAk
Generative Point Tracking w/ FM
Generative Point Tracker (GenPT) is a novel generative framework for modelling multi-modal point trajectories, capturing the inherent multi-modality of point tracks. Repo under MIT.
Review: https://t.ly/MMFrt
Paper: https://arxiv.org/pdf/2510.20951
Project: mtesfaldet.net/genpt_projpage/
Repo: https://github.com/tesfaldet/genpt
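As a rough picture of what a generative point tracker buys you, the sketch below samples several trajectory hypotheses for one query point by integrating a velocity field from noise, flow-matching style. The dummy_velocity function stands in for a learned, video-conditioned network; it does not reproduce GenPT's architecture or training.

```python
# Hedged sketch: sample K plausible trajectories for one query point by integrating a
# velocity field from noise (flow-matching style). The velocity field is a stand-in;
# GenPT's actual conditioning on video and query is not reproduced here.
import numpy as np

def dummy_velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for a learned velocity net v_theta(x, t | video, query)."""
    target = np.linspace(0, 1, x.shape[0])[:, None] * np.array([30.0, 10.0])
    return target - x   # pushes samples toward a straight-line track

def sample_trajectories(num_hypotheses=4, track_len=16, steps=20, seed=0):
    rng = np.random.default_rng(seed)
    hypotheses = []
    for _ in range(num_hypotheses):
        x = rng.normal(scale=5.0, size=(track_len, 2))    # start from noise
        for k in range(steps):                            # Euler integration of dx/dt = v(x, t)
            x = x + (1.0 / steps) * dummy_velocity(x, k / steps)
        hypotheses.append(x)
    return np.stack(hypotheses)                           # (K, T, 2) candidate tracks

tracks = sample_trajectories()
print(tracks.shape, tracks[:, -1].round(1))               # end point of each hypothesis
```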
Unified Region-Level MLLM
PixelRefer is a unified multimodal LLM framework that supports precise, region-specific understanding in both static images and dynamic videos, overcoming the holistic, scene-level bias of prior MLLMs. SOTA results. Demo, repo & dataset available.
Review: https://t.ly/WH4dQ
Paper: arxiv.org/pdf/2510.23603
Project: circleradon.github.io/PixelRefer
Repo: https://github.com/alibaba-damo-academy/PixelRefer
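To illustrate what "region-specific understanding" looks like at the interface level, here is a hypothetical query schema where the user attaches a box or mask to a frame and asks about that region rather than the whole scene. The dataclasses and placeholder tokens are invented for illustration; they are not PixelRefer's API.

```python
# Hypothetical region-level query to a multimodal LLM: point at a region (box or mask)
# and ask about that region specifically. The schema below is invented, not PixelRefer's.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class RegionPrompt:
    frame_index: int                                    # which frame of the video (0 for images)
    box: Optional[Tuple[int, int, int, int]] = None     # x1, y1, x2, y2 in pixels
    mask_rle: Optional[str] = None                      # alternative: run-length-encoded mask

@dataclass
class RegionQuery:
    media_path: str
    regions: List[RegionPrompt] = field(default_factory=list)
    question: str = ""

def build_prompt(q: RegionQuery) -> str:
    """Serialize the query the way a region-aware MLLM might consume it: each region
    becomes a placeholder token the vision side later grounds to pooled region features."""
    region_tokens = " ".join(f"<region{i}>" for i in range(len(q.regions)))
    return f"<media:{q.media_path}> {region_tokens} {q.question}"

query = RegionQuery(
    media_path="clip.mp4",
    regions=[RegionPrompt(frame_index=12, box=(40, 60, 180, 220))],
    question="What is the object in <region0> doing, and does it change state later?",
)
print(build_prompt(query))
```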
PlanarTrack: Large Planar Tracking
PlanarTrack is a large-scale, high-quality, and challenging benchmark for planar tracking: 1,150 sequences with 733K+ frames, including 1,000 short-term and 150 long-term videos. Repo & dataset available.
Review: https://t.ly/mYNi7
Paper: arxiv.org/pdf/2510.23368
Repo: https://lnkd.in/edb3GMyT
Project: https://lnkd.in/eC-hVB-U
Data: https://lnkd.in/eew2j4tM
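Planar trackers are commonly scored by corner alignment error between the predicted and ground-truth homographies; the sketch below shows that standard computation, with the caveat that PlanarTrack's official protocol and thresholds may differ.

```python
# Corner alignment error for planar tracking: warp the four corners of the reference plane
# with the predicted and ground-truth homographies and compare. A minimal, generic sketch;
# PlanarTrack's official evaluation may differ.
import numpy as np

def warp_corners(H: np.ndarray, corners: np.ndarray) -> np.ndarray:
    pts = np.hstack([corners, np.ones((4, 1))]) @ H.T      # homogeneous transform
    return pts[:, :2] / pts[:, 2:3]

def corner_error(H_pred: np.ndarray, H_gt: np.ndarray, w: int = 640, h: int = 480) -> float:
    corners = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=float)
    diff = warp_corners(H_pred, corners) - warp_corners(H_gt, corners)
    return float(np.linalg.norm(diff, axis=1).mean())       # mean corner distance in pixels

H_gt = np.array([[1.0, 0.02, 5.0], [0.01, 1.0, -3.0], [0.0, 0.0, 1.0]])
H_pred = H_gt + np.diag([0.01, 0.01, 0.0])                  # slightly off prediction
print(f"mean corner error: {corner_error(H_pred, H_gt):.2f} px")
```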
Generative View Stitching
GVS is a novel approach that enables collision-free, camera-guided video generation along predefined trajectories; it is a non-autoregressive alternative to video-length extrapolation. Full repo under MIT.
Review: https://t.ly/TiN_5
Paper: https://arxiv.org/pdf/2510.24718
Project: https://andrewsonga.github.io/gvs/
Repo: github.com/andrewsonga/generative_view_stitching
Tracking Object Transformations
"Track Any State": tracking objects through transformations while detecting and describing state changes. Repo & dataset available under MIT.
Review: https://t.ly/NPyW4
Paper: https://lnkd.in/d4pA3bXJ
Project: https://lnkd.in/dgbNfCuj
Repo: https://lnkd.in/dtVWq2z7
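As a hedged sketch of what tracking-through-transformations output could look like, here is an invented record layout that stores per-frame boxes and state labels and logs a described state-change event whenever the label flips; it is illustrative only, not the paper's actual format.

```python
# Invented record layout for "tracking through transformations": per-frame boxes and state
# labels plus a log of detected state changes with a textual description.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class StateChange:
    frame: int
    before: str
    after: str
    description: str

@dataclass
class ObjectTrack:
    object_id: int
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)   # one box per frame
    states: List[str] = field(default_factory=list)                        # one label per frame
    changes: List[StateChange] = field(default_factory=list)

    def log_frame(self, box, state):
        if self.states and state != self.states[-1]:
            self.changes.append(StateChange(
                frame=len(self.states), before=self.states[-1], after=state,
                description=f"object {self.object_id}: {self.states[-1]} -> {state}"))
        self.boxes.append(box)
        self.states.append(state)

egg = ObjectTrack(object_id=1)
for box, state in [((10, 10, 40, 40), "whole"), ((12, 11, 42, 41), "whole"),
                   ((15, 12, 60, 45), "cracked"), ((15, 12, 70, 48), "scrambled")]:
    egg.log_frame(box, state)
print([c.description for c in egg.changes])
```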
Another BRIXEL in the Wall
BRIXEL allows the user to produce high-resolution feature maps using the DINOv3 backbone without requiring large amounts of compute. Repo released.
Review: https://t.ly/fZPwC
Paper: arxiv.org/pdf/2511.05168
Repo: github.com/alexanderlappe/BRIXEL
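The general recipe behind cheap high-resolution features is to upsample coarse ViT patch tokens to pixel resolution and refine them with a light head; the sketch below shows that generic idea only, not BRIXEL's actual architecture or distillation training, which are in the paper and repo.

```python
# Generic sketch: take coarse ViT patch features (one token per 16x16 patch), upsample them
# to pixel resolution, and refine with a small convolutional head. Illustrates the general
# recipe only; BRIXEL's real architecture and training differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureUpsampler(nn.Module):
    def __init__(self, dim: int = 384, patch: int = 16):
        super().__init__()
        self.patch = patch
        self.refine = nn.Sequential(                       # lightweight refinement head
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, patch_tokens: torch.Tensor, hw: tuple) -> torch.Tensor:
        """patch_tokens: (B, N, C) from a frozen backbone; hw: original image (H, W)."""
        B, N, C = patch_tokens.shape
        h, w = hw[0] // self.patch, hw[1] // self.patch
        grid = patch_tokens.transpose(1, 2).reshape(B, C, h, w)
        dense = F.interpolate(grid, size=hw, mode="bilinear", align_corners=False)
        return dense + self.refine(dense)                   # (B, C, H, W) pixel-dense features

tokens = torch.randn(1, (224 // 16) ** 2, 384)              # fake DINO-style patch tokens
print(FeatureUpsampler()(tokens, (224, 224)).shape)         # torch.Size([1, 384, 224, 224])
```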
Pixel-Dense Embedding
FlowFeat is a novel high-resolution, multi-task feature representation that embeds a distribution of plausible apparent motions, or motion profiles. Repo available.
Review: https://t.ly/aUx_U
Paper: arxiv.org/pdf/2511.07696
Project: tum-vision.github.io/flowfeat
Repo: github.com/tum-vision/flowfeat
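One informal way to picture a per-pixel "motion profile" is a histogram over plausible flow directions at each pixel; the sketch below builds such a histogram from several candidate flow fields. FlowFeat learns its embedding end-to-end, so this is only an intuition aid, not the method.

```python
# Intuition aid only: summarize several candidate optical-flow estimates per pixel as a
# histogram over quantized directions. FlowFeat learns its representation end-to-end and
# does not build explicit histograms.
import numpy as np

def motion_profile(flows: np.ndarray, bins: int = 8) -> np.ndarray:
    """flows: (K, H, W, 2) candidate flow fields -> (H, W, bins) direction histograms."""
    angles = np.arctan2(flows[..., 1], flows[..., 0])           # (K, H, W) in [-pi, pi]
    idx = ((angles + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    profile = np.zeros(flows.shape[1:3] + (bins,))
    for k in range(flows.shape[0]):
        np.add.at(profile, (np.arange(flows.shape[1])[:, None],
                            np.arange(flows.shape[2])[None, :], idx[k]), 1.0)
    return profile / flows.shape[0]                             # normalized per pixel

flows = np.random.default_rng(0).normal(size=(5, 32, 32, 2))    # fake candidate flows
print(motion_profile(flows).shape)                              # (32, 32, 8)
```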
Announcement
I've received numerous reports of people blatantly copying my content on LinkedIn just to get a few likes.
Let me be very clear: I put a great deal of time and effort into reviewing papers and creating original, meaningful content. It's disappointing to see professionals (some of whom are even members of this group or my connections) resorting to plagiarism instead of contributing their own ideas.
Starting today, I'll be removing these connections from LinkedIn and banning such individuals from this group.
I also encourage everyone to report these cases whenever you come across them. Every single report helps stop this bad habit and keeps our community fair, respectful, and authentic.