Moving Points -> Depth
KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of 2D trajectories tracked with off-the-shelf point-tracking models. Repo & demo to be released.
Review https://t.ly/qA2P5
Paper https://lnkd.in/dpXDaQtM
Project https://lnkd.in/d9qWYsjP
Repo https://lnkd.in/dZEMDiJh
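For intuition, the motion-to-depth cue can be sketched as a toy parallax heuristic (my illustration, not Seurat's actual model): under dominant camera translation, nearer points trace larger image motion, so inverse track displacement gives a crude relative-depth ordering.

```python
import numpy as np

def relative_depth_from_tracks(tracks):
    """Toy parallax heuristic (NOT the Seurat architecture): under
    dominant camera translation, nearer points show larger image motion,
    so inverse mean displacement serves as a crude relative depth.

    tracks: (N, T, 2) array of 2D point trajectories.
    Returns per-track relative depth in (0, 1] (larger = farther).
    """
    disp = np.linalg.norm(np.diff(tracks, axis=1), axis=-1)  # (N, T-1)
    mean_motion = disp.mean(axis=1) + 1e-8                   # avoid /0
    depth = 1.0 / mean_motion
    return depth / depth.max()

# two synthetic tracks: a fast-moving (near) and a slow-moving (far) point
t = np.linspace(0.0, 1.0, 10)
near = np.stack([10 * t, np.zeros_like(t)], axis=-1)
far = np.stack([2 * t, np.zeros_like(t)], axis=-1)
rel = relative_depth_from_tracks(np.stack([near, far]))
# rel[0] (fast point) comes out smaller, i.e. nearer, than rel[1]
```

The real method learns depth from spatial relationships between many trajectories, not just per-track speed; this sketch only shows why trajectory motion carries a depth signal at all.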
SOTA Textured 3D-Guided VTON
#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released.
Review https://t.ly/0tjdC
Paper https://lnkd.in/dFseYSXz
Project https://lnkd.in/djtqzrzs
Repo TBA
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m
S3MOT: SOTA 3D MOT
S3MOT: a Selective-State-Space-model-based MOT framework that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & weights to be released under the MIT license.
Review https://t.ly/H_JPv
Paper https://arxiv.org/pdf/2504.18068
Repo https://github.com/bytepioneerX/s3mot
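For context, the recurrence behind selective state-space models can be sketched in a few lines (a toy linear scan; the actual S3MOT makes the dynamics input-dependent, i.e. "selective", and adds the 3D association machinery on top):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    Toy scalar-input version of the scan behind Mamba-style selective SSMs;
    in a selective SSM, A, B, C become functions of the input x_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for xt in x:              # sequential scan over the sequence
        h = A @ h + B * xt
        ys.append(float(C @ h))
    return ys

# decaying memory: the state halves each step while accumulating the input
A = np.array([[0.5]])
B = np.array([1.0])
C = np.array([1.0])
y = ssm_scan([1.0, 1.0, 1.0], A, B, C)  # y == [1.0, 1.5, 1.75]
```

The decaying state is what lets such models carry long-range motion context at linear cost in sequence length, which is where the 31 FPS headroom comes from.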
Diffusion Model <-> Depth
ETH & CMU show how to turn a single-image latent diffusion model (LDM) into a SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0; HF demo available.
Review https://t.ly/sP9ma
Paper arxiv.org/pdf/2411.19189
Project rollingdepth.github.io/
Repo github.com/prs-eth/rollingdepth
Demo huggingface.co/spaces/prs-eth/rollingdepth
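The rolling idea can be sketched as overlapping snippet inference plus per-frame merging (a simplified sketch with a hypothetical `predict_snippet`; the actual method also co-aligns each snippet's affine depth scale and shift before merging):

```python
import numpy as np

def rolling_depth(frames, predict_snippet, window=3, stride=1):
    """Run a snippet-level depth predictor over overlapping windows and
    average the overlapping per-frame predictions. Simplified: RollingDepth
    additionally aligns each snippet's depth scale/shift before merging."""
    T = len(frames)
    acc = np.zeros(T)   # accumulated depth per frame (one value per frame here)
    cnt = np.zeros(T)   # number of windows covering each frame
    for s in range(0, T - window + 1, stride):
        d = predict_snippet(frames[s:s + window])   # (window,) depths
        acc[s:s + window] += d
        cnt[s:s + window] += 1
    return acc / np.maximum(cnt, 1)

# dummy predictor for illustration: "depth" equals the frame value itself
frames = np.arange(5, dtype=float)
depth = rolling_depth(frames, lambda snip: snip)
```

With a consistent predictor the merge is a no-op, as in the dummy example; the value of the scheme is that overlapping windows tie independent snippet predictions into one temporally consistent video.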
Dance vs. #ComputerVision
The University of Saint-Étienne proposes a new 3D human body pose estimation pipeline for dance analysis. Project page with results and interactive demo released.
Review https://t.ly/JEdM3
Paper arxiv.org/pdf/2505.07249
Project https://lnkd.in/dD5dsMv5
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Dear friends,
I'm truly sorry for being away from the group for so long. I know: no updates so far while AI is running faster than the speed of light.
I'm going through a very difficult time in my life and I need some space to heal. This spare-time project (but an important one for a lot of people here) needs energy and commitment I don't have right now. I'm sorry, be patient. I'll be back.
Love u all,
Alessandro.
Hi everybody,
I took a few weeks to take a breath from a lot of stuff; I dedicated all my mental energy to keep working and all my spare time to taking care of myself. Although I'm still not okay (BTW, my health was/is always good), I feel it's time to come back and support this wonderful community on this journey. I feel the responsibility of that; time to get in the ring.
I'm very sorry for being out so long, but sometimes life hits really hard. I got incredible support from unknown people from all around the world. It's amazing.
Thanks again, you rock!
Alessandro.
DINOv3 is out
#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs across computer vision. Code & weights released under the DINOv3 license.
Review https://t.ly/-S3ZL
Paper https://t.ly/ervOT
Project https://lnkd.in/dHFf3esd
Repo https://lnkd.in/dPxhDxAq
HF https://lnkd.in/dWGudY2i
Impact of Superhuman AI
The nonprofit AI Futures Project unveils a (dystopian) scenario of what super-AI might look like: a forecast from today to bio-engineered human-like creatures. A fascinating speculation on the future, with "slowdown" and "race" scenarios. Enjoy.
Review https://t.ly/EgmfJ
Project https://ai-2027.com/
TOTNet: Occlusion-Aware Tracking
TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, and occlusion augmentation to improve performance under occlusion. Code & data under MIT.
Review https://t.ly/Q0jAf
Paper https://lnkd.in/dUYsa-GC
Repo https://lnkd.in/d3QGUHYb
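A hedged sketch of what a visibility-weighted loss can look like (an assumed form, not necessarily the paper's exact formulation): occluded points receive a larger weight, so the hard occluded frames contribute more to the gradient.

```python
import numpy as np

def visibility_weighted_l2(pred, target, visible, w_occ=2.0):
    """Per-point squared error, up-weighted where the point is occluded.
    Assumed illustrative form, not TOTNet's exact loss.

    pred, target: (N, 2) predicted / ground-truth positions.
    visible:      (N,) bool mask, False = occluded.
    """
    err = np.sum((pred - target) ** 2, axis=-1)   # per-point L2^2
    w = np.where(visible, 1.0, w_occ)             # occluded points count more
    return float(np.mean(w * err))

pred = np.array([[0.0, 0.0], [1.0, 0.0]])
target = np.array([[0.0, 1.0], [0.0, 0.0]])
visible = np.array([True, False])
loss = visibility_weighted_l2(pred, target, visible)  # (1*1 + 2*1) / 2 = 1.5
```

The design choice is the usual one for imbalanced supervision: without the weight, the abundant visible points dominate and the network learns little about occlusion handling.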
Feed-Forward 4D Video
4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It produces high-quality dynamic point clouds and supports downstream tasks such as novel-view video synthesis with strong generalizability. Code & data announced.
Review https://t.ly/SpkD-
Paper arxiv.org/pdf/2508.13154
Project https://4dnex.github.io/
Repo github.com/3DTopia/4DNeX
Data https://lnkd.in/dh4_3Ghf
Demo https://lnkd.in/dztyzwgg
DAViD: Synthetic Depth, Normals & Segmentation
#Microsoft's DAViD: a 100% synthetic dataset and models for human depth, normals & segmentation. Dataset available; models & runtime under MIT.
Review https://t.ly/-SlO_
Paper https://lnkd.in/eCmMXpTg
Project https://lnkd.in/eurCSWkm
Repo https://lnkd.in/e7PWFgP2
OmniTry: Virtual Try-On Anything
OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released.
Review https://t.ly/wMBGQ
Paper https://lnkd.in/dQe9MchS
Project https://omnitry.github.io/
Repo https://lnkd.in/d3QwAXY2
Demo https://lnkd.in/duUcZpVA
ROVR Open Dataset is out
A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic (free) & commercial use.
Review https://t.ly/iDcvg
Paper https://arxiv.org/pdf/2508.13977
Project https://xiandaguo.net/ROVR-Open-Dataset
YOPO: SOTA 9-DoF Pose
Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection. A practical solution for monocular-RGB, category-level, multi-object pose estimation. Code & models announced (coming).
Review https://t.ly/cf_Cl
Paper https://arxiv.org/pdf/2508.14965
Project mikigom.github.io/YOPO-project-page/
Repo TBA
Intern-S1: SOTA MM-MoE
Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0.
Review https://t.ly/3l5UW
Paper arxiv.org/pdf/2508.15763
Repo github.com/InternLM/Intern-S1
HF huggingface.co/internlm/Intern-S1
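The activated-vs-total gap comes from top-k expert routing: each token runs only k of the E experts, so only a fraction of the weights is touched per forward pass. A toy sketch of such a layer (illustrative only, not Intern-S1's actual router):

```python
import numpy as np

def topk_moe(x, experts, gate, k=2):
    """Toy top-k MoE layer: pick k experts per token via gate logits and
    mix their outputs with softmax weights. With E experts but only k
    active, per-token activated expert params are roughly (k/E) of the
    total expert params, which is how a 241B-total model can activate
    only ~28B per token."""
    logits = gate @ x                          # (E,) router scores
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over the chosen k
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
E, d = 8, 4
experts = rng.normal(size=(E, d, d))           # one weight matrix per expert
gate = rng.normal(size=(E, d))                 # router
y = topk_moe(rng.normal(size=d), experts, gate, k=2)
```

Real MoE layers add load-balancing losses and shared (always-active) parameters, which is why the activated count is not exactly (k/E) of the total.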
ATLAS: SOTA Human Model
#META presents ATLAS, a novel high-fidelity body model learned from 600K high-resolution scans captured with 240 synchronized cameras. Code announced, to be released.
Review https://t.ly/0hHud
Paper arxiv.org/pdf/2508.15767
Project jindapark.github.io/projects/atlas/
Repo TBA
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA