SOTA Textured 3D-Guided VTON
#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be released.
Review https://t.ly/0tjdC
Paper https://lnkd.in/dFseYSXz
Project https://lnkd.in/djtqzrzs
Repo TBA
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m
S3MOT: SOTA 3D MOT
S3MOT: a selective-state-space-model-based MOT framework that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & weights to be released under the MIT license.
Review https://t.ly/H_JPv
Paper https://arxiv.org/pdf/2504.18068
Repo https://github.com/bytepioneerX/s3mot
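For readers new to multi-object tracking, the "object association" step mentioned above can be illustrated with a minimal greedy IoU matcher. This is a generic sketch only, not S3MOT's method (the paper's selective-state-space components are not reproduced here), and all names are illustrative.

```python
# Generic tracking-by-association sketch (NOT S3MOT itself):
# match current-frame detections to existing tracks by greedy 2D IoU.
# Box format: (x1, y1, x2, y2).

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily pair track/detection indices in decreasing-IoU order."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap too little
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 19, 31, 29), (1, 1, 11, 11)]
print(associate(tracks, dets))  # → [(1, 0), (0, 1)]
```

Greedy matching is only the simplest baseline; production trackers typically use Hungarian assignment plus a motion model, and methods like S3MOT learn the association cues instead.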
Diffusion Model <-> Depth
ETH & CMU show how to turn a single-image latent diffusion model (LDM) into a SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0; HF demo available.
Review https://t.ly/sP9ma
Paper arxiv.org/pdf/2411.19189
Project rollingdepth.github.io/
Repo github.com/prs-eth/rollingdepth
Demo huggingface.co/spaces/prs-eth/rollingdepth
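The "video depth without video models" recipe amounts to running the single-image model on short overlapping snippets and stitching them into one consistent sequence. Below is a deliberately simplified sketch of that stitching idea, not the actual RollingDepth implementation: real snippets are depth maps aligned with a global optimization, whereas here each frame is a single scalar and alignment is per-snippet least-squares scale/shift (all names are mine).

```python
# Sketch: stitch per-snippet depth predictions (each defined only up to
# an unknown scale and shift) into one sequence by aligning every new
# snippet to the running estimate on their overlapping frames.
from statistics import fmean

def fit_scale_shift(pred, ref):
    """Least-squares (s, t) minimizing sum((s*p + t - r)^2)."""
    mp, mr = fmean(pred), fmean(ref)
    var = sum((p - mp) ** 2 for p in pred)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    s = cov / var if var else 1.0
    return s, mr - s * mp

def merge_snippets(snippets, stride):
    """Merge snippets of per-frame depth scalars; snippet k covers
    frames starting at k*stride, overlapping its predecessors."""
    video = list(snippets[0])
    for k, snip in enumerate(snippets[1:], start=1):
        start = k * stride
        overlap = len(video) - start
        # align the new snippet to the current estimate on the overlap
        s, t = fit_scale_shift(snip[:overlap], video[start:])
        aligned = [s * d + t for d in snip]
        for i in range(overlap):  # average on the overlapping frames
            video[start + i] = 0.5 * (video[start + i] + aligned[i])
        video.extend(aligned[overlap:])
    return video

# Second snippet predicts the same scene at 2x scale; alignment undoes it.
print(merge_snippets([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]], stride=1))
# → [1.0, 2.0, 3.0, 4.0]
```

The per-snippet affine ambiguity is exactly why a naive frame-by-frame depth model flickers on video; resolving it jointly across overlaps is what buys temporal consistency.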
Dance vs. #ComputerVision
The University of Saint-Étienne proposes a new 3D human body pose estimation pipeline for dance analysis. Project page with results and an interactive demo released.
Review https://t.ly/JEdM3
Paper arxiv.org/pdf/2505.07249
Project https://lnkd.in/dD5dsMv5
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Dear friends,
I'm truly sorry for being away from the group for so long. I know: no updates so far, while AI is running faster than the speed of light.
I'm going through a very difficult time in my life and I need some space to heal. This spare-time project (important to a lot of people here, though) needs energy and commitment I don't have right now. I'm sorry, be patient. I'll be back.
Love u all,
Alessandro.
Hi everybody,
I took a few weeks to catch my breath from a lot of stuff: I dedicated all my mental energy to keeping working, and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was and is good), I feel it's time to come back and support this wonderful community on this journey. I feel the responsibility of that; time to get in the ring.
I'm very sorry for being out so long, but sometimes life hits really hard. I got incredible support from strangers from all around the world. It's amazing.
Thanks again, you rock!
Alessandro.
DINOv3 is out
#Meta unveils DINOv3, a novel vision foundation model outperforming the previous SOTAs in computer vision. Code & weights released under the DINOv3 license.
Review https://t.ly/-S3ZL
Paper https://t.ly/ervOT
Project https://lnkd.in/dHFf3esd
Repo https://lnkd.in/dPxhDxAq
HF https://lnkd.in/dWGudY2i
Impact of Superhuman AI
The nonprofit AI Futures Project unveils a (dystopian) scenario of what super-AI might look like: a forecast from today through to bio-engineered human-like creatures. A fascinating speculation on the future with its "slowdown" and "race" scenarios. Enjoy.
Review https://t.ly/EgmfJ
Project https://ai-2027.com/
TOTNet: Occlusion-Aware Tracking
TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss, and occlusion augmentation to improve performance under occlusion. Code & data under MIT.
Review https://t.ly/Q0jAf
Paper https://lnkd.in/dUYsa-GC
Repo https://lnkd.in/d3QGUHYb
Feed-Forward 4D Video
4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. High-quality dynamic point clouds & downstream tasks such as novel-view video synthesis with strong generalizability. Code/data announced.
Review https://t.ly/SpkD-
Paper arxiv.org/pdf/2508.13154
Project https://4dnex.github.io/
Repo github.com/3DTopia/4DNeX
Data https://lnkd.in/dh4_3Ghf
Demo https://lnkd.in/dztyzwgg
DAViD: Synthetic Depth, Normals & Segmentation
#Microsoft's DAViD: a 100% synthetic dataset and models for human depth, normals & segmentation. Dataset available; models & runtime under MIT.
Review https://t.ly/-SlO_
Paper https://lnkd.in/eCmMXpTg
Project https://lnkd.in/eurCSWkm
Repo https://lnkd.in/e7PWFgP2
OmniTry: Virtual Try-On Anything
OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released.
Review https://t.ly/wMBGQ
Paper https://lnkd.in/dQe9MchS
Project https://omnitry.github.io/
Repo https://lnkd.in/d3QwAXY2
Demo https://lnkd.in/duUcZpVA
ROVR Open Dataset is out
A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released free for academic use, with a commercial option.
Review https://t.ly/iDcvg
Paper https://arxiv.org/pdf/2508.13977
Project https://xiandaguo.net/ROVR-Open-Dataset
YOPO: SOTA 9-DoF Pose
Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection. A practical solution for monocular-RGB, category-level, multi-object pose estimation. Code & models announced (coming).
Review https://t.ly/cf_Cl
Paper https://arxiv.org/pdf/2508.14965
Project mikigom.github.io/YOPO-project-page/
Repo TBA
Intern-S1: SOTA MM-MoE
Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0.
Review https://t.ly/3l5UW
Paper arxiv.org/pdf/2508.15763
Repo github.com/InternLM/Intern-S1
HF huggingface.co/internlm/Intern-S1
ATLAS: SOTA Human Model
#META presents ATLAS, a novel high-fidelity body model learned from 600K high-resolution scans captured with 240 synchronized cameras. Code announced, to be released.
Review https://t.ly/0hHud
Paper arxiv.org/pdf/2508.15767
Project jindapark.github.io/projects/atlas/
Repo TBA
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign language data, covering motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA