BodyMAP: human body & pressure
#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source code released; dataset coming.
Review https://t.ly/8926S
Project bodymap3d.github.io/
Paper https://lnkd.in/gCxH4ev3
Code https://lnkd.in/gaifdy3q
Gradient Boosting Reinforcement Learning
#Nvidia unveils GBRL, a framework that brings Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets (conceptual sketch below). Code released.
Review https://t.ly/zv9pl
Paper https://arxiv.org/pdf/2407.08250
Code https://github.com/NVlabs/gbrl
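A minimal, hypothetical sketch (not the GBRL package API from the repo above) of why trees in RL have no fixed target: the policy's logits are a sum of small regression trees, and each boosting round fits a new tree to the current policy-gradient signal, which changes as the policy and data change.

```python
# Hypothetical sketch, NOT the GBRL API: a softmax policy whose logits are a
# sum of small regression trees. Each boosting round fits a new tree to the
# *current* policy-gradient signal, so there is no fixed regression target.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedTreePolicy:
    def __init__(self, n_actions, shrinkage=0.1):
        self.trees, self.shrinkage, self.n_actions = [], shrinkage, n_actions

    def logits(self, states):
        out = np.zeros((len(states), self.n_actions))
        for tree in self.trees:                     # ensemble = sum of trees
            out += self.shrinkage * tree.predict(states)
        return out

    def probs(self, states):
        z = self.logits(states)
        z -= z.max(axis=1, keepdims=True)           # numerically stable softmax
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def boost_step(self, states, actions, advantages):
        # Policy-gradient ascent direction w.r.t. softmax logits:
        #   advantage * (one_hot(action) - pi(.|state))
        p = self.probs(states)
        grad = -p * advantages[:, None]
        grad[np.arange(len(actions)), actions] += advantages
        # Fit one small tree to the current gradient signal and append it.
        self.trees.append(DecisionTreeRegressor(max_depth=3).fit(states, grad))
```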
EVER: Ellipsoid Rendering
UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090 (rendering integral below).
Review https://t.ly/zAfGU
Paper arxiv.org/pdf/2410.01804
Project half-potato.gitlab.io/posts/ever/
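For context, this is the standard emission-absorption volume rendering integral; as we read the paper, EVER's point is to evaluate it exactly per primitive along each ray rather than approximating it with sorted, alpha-composited splats, which is where the absence of popping artifacts comes from.

```latex
% Standard emission-absorption volume rendering along a ray r(t) = o + t d.
% Color is density-weighted radiance attenuated by the transmittance T.
C(\mathbf{r}) \;=\; \int_{t_n}^{t_f} T(t)\,\sigma\!\big(\mathbf{r}(t)\big)\,
                    \mathbf{c}\!\big(\mathbf{r}(t), \mathbf{d}\big)\, dt,
\qquad
T(t) \;=\; \exp\!\left(-\int_{t_n}^{t} \sigma\!\big(\mathbf{r}(s)\big)\, ds\right)
```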
Robo-Emulation via Video Imitation
OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.
Review https://t.ly/_N29-
Paper arxiv.org/pdf/2410.11792
Project https://lnkd.in/d6bHF_-s
π₯ "Nuclear" AI vs. Hyper-Cheap Inference π₯
What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
Portable Training Workstation: 24%
Nuclear energy for AI training: 34%
Cheaper inference-only devices: 33%
Cloud-intensive inference-only: 9%
Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image/video-based commonsense reasoning.
Review https://t.ly/KHnQ7
Paper arxiv.org/pdf/2501.08326
Project miranheo.github.io/omni-rgpt/
Repo TBA soon
#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) with high diversity and photorealism. Code, model & dataset to be released.
Review https://t.ly/rfBr5
Paper arxiv.org/pdf/2501.09898
Project nvlabs.github.io/FoundationStereo/
Repo github.com/NVlabs/FoundationStereo/tree/master
HAMSTER: Hierarchical VLA Manipulation
#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced.
Review https://t.ly/2yXaY
Paper https://arxiv.org/pdf/2502.05485
Project https://hamster-robot.github.io/
Repo TBA
Unified Low-Level 4D Vision
#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training; one backbone, many SOTAs (sketch below). Code announced.
Review https://t.ly/04DGj
Paper arxiv.org/pdf/2502.13078
Project research.nvidia.com/labs/lpr/l4p/
Repo TBA
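A hypothetical sketch of the "one backbone, many heads" idea described above (not the paper's code; head names, channel counts, and the backbone are assumptions): the shared video ViT runs once, and each lightweight head decodes the same tokens for its own task.

```python
# Hypothetical sketch, NOT L4P's implementation: a shared video ViT backbone
# produces tokens once, and each lightweight per-task head decodes them.
import torch
import torch.nn as nn

class TaskHead(nn.Module):
    def __init__(self, dim, out_ch):
        super().__init__()
        self.proj = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, out_ch))

    def forward(self, tokens):            # tokens: (B, N, dim)
        return self.proj(tokens)          # per-token prediction for this task

class UnifiedLowLevelModel(nn.Module):
    def __init__(self, backbone, dim=768):
        super().__init__()
        self.backbone = backbone          # e.g. a (frozen) video ViT
        self.heads = nn.ModuleDict({      # task names are illustrative
            "depth":    TaskHead(dim, 1),
            "flow":     TaskHead(dim, 2),
            "tracking": TaskHead(dim, 2),
        })

    def forward(self, video):
        tokens = self.backbone(video)     # shared features, computed once
        return {name: head(tokens) for name, head in self.heads.items()}
```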
Neural-Free Sparse Voxels Rasterization
#Nvidia unveils a novel, efficient radiance field rendering algorithm that rasterizes adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license).
Review https://t.ly/Nh_ic
Paper https://lnkd.in/g8k8Zs6R
Project https://lnkd.in/gR-bD4Wx
Repo https://lnkd.in/gNHX-w4t
3D MultiModal Memory
M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings. Rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.
Review https://t.ly/OrXZO
Paper arxiv.org/pdf/2503.16413
Project https://lnkd.in/dXAZ97KH
Repo https://lnkd.in/dWvunCET
Scaling Vision to 4K
PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K with *near-constant* cost: it encodes a low-res global image and selectively processes only the informative high-res regions (sketch below). Impressive work. Code, weights & Hugging Face release announced.
Review https://t.ly/WN479
Paper https://lnkd.in/ddWq8UpX
Project https://lnkd.in/dMkTY8-k
Repo https://lnkd.in/d9YSB6yv
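A hypothetical sketch of the selection idea above (not PS3's code; the variance-based patch score stands in for whatever the model actually learns): run a cheap global pass on a downscaled view, then run the expensive encoder only on the top-k highest-scoring high-res patches, so compute scales with k rather than with the full 4K grid.

```python
# Hypothetical sketch, NOT PS3's implementation: global low-res pass plus
# selective encoding of only the k most "informative" high-res patches.
import torch

def selective_hires_encode(image_4k, global_encoder, patch_encoder,
                           patch=256, k=16):
    # 1) cheap global pass on a downscaled copy
    lowres = torch.nn.functional.interpolate(image_4k, size=(512, 512),
                                             mode="bilinear")
    global_feat = global_encoder(lowres)

    # 2) split the 4K image into patches and score them (std is a stand-in
    #    for a learned saliency/selection score)
    B, C, H, W = image_4k.shape
    patches = image_4k.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.reshape(B, C, -1, patch, patch).permute(0, 2, 1, 3, 4)
    scores = patches.std(dim=(2, 3, 4))            # (B, num_patches)

    # 3) encode only the k highest-scoring patches
    top = scores.topk(k, dim=1).indices            # (B, k)
    picked = torch.stack([patches[b, top[b]] for b in range(B)])
    local_feat = patch_encoder(picked.flatten(0, 1))
    return global_feat, local_feat, top
```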
PartField: #3D Part Segmentation
#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia license.
Review https://t.ly/fGb2O
Paper https://lnkd.in/dGeyKSzG
Code https://lnkd.in/dbe57XGH
Project https://lnkd.in/dhEgf7X2
#Nvidia Describe Anything
Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks (region-prompt sketch below). Repo under Apache, dataset available, and live demo on Hugging Face.
Review https://t.ly/la4JD
Paper https://lnkd.in/dZh82xtV
Project https://lnkd.in/dcv9V2ZF
Repo https://lnkd.in/dJB9Ehtb
Demo https://lnkd.in/dXDb2MWU
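A hypothetical sketch (not the DAM interface; all names here are illustrative) of how the four region-prompt types mentioned above can be normalized into a single binary mask before being handed to a region-aware captioner.

```python
# Hypothetical sketch, NOT the DAM API: collapse points / box / scribble /
# mask prompts into one binary region mask of shape (h, w).
import numpy as np

def region_to_mask(h, w, points=None, box=None, scribble=None, mask=None,
                   point_radius=5):
    out = np.zeros((h, w), dtype=bool)
    if mask is not None:                       # dense mask prompt
        out |= mask.astype(bool)
    if box is not None:                        # box = (x0, y0, x1, y1)
        x0, y0, x1, y1 = box
        out[y0:y1, x0:x1] = True
    for x, y in (points or []):                # grow each click into a disc
        yy, xx = np.ogrid[:h, :w]
        out |= (xx - x) ** 2 + (yy - y) ** 2 <= point_radius ** 2
    for x, y in (scribble or []):              # scribble = list of pixels
        out[y, x] = True
    return out
```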
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released.
Review https://t.ly/HonX_
Paper https://arxiv.org/pdf/2508.15902
Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo TBA
3D Prompted Vision-LLM
#Nvidia unveils SR-3D, a novel 3D-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. Flexible region prompting lets users annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announced.
Review https://t.ly/5Y2c5
Paper https://arxiv.org/pdf/2509.13317
Project https://www.anjiecheng.me/sr3d
Repo TBA
A few "leaks" for you from the #Nvidia presentation I'm at right now in Milan. Impressive stuff.
PS: sorry for the shitty quality of the pics.
Real-time Interactive Video
LONGLIVE by #Nvidia is a frame-level autoregressive framework for real-time, interactive long video generation: it accepts sequential user prompts and generates the corresponding video in real time (interaction-loop sketch below). Repo under a non-commercial license.
Review https://t.ly/jJkdY
Paper arxiv.org/pdf/2509.22622
Project nvlabs.github.io/LongLive/
Repo github.com/NVlabs/LongLive
Model huggingface.co/Efficient-Large-Model/LongLive-1.3B
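A hypothetical sketch of the interaction loop implied above (not the LongLive API; `model.next_frame` and the queue handling are assumptions): frames are produced autoregressively one at a time, and any newly queued prompt takes effect from the next frame onward.

```python
# Hypothetical sketch, NOT the LongLive API: frame-level autoregressive
# generation where the conditioning prompt can be swapped mid-stream.
from collections import deque

def interactive_generate(model, first_prompt, prompt_queue: deque, n_frames=300):
    frames, prompt, state = [], first_prompt, None
    for t in range(n_frames):
        if prompt_queue:                     # user typed a new instruction
            prompt = prompt_queue.popleft()  # switch conditioning from next frame
        frame, state = model.next_frame(prompt, state)  # hypothetical call
        frames.append(frame)                 # could also be streamed/displayed live
    return frames
```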