AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh updates every day on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
πŸ”Œ BodyMAP: human body & pressure πŸ”Œ

πŸ‘‰#Nvidia (+CMU) unveils BodyMAP, the new SOTA in jointly predicting body mesh (3D pose & shape) and applied 3D pressure on the human body. Source code released, dataset coming πŸ’™

πŸ‘‰Review https://t.ly/8926S
πŸ‘‰Project bodymap3d.github.io/
πŸ‘‰Paper https://lnkd.in/gCxH4ev3
πŸ‘‰Code https://lnkd.in/gaifdy3q
πŸ“ˆGradient Boosting Reinforcement LearningπŸ“ˆ

πŸ‘‰#Nvidia unveils GBRL, a framework that brings Gradient Boosting Trees to the RL domain, adapting them to RL's unique challenges such as non-stationarity and the absence of predefined targets (see the sketch below). Code releasedπŸ’™

πŸ‘‰Review https://t.ly/zv9pl
πŸ‘‰Paper https://arxiv.org/pdf/2407.08250
πŸ‘‰Code https://github.com/NVlabs/gbrl
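The post stays high-level, so here is a minimal, self-contained sketch of the core idea: gradient-boosted trees as the Q-function approximator in a fitted Q-iteration loop, where the bootstrapped regression targets shift every iteration (the "no predefined targets" issue the paper tackles). It uses scikit-learn and a toy Gymnasium env as stand-ins; this is not the GBRL library's API.

```python
# Hedged sketch: gradient-boosted trees as Q-function approximator via
# fitted Q-iteration. scikit-learn + Gymnasium stand-ins; NOT the GBRL
# API. Environment and hyperparameters are illustrative.
import numpy as np
import gymnasium as gym
from sklearn.ensemble import GradientBoostingRegressor

env = gym.make("CartPole-v1")
n_actions = env.action_space.n
gamma = 0.99

# Collect random transitions (s, a, r, s', terminated)
transitions = []
obs, _ = env.reset(seed=0)
for _ in range(5000):
    a = env.action_space.sample()
    nobs, r, term, trunc, _ = env.step(a)
    transitions.append((obs, a, r, nobs, term))
    obs, _ = env.reset() if (term or trunc) else (nobs, None)

S  = np.array([t[0] for t in transitions])
A  = np.array([t[1] for t in transitions])
R  = np.array([t[2] for t in transitions])
S2 = np.array([t[3] for t in transitions])
D  = np.array([t[4] for t in transitions], dtype=float)

X = np.hstack([S, A[:, None]])            # Q takes (state, action)
model = None
for _ in range(5):                        # fitted Q-iteration
    if model is None:
        y = R                             # first pass: rewards only
    else:                                 # targets move as Q improves --
        q_next = np.max(                  # no fixed regression target
            [model.predict(np.hstack([S2, np.full((len(S2), 1), a)]))
             for a in range(n_actions)], axis=0)
        y = R + gamma * (1.0 - D) * q_next
    model = GradientBoostingRegressor(n_estimators=100).fit(X, y)
```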
πŸ›³οΈ EVER Ellipsoid Rendering πŸ›³οΈ

πŸ‘‰UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ∼30 FPS at 720p on an #NVIDIA RTX 4090 (see the sketch below).

πŸ‘‰Review https://t.ly/zAfGU
πŸ‘‰Paper arxiv.org/pdf/2410.01804
πŸ‘‰Project half-potato.gitlab.io/posts/ever/
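For context, a minimal numpy sketch of emission-only volume rendering along one ray: densities and emitted colors at ray samples are alpha-composited by transmittance. EVER evaluates constant-density ellipsoids analytically; the fixed-step quadrature here is only an illustration of the rendering model.

```python
# Emission-only volume rendering along one ray (illustrative quadrature,
# not EVER's analytic ellipsoid integration).
import numpy as np

def render_ray(sigma, color, delta):
    """sigma: (N,) densities; color: (N,3) emitted RGB; delta: (N,) step sizes."""
    alpha = 1.0 - np.exp(-sigma * delta)                        # per-segment opacity
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]   # transmittance
    weights = T * alpha
    return (weights[:, None] * color).sum(axis=0)               # accumulated radiance

rgb = render_ray(np.array([0.5, 2.0, 0.1]),
                 np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], float),
                 np.full(3, 0.1))
print(rgb)
```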
πŸͺžRobo-Emulation via Video ImitationπŸͺž

πŸ‘‰OKAMI (UT Austin & #Nvidia) is a novel method that generates a manipulation plan from a single RGB-D video of a human demonstration and derives a policy for execution.

πŸ‘‰Review https://t.ly/_N29-
πŸ‘‰Paper arxiv.org/pdf/2410.11792
πŸ‘‰Project https://lnkd.in/d6bHF_-s
πŸ‘4🀯2πŸ”₯1
πŸ”₯ "Nuclear" AI vs. Hyper-Cheap Inference πŸ”₯

⭐ What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
🀲 Portable Training Workstation: 24%
βš›οΈ Nuclear energy for AI training: 34%
πŸ–²οΈ Cheaper only-inference devices: 33%
πŸ’° Cloud-intensive only-inference: 9%
πŸ‘4❀1πŸ”₯1🀯1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ§žβ€β™‚οΈOmni-RGPT: SOTA MLLM UnderstandingπŸ§žβ€β™‚οΈ

πŸ‘‰ #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

πŸ‘‰Review https://t.ly/KHnQ7
πŸ‘‰Paper arxiv.org/pdf/2501.08326
πŸ‘‰Project miranheo.github.io/omni-rgpt/
πŸ‘‰Repo TBA soon
🌈 #Nvidia Foundation ZS-Stereo 🌈

πŸ‘‰Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, plus a large-scale synthetic training dataset (1M stereo pairs) featuring large diversity and high photorealism (background sketch below). Code, model & dataset to be releasedπŸ’™

πŸ‘‰Review https://t.ly/rfBr5
πŸ‘‰Paper arxiv.org/pdf/2501.09898
πŸ‘‰Project nvlabs.github.io/FoundationStereo/
πŸ‘‰Repo github.com/NVlabs/FoundationStereo/tree/master
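Background for newcomers: once a stereo network predicts disparity, metric depth follows from camera geometry as depth = f·B/d. A tiny numpy sketch with made-up calibration values (not tied to FoundationStereo's code):

```python
# Disparity-to-depth conversion; focal length and baseline are made-up
# example values, not FoundationStereo's calibration.
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """depth = f * B / d, masking invalid (non-positive) disparities."""
    depth = focal_px * baseline_m / np.maximum(disparity, eps)
    depth[disparity <= 0] = np.nan        # no valid match
    return depth

disp = np.array([[32.0, 16.0], [8.0, 0.0]])   # toy disparity map (px)
print(disparity_to_depth(disp, focal_px=720.0, baseline_m=0.12))
```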
πŸ₯›HAMSTER: Hierarchical VLA ManipulationπŸ₯›

πŸ‘‰#Nvidia unveils HAMSTER: a novel hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect off-domain data. Source code announcedπŸ’™

πŸ‘‰Review https://t.ly/2yXaY
πŸ‘‰Paper https://arxiv.org/pdf/2502.05485
πŸ‘‰Project https://hamster-robot.github.io/
πŸ‘‰Repo TBA
🌈Unified Low-Level 4D Vision🌈

πŸ‘‰#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training (see the sketch below). One backbone, many SOTAs. Code announced πŸ’™

πŸ‘‰Review https://t.ly/04DGj
πŸ‘‰Paper arxiv.org/pdf/2502.13078
πŸ‘‰Project research.nvidia.com/labs/lpr/l4p/
πŸ‘‰Repo TBA
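The "one shared backbone, many lightweight heads" pattern in a hedged PyTorch sketch; dimensions, task names and the frozen-backbone choice are illustrative assumptions, not L4P's actual architecture.

```python
# Hedged sketch of a shared ViT-style backbone with lightweight per-task
# heads; all sizes and tasks are illustrative, not L4P's design.
import torch
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    def __init__(self, backbone, feat_dim=768):
        super().__init__()
        self.backbone = backbone                 # shared encoder
        for p in self.backbone.parameters():     # heads train; backbone doesn't
            p.requires_grad = False
        self.heads = nn.ModuleDict({
            "depth":  nn.Linear(feat_dim, 1),    # per-token depth
            "motion": nn.Linear(feat_dim, 2),    # per-token 2D flow
        })

    def forward(self, video_tokens):
        feats = self.backbone(video_tokens)      # (B, tokens, feat_dim)
        return {name: head(feats) for name, head in self.heads.items()}

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True), 2)
model = MultiTaskPerception(backbone)
out = model(torch.randn(1, 16, 768))             # 16 tokens of dim 768
print({k: v.shape for k, v in out.items()})
```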
πŸ”₯5πŸ‘2🀯1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘½Neural-Free Sparse Voxels RasterizationπŸ‘½

πŸ‘‰#Nvidia unveils a novel efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license)πŸ’™

πŸ‘‰Review https://t.ly/Nh_ic
πŸ‘‰Paper https://lnkd.in/g8k8Zs6R
πŸ‘‰Project https://lnkd.in/gR-bD4Wx
πŸ‘‰Repo https://lnkd.in/gNHX-w4t
πŸ”₯15πŸ‘4🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ™€3D MultiModal MemoryπŸ™€

πŸ‘‰M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings, yielding rich spatial & semantic understanding via a novel memory system designed to retain multimodal info across videos (rough sketch below).

πŸ‘‰Review https://t.ly/OrXZO
πŸ‘‰Paper arxiv.org/pdf/2503.16413
πŸ‘‰Project https://lnkd.in/dXAZ97KH
πŸ‘‰Repo https://lnkd.in/dWvunCET
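A rough sketch of the memory idea: store per-frame foundation-model embeddings and retrieve by cosine similarity to a query embedding. The dimension and retrieval rule are assumptions for illustration, not M3's implementation.

```python
# Toy multimodal memory: write per-frame embeddings, read top-k by
# cosine similarity. Stand-in for the described memory system.
import torch
import torch.nn.functional as F

class MultimodalMemory:
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, embedding, payload):
        self.keys.append(F.normalize(embedding, dim=0))   # unit-norm key
        self.values.append(payload)

    def read(self, query, k=3):
        sims = torch.stack(self.keys) @ F.normalize(query, dim=0)
        idx = sims.topk(min(k, len(self.values))).indices
        return [self.values[i] for i in idx]

mem = MultimodalMemory()
for t in range(10):
    mem.write(torch.randn(512), payload=f"frame_{t}")     # e.g. CLIP-style embedding
print(mem.read(torch.randn(512)))
```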
🦎 Scaling Vision to 4K🦎

πŸ‘‰PS3 by #Nvidia (+UC Berkeley) scales up CLIP-style vision pre-training to 4K at *near-constant* cost: it encodes a low-res global image and selectively processes only the informative high-res regions (see the sketch below). Impressive work. Code/weights & πŸ€— announcedπŸ’™

πŸ‘‰Review https://t.ly/WN479
πŸ‘‰Paper https://lnkd.in/ddWq8UpX
πŸ‘‰Project https://lnkd.in/dMkTY8-k
πŸ‘‰Repo https://lnkd.in/d9YSB6yv
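The selection idea sketched in PyTorch: score high-res patches and run the expensive encoder only on the top-k, so cost stays near-constant as resolution grows. The variance score is a stand-in for PS3's learned selection; nothing here is the PS3 API.

```python
# Hedged sketch: keep only the k most "informative" HR patches.
# Local variance is a stand-in for a learned saliency score.
import torch

def select_topk_patches(image_4k, k=64, patch=256):
    """Cut a (3, H, W) image into patches; return the k highest-scoring ones."""
    C, H, W = image_4k.shape
    patches = (image_4k
               .unfold(1, patch, patch)          # tile rows
               .unfold(2, patch, patch)          # tile cols
               .reshape(C, -1, patch, patch)
               .permute(1, 0, 2, 3))             # (N, 3, p, p)
    scores = patches.float().var(dim=(1, 2, 3))  # proxy for informativeness
    idx = scores.topk(k).indices
    return patches[idx]                          # only these hit the HR encoder

hr_patches = select_topk_patches(torch.rand(3, 4096, 4096))
print(hr_patches.shape)                          # torch.Size([64, 3, 256, 256])
```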
🍏PartField #3D Part Segmentation🍏

πŸ‘‰#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/fGb2O
πŸ‘‰Paper https://lnkd.in/dGeyKSzG
πŸ‘‰Code https://lnkd.in/dbe57XGH
πŸ‘‰Project https://lnkd.in/dhEgf7X2
🦧 #Nvidia Describe Anything 🦧

πŸ‘‰Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache license, dataset available, and live demo on πŸ€—

πŸ‘‰Review https://t.ly/la4JD
πŸ‘‰Paper https://lnkd.in/dZh82xtV
πŸ‘‰Project https://lnkd.in/dcv9V2ZF
πŸ‘‰Repo https://lnkd.in/dJB9Ehtb
πŸ€—Demo https://lnkd.in/dXDb2MWU
πŸ”₯10πŸ‘5❀1
This media is not supported in your browser
VIEW IN TELEGRAM
🍏#Nvidia Dynamic Pose 🍏

πŸ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/wrcb0
πŸ‘‰Paper https://lnkd.in/dycGjAyy
πŸ‘‰Project https://lnkd.in/dDZ2Ej_Q
πŸ€—Data https://lnkd.in/d8yUSB7m
πŸ”₯4πŸ‘2❀1🀯1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ§žβ€β™€οΈGENMO: Generalist Human Motion πŸ§žβ€β™€οΈ

πŸ‘‰#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the momentπŸ₯²

πŸ‘‰Review https://t.ly/Q5T_Y
πŸ‘‰Paper https://lnkd.in/ds36BY49
πŸ‘‰Project https://lnkd.in/dAYHhuFU
🧀Diffusive Hand from Signs🧀

πŸ‘‰LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign-language data, capturing characteristics such as handshapes, locations, and finger, hand & arm movements (see the sketch below). Code, models & data to be released πŸ’™

πŸ‘‰Review https://t.ly/HonX_
πŸ‘‰Paper https://arxiv.org/pdf/2508.15902
πŸ‘‰Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
πŸ‘‰Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
πŸ‘‰Repo TBA
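For readers new to motion diffusion, a minimal sketch of the standard DDPM objective such generative motion models typically train with: noise a motion sequence at a random timestep and regress the added noise. Shapes and the schedule are assumptions; the paper's exact formulation may differ.

```python
# Standard DDPM training objective on motion sequences (generic sketch,
# not this paper's exact model). Shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative signal fraction

def ddpm_loss(denoiser, motion):                 # motion: (B, frames, joints*3)
    t = torch.randint(0, T, (motion.shape[0],))  # random timestep per sample
    noise = torch.randn_like(motion)
    ab = alphas_bar[t].view(-1, 1, 1)
    noisy = ab.sqrt() * motion + (1 - ab).sqrt() * noise
    return F.mse_loss(denoiser(noisy, t), noise) # predict the added noise

denoiser = lambda x, t: torch.zeros_like(x)      # stand-in network
print(ddpm_loss(denoiser, torch.randn(2, 60, 48)))
```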
πŸ›‘οΈ3D Prompted Vision-LLMπŸ›‘οΈ

πŸ‘‰#Nvidia unveils SR-3D, a novel vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. It enables flexible region prompting: users can annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announcedπŸ’™

πŸ‘‰Review https://t.ly/5Y2c5
πŸ‘‰Paper https://arxiv.org/pdf/2509.13317
πŸ‘‰Project https://www.anjiecheng.me/sr3d
πŸ‘‰Repo TBA
A few β€œleaks” for you from the #Nvidia presentation I'm attending right now in Milan. Impressive stuff.

PS: sorry for the shitty quality of the pics β™₯️
πŸ€–Real-time Interactive VideoπŸ€–

πŸ‘‰LONGLIVE by #Nvidia is a frame-level autoregressive framework for real-time, interactive long-video generation: it accepts sequential user prompts and generates the corresponding video in real time (toy sketch below). Repo under non-commercial licenseπŸ’™

πŸ‘‰Review https://t.ly/jJkdY
πŸ‘‰Paper arxiv.org/pdf/2509.22622
πŸ‘‰Project nvlabs.github.io/LongLive/
πŸ‘‰Repo github.com/NVlabs/LongLive
πŸ€—huggingface.co/Efficient-Large-Model/LongLive-1.3B
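A toy sketch of the interaction pattern described: frame-level autoregressive generation where the user swaps the prompt mid-stream and later frames condition on both the new prompt and prior frames. `VideoModel` is a hypothetical stand-in, not the LongLive API.

```python
# Toy interaction loop for frame-level autoregressive video generation
# with mid-stream prompt switching. VideoModel is a hypothetical stand-in.
import torch

class VideoModel(torch.nn.Module):
    def forward(self, prev_frames, prompt_emb):
        # stand-in: a real model would attend over frame history + prompt
        return torch.rand(3, 64, 64)

model = VideoModel()
frames, prompt_emb = [], torch.rand(512)
for t in range(120):                             # stream frames in real time
    if t == 60:                                  # user sends a new prompt
        prompt_emb = torch.rand(512)             # re-embed the new instruction
    frames.append(model(frames, prompt_emb))     # next frame, conditioned on history
print(len(frames), frames[0].shape)
```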