AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🔥 #6D Foundation Pose 🔥

👉#Nvidia unveils FoundationPose, a novel (and unified) foundation model for 6D object pose estimation and tracking.

👉Review https://t.ly/HGd4h
👉Project https://lnkd.in/dPcnBKWm
👉Paper https://lnkd.in/dixn_iHZ
👉Code coming 🩷
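👉A 6D pose is just a rotation R plus a translation t. A toy numpy sketch (not FoundationPose code; points, pose and intrinsics K are made up) of moving object-frame points into the camera frame and projecting them:

```python
import numpy as np

# Hypothetical inputs -- none of this comes from FoundationPose.
model_pts = np.array([[0.05, 0.00, 0.00],     # object-frame points (metres)
                      [0.00, 0.05, 0.00],
                      [0.00, 0.00, 0.05]])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],    # rotation about Z
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.10, -0.05, 0.60])                      # translation (metres)
K = np.array([[600.0,   0.0, 320.0],                   # pinhole intrinsics
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

cam_pts = model_pts @ R.T + t        # apply the 6D pose: X_cam = R @ X_obj + t
proj = cam_pts @ K.T                 # project with the pinhole model
uv = proj[:, :2] / proj[:, 2:3]      # perspective divide -> pixel coordinates
print(uv)
```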
🦩 WildRGB-D: Objects in the Wild 🦩

👉#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8,500 recorded objects and ~20,000 RGB-D videos across 46 categories, with corresponding masks and 3D point clouds.

👉Review https://t.ly/WCqVz
👉Data github.com/wildrgbd/wildrgbd
👉Paper arxiv.org/pdf/2401.12592.pdf
👉Project wildrgbd.github.io/
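👉The dataset ships point clouds, but the same geometry falls out of a depth map plus camera intrinsics. A minimal back-projection sketch (fake depth and intrinsics, not the dataset loader):

```python
import numpy as np

# Hypothetical depth map and intrinsics -- the real dataset provides its own files.
H, W = 480, 640
depth = np.full((H, W), 0.8, dtype=np.float32)   # depth in metres
fx, fy, cx, cy = 600.0, 600.0, W / 2, H / 2      # pinhole intrinsics

u, v = np.meshgrid(np.arange(W), np.arange(H))   # pixel grid
x = (u - cx) * depth / fx                        # X = (u - cx) * Z / fx
y = (v - cy) * depth / fy                        # Y = (v - cy) * Z / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
points = points[points[:, 2] > 0]                # keep valid depths only
print(points.shape)                              # (H*W, 3) point cloud
```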
🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models. It keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
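👉The speed-up comes from where the compute sits: the image embedding is computed once per image, then every prompt reuses it through the cheap decoder. A purely illustrative sketch with made-up stand-in functions (NOT the efficientvit or SAM APIs):

```python
import numpy as np

def encode_image(image):
    # Stand-in for the heavy image encoder (the part EfficientViT-SAM replaces).
    return image.mean(axis=2)

def decode_mask(embedding, point, radius=50):
    # Stand-in for the lightweight prompt encoder + mask decoder (kept from SAM).
    h, w = embedding.shape
    yy, xx = np.mgrid[0:h, 0:w]
    return (xx - point[0]) ** 2 + (yy - point[1]) ** 2 < radius ** 2

image = np.random.rand(480, 640, 3)
embedding = encode_image(image)                  # run once per image
for prompt in [(100, 100), (320, 240), (500, 400)]:
    mask = decode_mask(embedding, prompt)        # cheap per-prompt decoding
    print(prompt, int(mask.sum()))
```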
🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in predicting body mesh (3D pose & shape) and 3D applied pressure on the human body. Source Code released, Dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q
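👉As a toy picture of how a predicted mesh can be related to a pressure image, here is a nearest-pixel lookup of pressure at each vertex's bed-plane position (all arrays invented; not the BodyMAP pipeline):

```python
import numpy as np

# Invented stand-ins -- NOT BodyMAP data or outputs.
bed_h, bed_w = 1.9, 0.8                           # assumed bed size (metres)
pressure_map = np.random.rand(64, 27)             # pressure image over the bed
verts = np.random.rand(6890, 3) * [bed_h, bed_w, 0.3]   # "predicted" mesh vertices

rows = np.clip((verts[:, 0] / bed_h * pressure_map.shape[0]).astype(int),
               0, pressure_map.shape[0] - 1)
cols = np.clip((verts[:, 1] / bed_w * pressure_map.shape[1]).astype(int),
               0, pressure_map.shape[1] - 1)
per_vertex_pressure = pressure_map[rows, cols]    # pressure sampled at each vertex
print(per_vertex_pressure.shape)                  # (6890,)
```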
📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments such as non-stationarity and the absence of predefined targets. Code released💙

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
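👉GBRL's own API lives in the repo above; as a generic illustration of tree ensembles as RL function approximators (not the GBRL algorithm), here is a tiny fitted Q-iteration on a made-up 1-D MDP using scikit-learn:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy 1-D MDP: move left/right on [0, 1], reward for reaching the right edge.
rng = np.random.default_rng(0)
n, gamma = 2000, 0.9
s = rng.uniform(0, 1, n)                                    # states
a = rng.integers(0, 2, n)                                   # two discrete actions
s_next = np.clip(s + np.where(a == 1, 0.1, -0.1) + rng.normal(0, 0.02, n), 0, 1)
r = (s_next > 0.9).astype(float)                            # sparse reward

X = np.column_stack([s, a])
q = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, r)

for _ in range(5):                                          # fitted Q-iteration
    q_next = np.column_stack([q.predict(np.column_stack([s_next, np.full(n, act)]))
                              for act in (0, 1)])
    y = r + gamma * q_next.max(axis=1)                      # bootstrapped targets
    q = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)

for st in (0.2, 0.5, 0.8):                                  # greedy Q-values per action
    print(st, q.predict([[st, 0], [st, 1]]))
```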
🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/
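👉Not EVER's ellipsoid-based formulation, but the standard emission-absorption quadrature along a single ray that such renderers build on (densities and colours are toy values):

```python
import numpy as np

# Emission-absorption volume rendering along one ray (toy samples).
t = np.linspace(0.1, 4.0, 64)                        # sample depths along the ray
delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))   # spacing between samples
sigma = np.exp(-(t - 2.0) ** 2 / 0.1)                # toy density: a bump at t = 2
color = np.tile([0.8, 0.3, 0.2], (t.size, 1))        # toy per-sample RGB emission

alpha = 1.0 - np.exp(-sigma * delta)                 # per-segment opacity
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]   # transmittance T_i
weights = trans * alpha                              # each sample's contribution
pixel = (weights[:, None] * color).sum(axis=0)       # rendered RGB for this ray
print(pixel, weights.sum())                          # weights.sum() ~ accumulated opacity
```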
🪞Robo-Emulation via Video Imitation🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
🤲 Portable Training Workstation (24%)
⚛️ Nuclear energy for AI training (34%)
🖲️ Cheaper inference-only devices (33%)
💰 Cloud-intensive inference-only (9%)
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
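👉A common way to give an MLLM a region-level token is masked average pooling of the visual features; a generic sketch below (toy feature map and mask; not necessarily Omni-RGPT's mechanism):

```python
import numpy as np

# Toy illustration of turning an image region into a single embedding.
feat = np.random.rand(32, 32, 256)            # visual feature map (H x W x C)
mask = np.zeros((32, 32), dtype=bool)
mask[10:20, 8:18] = True                      # binary mask of the queried region

region_token = feat[mask].mean(axis=0)        # masked average pooling
print(region_token.shape)                     # (256,) -- one token for the region
```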
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, together with a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism. Code, model & dataset to be released💙

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
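👉Stereo networks predict disparity; metric depth follows from focal length and baseline as Z = f·B / d. A minimal sketch with made-up values (not FoundationStereo's inference code):

```python
import numpy as np

# Disparity -> depth for a rectified stereo pair (toy values).
disparity = np.random.uniform(5.0, 60.0, size=(480, 640))   # predicted disparity (pixels)
fx = 700.0        # focal length in pixels (assumed)
baseline = 0.12   # camera baseline in metres (assumed)

depth = fx * baseline / np.clip(disparity, 1e-6, None)      # Z = f * B / d
print(depth.min(), depth.max())                             # depth range in metres
```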