AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
๐Ÿ“Moving Points -> Depth๐Ÿ“

๐Ÿ‘‰KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/qA2P5
๐Ÿ‘‰Paper https://lnkd.in/dpXDaQtM
๐Ÿ‘‰Project https://lnkd.in/d9qWYsjP
๐Ÿ‘‰Repo https://lnkd.in/dZEMDiJh
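A rough sketch of the pipeline idea, under two assumptions: CoTracker's torch.hub entry point stands in for the off-the-shelf point tracker, and `SeuratDepthModel` is a hypothetical placeholder for the not-yet-released model.

```python
import torch

# Assumption: CoTracker's public torch.hub entry point; any point tracker that
# returns (x, y) trajectories plus visibility would play the same role.
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline")

video = torch.randn(1, 48, 3, 384, 512)              # (B, T, C, H, W) dummy clip
tracks, visibility = cotracker(video, grid_size=30)  # tracks: (B, T, N, 2)

# Hypothetical interface for Seurat: the model reasons over the spatial layout
# and temporal evolution of the 2D trajectories to predict per-track relative depth.
# seurat = SeuratDepthModel.from_pretrained(...)     # repo & demo announced, not out yet
# rel_depth = seurat(tracks, visibility)             # (B, T, N) relative depth values
```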
โค9๐Ÿ”ฅ3๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌼SOTA Textured 3D-Guided VTON🌼

👉#ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be released💙

👉Review https://t.ly/0tjdC
👉Paper https://lnkd.in/dFseYSXz
👉Project https://lnkd.in/djtqzrzs
👉Repo TBA
๐Ÿ#Nvidia Dynamic Pose ๐Ÿ

๐Ÿ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/wrcb0
๐Ÿ‘‰Paper https://lnkd.in/dycGjAyy
๐Ÿ‘‰Project https://lnkd.in/dDZ2Ej_Q
๐Ÿค—Data https://lnkd.in/d8yUSB7m
🔥 S3MOT: SOTA 3D MOT 🔥

👉S3MOT: a Selective-State-Space-model-based multi-object tracker that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & weights to be released under the MIT license💙

👉Review https://t.ly/H_JPv
👉Paper https://arxiv.org/pdf/2504.18068
👉Repo https://github.com/bytepioneerX/s3mot
🔥 Diffusion Model <-> Depth 🔥

👉ETH & CMU show how to turn a single-image latent diffusion model (LDM) into a SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0, HF demo available💙

👉Review https://t.ly/sP9ma
👉Paper arxiv.org/pdf/2411.19189
👉Project rollingdepth.github.io/
👉Repo github.com/prs-eth/rollingdepth
🤗Demo huggingface.co/spaces/prs-eth/rollingdepth
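For intuition, a minimal sketch of the general recipe behind "video depth without video models": run a single-image affine-invariant depth estimator per frame and stitch the predictions with least-squares scale/shift alignment. `estimate_depth` is a hypothetical stand-in for the LDM, and this frame-to-frame chaining is a simplification, not the authors' exact procedure.

```python
import numpy as np

def align_scale_shift(src, ref):
    """Least-squares scale s and shift t such that s * src + t ~= ref."""
    A = np.stack([src.ravel(), np.ones(src.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return s, t

def naive_video_depth(frames, estimate_depth):
    """Chain per-frame affine-invariant depth into a roughly consistent sequence.

    estimate_depth(frame) -> HxW relative depth is a hypothetical single-image
    depth model; each new prediction is aligned in scale/shift to the previous
    aligned frame, assuming small inter-frame motion.
    """
    depths = [estimate_depth(f) for f in frames]
    aligned = [depths[0]]
    for d in depths[1:]:
        s, t = align_scale_shift(d, aligned[-1])
        aligned.append(s * d + t)
    return np.stack(aligned)
```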
โค12๐Ÿ”ฅ6๐Ÿ‘3๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩷Dance vs. #ComputerVision🩷

👉The University of Saint-Etienne proposes a new 3D human body pose estimation pipeline for dance analysis. Project page with results and an interactive demo released💙

👉Review https://t.ly/JEdM3
👉Paper arxiv.org/pdf/2505.07249
👉Project https://lnkd.in/dD5dsMv5
โค9๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU
Dear friends,
I'm truly sorry for being away from the group for so long. I know: no updates in a while, while AI keeps running faster than the speed of light.

I'm going through a very difficult time in my life and I need some space to heal. This spare-time project (important to a lot of people here) needs energy and commitment I don't have right now. I'm sorry, be patient. I'll be back.

Love u all,
Alessandro.
โค399๐Ÿ‘28๐Ÿ˜ข27
Hi everybody,
I took a few weeks to catch my breath: I dedicated all my mental energy to keeping work going and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was and is good), I feel it's time to come back and support this wonderful community on this journey. I feel the responsibility of that; time to get back in the ring.

I'm very sorry for being away so long, but sometimes life hits really hard. I received incredible support from strangers all around the world. It's amazing.

Thanks again, you rock!
Alessandro.
1โค198๐Ÿ‘16๐Ÿ”ฅ15๐Ÿ‘5๐Ÿพ3๐Ÿ˜ข2๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦖 DINOv3 is out 🦖

👉#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs in computer vision. Code & weights released under the DINOv3 License💙

👉Review https://t.ly/-S3ZL
👉Paper https://t.ly/ervOT
👉Project https://lnkd.in/dHFf3esd
👉Repo https://lnkd.in/dPxhDxAq
🤗HF https://lnkd.in/dWGudY2i
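A hedged sketch of pulling dense features from the released weights with Hugging Face transformers; the checkpoint id below is an assumption (check the official DINOv3 collection linked above for the real names), and the generic AutoModel path simply mirrors how DINOv2 is loaded.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

# Assumed checkpoint id -- verify against the official DINOv3 collection on the Hub.
ckpt = "facebook/dinov3-vitb16-pretrain-lvd1689m"

processor = AutoImageProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

image = Image.new("RGB", (518, 518))  # dummy image; replace with a real photo
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

patch_tokens = outputs.last_hidden_state  # (1, num_tokens, dim) dense features
print(patch_tokens.shape)
```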
โค42๐Ÿ”ฅ13๐Ÿ‘2๐Ÿ˜1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🤖 Impact of SuperHuman AI 🤖

👉The nonprofit AI Futures Project unveils a (dystopian) scenario of what superhuman AI might look like: a forecast from today to bio-engineered human-like creatures, a fascinating speculation on the future with its "slowdown" and "race" endings. Enjoy 💙

👉Review https://t.ly/EgmfJ
👉Project https://ai-2027.com/
โค7๐Ÿ”ฅ2๐Ÿคฏ2๐Ÿคฃ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ“TOTNet: Occlusion-aware Tracking๐Ÿ“

๐Ÿ‘‰TOTNet: novel Temporal Occlusion Tracking Network that leverages 3D-convs, visibility-weighted loss, & occlusion augmentation to improve performance under occlusions. Code & Data under MIT๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Q0jAf
๐Ÿ‘‰Paper https://lnkd.in/dUYsa-GC
๐Ÿ‘‰Repo https://lnkd.in/d3QGUHYb
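To make the "visibility-weighted loss" ingredient concrete, a small illustrative PyTorch sketch (my own formulation, not necessarily the paper's exact loss): the per-frame position error is down-weighted when the target is occluded, so fully hidden frames don't dominate training.

```python
import torch

def visibility_weighted_l1(pred_xy, gt_xy, visibility, occluded_weight=0.2):
    """Position loss weighted by per-frame visibility.

    pred_xy, gt_xy: (B, T, 2) predicted / ground-truth 2D positions
    visibility:     (B, T) in [0, 1], 1 = fully visible, 0 = fully occluded
    Occluded frames still contribute, but with a reduced weight.
    """
    per_frame = (pred_xy - gt_xy).abs().sum(dim=-1)                 # (B, T)
    weights = occluded_weight + (1.0 - occluded_weight) * visibility
    return (weights * per_frame).sum() / weights.sum().clamp_min(1e-8)

# Tiny usage example with dummy tensors
pred = torch.randn(2, 16, 2)
gt = torch.randn(2, 16, 2)
vis = torch.rand(2, 16)
print(visibility_weighted_l1(pred, gt, vis))
```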
🔀Feed-Forward 4D video🔀

👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. It delivers high-quality dynamic point clouds and enables downstream tasks such as novel-view video synthesis with strong generalizability. Code/Data announced 💙

👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
โค10๐Ÿ”ฅ7๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

👉#Microsoft's DAViD: a 100% synthetic dataset and models for human depth, normal & segmentation estimation. Dataset available; models & runtime under MIT💙

👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
๐Ÿ‘7โค6๐Ÿ”ฅ3๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
👠 OmniTry: Virtual Try-On Anything 👠

👉OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released💙

👉Review https://t.ly/wMBGQ
👉Paper https://lnkd.in/dQe9MchS
👉Project https://omnitry.github.io/
👉Repo https://lnkd.in/d3QwAXY2
🤗Demo https://lnkd.in/duUcZpVA
📡 ROVR Open Dataset is out 📡

👉A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic (free) & commercial use💙

👉Review https://t.ly/iDcvg
👉Paper https://arxiv.org/pdf/2508.13977
👉Project https://xiandaguo.net/ROVR-Open-Dataset
โค12๐Ÿ”ฅ4๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧉 YOPO: SOTA 9-DoF Pose🧉

👉Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF pose estimation as a natural extension of 2D detection: a practical solution for monocular-RGB, category-level, multi-object pose estimation. Code & models announced (coming)💙

👉Review https://t.ly/cf_Cl
👉Paper https://arxiv.org/pdf/2508.14965
👉Project mikigom.github.io/YOPO-project-page/
👉Repo TBA
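To picture "9-DoF as a natural extension of 2D detection", here is a hedged sketch of what a query-based head could look like: each object query regresses class logits and a 2D box plus 3D translation, a 6D rotation and metric size. The module and dimensions are hypothetical, not YOPO's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryPose9DoFHead(nn.Module):
    """Illustrative per-query head: 2D detection outputs + 9-DoF pose (t, R, s)."""

    def __init__(self, dim=256, num_classes=10):
        super().__init__()
        self.cls = nn.Linear(dim, num_classes)   # category logits
        self.box2d = nn.Linear(dim, 4)           # 2D box (cx, cy, w, h)
        self.trans = nn.Linear(dim, 3)           # 3D translation (x, y, z)
        self.rot6d = nn.Linear(dim, 6)           # rotation, 6D continuous rep
        self.size = nn.Linear(dim, 3)            # metric object size (w, h, l)

    def forward(self, queries):                  # queries: (B, N, dim)
        r = self.rot6d(queries).view(*queries.shape[:2], 2, 3)
        # Gram-Schmidt: turn the 6D representation into an orthonormal rotation
        a1 = F.normalize(r[..., 0, :], dim=-1)
        a2 = F.normalize(r[..., 1, :] - (a1 * r[..., 1, :]).sum(-1, keepdim=True) * a1, dim=-1)
        a3 = torch.cross(a1, a2, dim=-1)
        R = torch.stack([a1, a2, a3], dim=-2)    # (B, N, 3, 3) rotation matrices
        return {
            "logits": self.cls(queries),
            "box2d": self.box2d(queries).sigmoid(),
            "translation": self.trans(queries),
            "rotation": R,
            "size": self.size(queries).exp(),    # keep sizes positive
        }

# Dummy usage
head = QueryPose9DoFHead()
out = head(torch.randn(2, 100, 256))
print(out["rotation"].shape)  # torch.Size([2, 100, 3, 3])
```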
โค8๐Ÿ”ฅ1๐Ÿคฉ1
🔬Intern-S1: SOTA MM-MoE 🔬

👉Intern-S1: a multimodal MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA on professional tasks such as molecular synthesis planning and reaction condition prediction. Models available under Apache 2.0💙

👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
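A hedged loading sketch for the Hugging Face checkpoint linked above; since Intern-S1 ships custom multimodal code, `trust_remote_code=True` and the text-only causal-LM entry point are assumptions here, and the model card remains authoritative for the recommended processor and chat template.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "internlm/Intern-S1"  # from the HF link in the post

# Assumption: causal-LM entry point with custom remote code; check the model card
# for the recommended class and chat template, especially for multimodal inputs.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Propose a plausible synthesis route for aspirin."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```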
โค6๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🫔ATLAS: SOTA Human Model🫔

👉#META presents ATLAS, a novel high-fidelity body model learned from 600K high-resolution scans captured with 240 synchronized cameras. Code announced, to be released💙

👉Review https://t.ly/0hHud
👉Paper arxiv.org/pdf/2508.15767
👉Project jindapark.github.io/projects/atlas/
👉Repo TBA
🧤Diffusive Hand from Signs🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions learned from sign language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements. Code, models & data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
โค4๐Ÿ”ฅ3๐Ÿ‘2๐Ÿคฏ1