AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
๐Ÿ“Moving Points -> Depth๐Ÿ“

๐Ÿ‘‰KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & Demo to be released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/qA2P5
๐Ÿ‘‰Paper https://lnkd.in/dpXDaQtM
๐Ÿ‘‰Project https://lnkd.in/d9qWYsjP
๐Ÿ‘‰Repo https://lnkd.in/dZEMDiJh
โค8๐Ÿ”ฅ3๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌼SOTA Textured 3D-Guided VTON🌼

👉#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. Code & benchmark to be released💙

👉Review https://t.ly/0tjdC
👉Paper https://lnkd.in/dFseYSXz
👉Project https://lnkd.in/djtqzrzs
👉Repo TBA
๐Ÿ#Nvidia Dynamic Pose ๐Ÿ

๐Ÿ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/wrcb0
๐Ÿ‘‰Paper https://lnkd.in/dycGjAyy
๐Ÿ‘‰Project https://lnkd.in/dDZ2Ej_Q
๐Ÿค—Data https://lnkd.in/d8yUSB7m
🔥 S3MOT: SOTA 3D MOT 🔥

👉S3MOT: Selective-State-Space model-based MOT that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & Weights to be released under MIT license💙

👉Review https://t.ly/H_JPv
👉Paper https://arxiv.org/pdf/2504.18068
👉Repo https://github.com/bytepioneerX/s3mot
🔥 Diffusion Model <-> Depth 🔥

👉ETH & CMU on how to turn a single-image latent diffusion model (LDM) into the SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0 and HF demo available💙

👉Review https://t.ly/sP9ma
👉Paper arxiv.org/pdf/2411.19189
👉Project rollingdepth.github.io/
👉Repo github.com/prs-eth/rollingdepth
🤗Demo huggingface.co/spaces/prs-eth/rollingdepth
โค12๐Ÿ”ฅ6๐Ÿ‘3๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩷Dance vs. #ComputerVision🩷

👉Saint-Etienne University proposes a new 3D human body pose estimation pipeline for dance analysis. Project page w/ results and interactive demo released💙

👉Review https://t.ly/JEdM3
👉Paper arxiv.org/pdf/2505.07249
👉Project https://lnkd.in/dD5dsMv5
โค9๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. It supports conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU
Dear friends,
I’m truly sorry for being away from the group for so long. I know: no updates so far while AI is running faster than the speed of light.

I’m going through a very difficult time in my life and I need some space to heal. This spare-time project (but important for a lot of people here) needs energy and commitment I don’t have right now. I’m sorry, be patient. I’ll be back.

Love u all,
Alessandro.
โค397๐Ÿ‘28๐Ÿ˜ข27
Hi everybody,
I took a few weeks to take a breath from a lot of stuff. I dedicated all my mental energy to keep working and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was/is always good), I feel it's time to come back and support this wonderful community in this journey. I feel the responsibility of that: time to get in the ring.

I'm very sorry for being out so long, but sometimes life hits really hard. I got incredible support from people I don't know from all around the world. It's amazing.

Thanks again, you rock!
Alessandro.
1โค195๐Ÿ‘16๐Ÿ”ฅ14๐Ÿ‘5๐Ÿพ3๐Ÿ˜ข2๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦖 DINOv3 is out 🦖

👉#Meta unveils DINOv3! A novel foundation model outperforming the previous SOTAs in computer vision. Code & weights released under DINOv3 License💙

👉Review https://t.ly/-S3ZL
👉Paper https://t.ly/ervOT
👉Project https://lnkd.in/dHFf3esd
👉Repo https://lnkd.in/dPxhDxAq
🤗HF https://lnkd.in/dWGudY2i
โค40๐Ÿ”ฅ13๐Ÿ‘2๐Ÿ˜1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🤖 Impact of SuperHuman AI 🤖

👉The nonprofit AI Futures Project unveils a (dystopian) scenario about what super-AI might look like: a forecast from today to bio-engineered human-like creatures. A fascinating speculation about the future with the "slow-down" and "race" scenarios. Enjoy 💙

👉Review https://t.ly/EgmfJ
👉Project https://ai-2027.com/
โค7๐Ÿคฏ2๐Ÿ”ฅ1๐Ÿคฃ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ“TOTNet: Occlusion-aware Tracking๐Ÿ“

๐Ÿ‘‰TOTNet: novel Temporal Occlusion Tracking Network that leverages 3D-convs, visibility-weighted loss, & occlusion augmentation to improve performance under occlusions. Code & Data under MIT๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Q0jAf
๐Ÿ‘‰Paper https://lnkd.in/dUYsa-GC
๐Ÿ‘‰Repo https://lnkd.in/d3QGUHYb
🔀Feed-Forward 4D video🔀

👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. HQ dynamic pt-clouds & downstream tasks such as novel-view video synthesis with strong generalizability. Code/Data announced 💙

👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
โค9๐Ÿ”ฅ7๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

👉#Microsoft's DAViD: 100% synthetic dataset/models for human Depth, Normals & Segmentation. Dataset available, models & runtime under MIT💙

👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
๐Ÿ‘7โค4๐Ÿ”ฅ2๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
👠 OmniTry: Virtual Try-On Anything 👠

👉OmniTry: unified framework that extends VTON beyond garments to encompass any wearable objects (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released💙

👉Review https://t.ly/wMBGQ
👉Paper https://lnkd.in/dQe9MchS
👉Project https://omnitry.github.io/
👉Repo https://lnkd.in/d3QwAXY2
🤗Demo https://lnkd.in/duUcZpVA
📡 ROVR Open Dataset is out 📡

👉A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic (free) & commercial use💙

👉Review https://t.ly/iDcvg
👉Paper https://arxiv.org/pdf/2508.13977
👉Project https://xiandaguo.net/ROVR-Open-Dataset
โค12๐Ÿ”ฅ4๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧉 YOPO: SOTA 9-DoF Pose 🧉

👉Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF estimation as a natural extension of 2D detection. A practical solution for mono-RGB, category-level, multi-obj pose estimation. Code & models announced (coming)💙

👉Review https://t.ly/cf_Cl
👉Paper https://arxiv.org/pdf/2508.14965
👉Project mikigom.github.io/YOPO-project-page/
👉Repo TBA
โค7๐Ÿ”ฅ1๐Ÿคฉ1
🔬Intern-S1: SOTA MM-MoE 🔬

👉Intern-S1: a MM-MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA for professional tasks, such as molecular synthesis planning, reaction condition prediction, etc. Models available under Apache 2.0💙

👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
โค6๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🫔ATLAS: SOTA Human Model🫔

👉#META presents ATLAS, a novel high-fidelity body model learned from 600k high-res. scans captured using 240 synchronized cams. Code announced, to be released💙

👉Review https://t.ly/0hHud
👉Paper arxiv.org/pdf/2508.15767
👉Project jindapark.github.io/projects/atlas/
👉Repo TBA
โค7๐Ÿ”ฅ7๐Ÿ‘1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤Diffusive Hand from Signs🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from Sign Language Data. Motion characteristics such as handshapes, locations, finger, hand & arm movements. Code, Models & Data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
โค3๐Ÿ”ฅ3๐Ÿ‘2๐Ÿคฏ1