AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
Hi everybody,
I took a few weeks to catch my breath from a lot of stuff. I dedicated all my mental energy to keep working and all my spare time to taking care of myself. Although I'm still not OK (BTW, my health was and is good), I feel it's time to come back and support this wonderful community on this journey. I feel that responsibility: time to get back in the ring.

I'm very sorry for being away so long, but sometimes life hits really hard. I received incredible support from people I've never met, all around the world. It's amazing.

Thanks again, you rock!
Alessandro.
1โค198๐Ÿ‘16๐Ÿ”ฅ15๐Ÿ‘5๐Ÿพ3๐Ÿ˜ข2๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦖 DINOv3 is out 🦖

👉#Meta unveils DINOv3, a novel foundation model that outperforms the previous SOTAs in computer vision. Code & weights released under the DINOv3 License💙

👉Review https://t.ly/-S3ZL
👉Paper https://t.ly/ervOT
👉Project https://lnkd.in/dHFf3esd
👉Repo https://lnkd.in/dPxhDxAq
🤗HF https://lnkd.in/dWGudY2i
โค42๐Ÿ”ฅ13๐Ÿ‘2๐Ÿ˜1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
🤖 Impact of SuperHuman AI 🤖

👉The nonprofit AI Futures Project unveils a (dystopian) scenario of what super-AI might look like: a forecast from today to bio-engineered human-like creatures. A fascinating speculation about the future, with the "slow-down" and "race" scenarios. Enjoy 💙

👉Review https://t.ly/EgmfJ
👉Project https://ai-2027.com/
โค7๐Ÿ”ฅ2๐Ÿคฏ2๐Ÿคฃ1
This media is not supported in your browser
VIEW IN TELEGRAM
🏓TOTNet: Occlusion-aware Tracking🏓

👉TOTNet: a novel Temporal Occlusion Tracking Network that leverages 3D convolutions, a visibility-weighted loss & occlusion augmentation to improve performance under occlusions. Code & Data under MIT💙

👉Review https://t.ly/Q0jAf
👉Paper https://lnkd.in/dUYsa-GC
👉Repo https://lnkd.in/d3QGUHYb
🔀Feed-Forward 4D video🔀

👉4DNeX is the first feed-forward framework for generating 4D scene representations from a single image by fine-tuning a diffusion model. HQ dynamic point clouds & downstream tasks such as novel-view video synthesis, with strong generalizability. Code/Data announced 💙

👉Review https://t.ly/SpkD-
👉Paper arxiv.org/pdf/2508.13154
👉Project https://4dnex.github.io/
👉Repo github.com/3DTopia/4DNeX
👉Data https://lnkd.in/dh4_3Ghf
👉Demo https://lnkd.in/dztyzwgg
โค10๐Ÿ”ฅ7๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌈DAViD: Synthetic Depth-Normal-Segmentation🌈

👉#Microsoft's DAViD: 100% synthetic dataset/models for human Depth, Normals & Segmentation. Dataset available, models & runtime under MIT💙

👉Review https://t.ly/-SlO_
👉Paper https://lnkd.in/eCmMXpTg
👉Project https://lnkd.in/eurCSWkm
👉Repo https://lnkd.in/e7PWFgP2
👠 OmniTry: Virtual Try-On Anything 👠

👉OmniTry: a unified framework that extends VTON beyond garments to any wearable object (jewelry, accessories, etc.) in a mask-free setting. Weights, HF demo & benchmark released💙

👉Review https://t.ly/wMBGQ
👉Paper https://lnkd.in/dQe9MchS
👉Project https://omnitry.github.io/
👉Repo https://lnkd.in/d3QwAXY2
🤗Demo https://lnkd.in/duUcZpVA
📡 ROVR Open Dataset is out 📡

👉A novel large-scale open 3D dataset for autonomous driving, robotics, and 4D perception tasks. To be released for academic (free) & commercial use💙

👉Review https://t.ly/iDcvg
👉Paper https://arxiv.org/pdf/2508.13977
👉Project https://xiandaguo.net/ROVR-Open-Dataset
โค12๐Ÿ”ฅ4๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧉 YOPO: SOTA 9-DoF Pose🧉

👉Pit In Co. unveils YOPO, a novel single-stage, query-based framework that treats category-level 9-DoF estimation as a natural extension of 2D detection. A practical solution for mono-RGB, category-level, multi-object pose estimation. Code & models announced (coming)💙

👉Review https://t.ly/cf_Cl
👉Paper https://arxiv.org/pdf/2508.14965
👉Project mikigom.github.io/YOPO-project-page/
👉Repo TBA
โค8๐Ÿ”ฅ1๐Ÿคฉ1
🔬Intern-S1: SOTA MM-MoE 🔬

👉Intern-S1: an MM-MoE with 28B activated / 241B total parameters, continually pre-trained on 5T tokens, including 2.5T+ tokens from scientific domains. New SOTA for professional tasks such as molecular synthesis planning, reaction condition prediction, etc. Models available under Apache 2.0💙

👉Review https://t.ly/3l5UW
👉Paper arxiv.org/pdf/2508.15763
👉Repo github.com/InternLM/Intern-S1
🤗HF huggingface.co/internlm/Intern-S1
โค6๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🫔ATLAS: SOTA Human Model🫔

👉#META presents ATLAS, a novel high-fidelity body model learned from 600k high-res scans captured with 240 synchronized cams. Code announced, to be released💙

👉Review https://t.ly/0hHud
👉Paper arxiv.org/pdf/2508.15767
👉Project jindapark.github.io/projects/atlas/
👉Repo TBA
🧤Diffusive Hand from Signs🧤

👉LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from Sign Language Data. Motion characteristics include handshapes, locations, and finger, hand & arm movements. Code, Models & Data to be released 💙

👉Review https://t.ly/HonX_
👉Paper https://arxiv.org/pdf/2508.15902
👉Project https://imagine.enpc.fr/~leore.bensabath/HandMDM/
👉Data drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
👉Repo TBA
โค4๐Ÿ”ฅ3๐Ÿ‘2๐Ÿคฏ1
This media is not supported in your browser
VIEW IN TELEGRAM
🏎️ VROOM: F1 Reconstruction 🏎️

👉Berkeley unveils VROOM, the first attempt at reconstructing 3D models of #Formula1 circuits using only onboard camera footage from race cars. Extreme challenges due to noise & speed. Repo released💙

👉Review https://t.ly/uuHdT
👉Paper arxiv.org/pdf/2508.17172
👉Repo github.com/yajatyadav/vroom
👉Project varun-bharadwaj.github.io/vroom/
1โค18๐Ÿ”ฅ5๐Ÿ‘1
ezgif-8120c4563e81c3.mp4
510.6 KB
🥶 OmniHuman-1.5 🥶

👉#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, consistent with the speech's rhythm, prosody and semantic content. Impressive results but no code 🥺

👉Review https://t.ly/CnRmX
👉Paper arxiv.org/pdf/2508.19209
👉Project omnihuman-lab.github.io/v1_5/
👉Repo 🥺
โค4๐Ÿคฏ2๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
⚽SoccerNet 2025 results!⚽

👉The SoccerNet 2025 Challenges are an open benchmarking effort dedicated to advancing computer vision research in football video understanding. Repo available 💙

👉Review https://t.ly/MfHKg
👉Paper https://arxiv.org/pdf/2508.19182
👉Project https://www.soccer-net.org/
👉Repo https://github.com/SoccerNet
โค15๐Ÿ”ฅ6๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌹ROSE: Remove Objects & Effects🌹

👉Removes objects together with their effects on the environment: shadows, reflections, light, translucency and mirrors. Model, Demo & Dataset available via Hugging Face💙

👉Review https://t.ly/_KFM0
👉Paper https://lnkd.in/dNcTXQAE
👉Project https://lnkd.in/dFGmYT5h
👉Model https://lnkd.in/dhTT-VkN
👉Demo https://lnkd.in/dimgXZT6
👉Data https://lnkd.in/da7Jv667
โค15๐Ÿ‘3๐Ÿ˜2๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
Dress-up & Dance

👉A novel diffusion framework that generates HQ, 5-second-long, 24 FPS VTON videos at 1152×720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results but no repo🥺

👉Review https://t.ly/7NeTL
👉Paper arxiv.org/pdf/2508.21070
👉Project immortalco.github.io/DressAndDance/
👉Repo 🥺
โค8๐Ÿ”ฅ2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌈 Multi-View 3D Tracking 🌈

👉MVTracker is the first data-driven multi-view 3D point tracker for tracking arbitrary 3D points across multiple cameras. Repo available💙

👉Review https://t.ly/rISMR
👉Paper arxiv.org/pdf/2508.21060
👉Project https://lnkd.in/drHtAmRC
👉Repo https://lnkd.in/d4k8mg3B
โค10๐Ÿ”ฅ5๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
โค๏ธโ€๐Ÿ”ฅPHD: Personalized 3D Humansโค๏ธโ€๐Ÿ”ฅ

๐Ÿ‘‰ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/IeRhH
๐Ÿ‘‰Paper https://arxiv.org/pdf/2508.21257
๐Ÿ‘‰Project https://phd-pose.github.io/
๐Ÿ‘‰Repo TBA
โค7๐Ÿ”ฅ2๐Ÿ‘1