AI with Papers - Artificial Intelligence & Deep Learning
15.1K subscribers
98 photos
245 videos
12 files
1.29K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
πŸ₯Ά OmniHuman-1.5 πŸ₯Ά

πŸ‘‰#ByteDance proposes a novel framework designed to generate character animations that are not only physically plausible but also semantically coherent and expressive, staying consistent with the speech's rhythm, prosody, and semantic content. Impressive results but no code πŸ₯Ί

πŸ‘‰Review https://t.ly/CnRmX
πŸ‘‰Paper arxiv.org/pdf/2508.19209
πŸ‘‰Project omnihuman-lab.github.io/v1_5/
πŸ‘‰Repo πŸ₯Ί
❀4🀯2πŸ‘1πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
⚽SoccerNet 2025 results!⚽

πŸ‘‰The SoccerNet 2025 Challenges are the open benchmarking suite dedicated to advancing computer vision research in football video understanding. Repo available πŸ’™

πŸ‘‰Review https://t.ly/MfHKg
πŸ‘‰Paper https://arxiv.org/pdf/2508.19182
πŸ‘‰Project https://www.soccer-net.org/
πŸ‘‰Repo https://github.com/SoccerNet
❀15πŸ”₯6πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌹ROSE: Remove Objects & Effects🌹

πŸ‘‰ROSE removes an object's effects on its environment: shadows, reflections, lighting, translucency, and mirrors. Model, Demo & Dataset available via Hugging FaceπŸ’™

πŸ‘‰Review https://t.ly/_KFM0
πŸ‘‰Paper https://lnkd.in/dNcTXQAE
πŸ‘‰Project https://lnkd.in/dFGmYT5h
πŸ‘‰Model https://lnkd.in/dhTT-VkN
πŸ‘‰Demo https://lnkd.in/dimgXZT6
πŸ‘‰Data https://lnkd.in/da7Jv667
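πŸ‘‰ROSE itself is a learned diffusion model, but the intuition of covering an object's cast effects can be sketched with a classical baseline: dilate the object mask before inpainting so nearby shadow/reflection pixels are regenerated too. A minimal NumPy sketch (toy mask, 4-connected dilation; not the paper's method):

```python
import numpy as np

def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Grow a binary mask by one pixel per iteration (4-connected).

    Note: np.roll wraps around image borders; fine for interior masks.
    """
    out = mask.astype(bool)
    for _ in range(iterations):
        out = (out
               | np.roll(out, 1, axis=0) | np.roll(out, -1, axis=0)
               | np.roll(out, 1, axis=1) | np.roll(out, -1, axis=1))
    return out

# Toy 7x7 object mask: a single "object" pixel at the center.
mask = np.zeros((7, 7), dtype=bool)
mask[3, 3] = True

# Expanded mask so the inpainter also covers nearby shadow pixels.
effect_mask = dilate(mask, iterations=2)
print(mask.sum(), effect_mask.sum())  # 1 vs 13 pixels
```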
❀15πŸ‘3😍2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‰ Dress-up & Dance πŸ‰

πŸ‘‰Novel diffusion framework that generates HQ, 5-second, 24 FPS virtual try-on (VTON) videos at 1152Γ—720 of a user wearing desired garments while moving in accordance with a given reference video. Impressive results but no repoπŸ₯Ί

πŸ‘‰Review https://t.ly/7NeTL
πŸ‘‰Paper arxiv.org/pdf/2508.21070
πŸ‘‰Project immortalco.github.io/DressAndDance/
πŸ‘‰Repo πŸ₯Ί
❀8πŸ”₯2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌈 Multi-View 3D Tracking 🌈

πŸ‘‰MVTracker is the first data-driven multi-view 3D point tracker for tracking arbitrary 3D points across multiple cameras. Repo availableπŸ’™

πŸ‘‰Review https://t.ly/rISMR
πŸ‘‰Paper arxiv.org/pdf/2508.21060
πŸ‘‰Project https://lnkd.in/drHtAmRC
πŸ‘‰Repo https://lnkd.in/d4k8mg3B
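πŸ‘‰MVTracker is data-driven, but the geometric core of multi-view 3D point tracking is classical triangulation: recovering a 3D point from its 2D observations in calibrated cameras. A minimal DLT sketch with two toy cameras (not the paper's learned tracker):

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N calibrated views.

    projections: list of 3x4 camera projection matrices P_i
    points_2d:   list of (u, v) observations of the same point
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])   # two linear constraints per view
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)        # null vector = homogeneous 3D point
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity pose, and one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

X_hat = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(np.round(X_hat, 6))  # recovers [0.5, 0.2, 4.0] in the noise-free case
```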
❀10πŸ”₯5πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
❀️‍πŸ”₯PHD: Personalized 3D Humans❀️‍πŸ”₯

πŸ‘‰ETH & #Meta unveil PHD, a novel approach for personalized 3D human mesh recovery (HMR) and body fitting that leverages user-specific shape information. Code & models to be releasedπŸ’™

πŸ‘‰Review https://t.ly/IeRhH
πŸ‘‰Paper https://arxiv.org/pdf/2508.21257
πŸ‘‰Project https://phd-pose.github.io/
πŸ‘‰Repo TBA
❀7πŸ”₯2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸͺ΄ Pixie: Physics from Pixels πŸͺ΄

πŸ‘‰UPenn + MIT unveil Pixie: training a neural-net that maps pretrained visual features (i.e., CLIP) to dense material fields of physical properties in a single forward pass, enabling real‑time physics simulations. Repo & Dataset under MIT licenseπŸ’™

πŸ‘‰Review https://t.ly/1W0n5
πŸ‘‰Paper https://lnkd.in/dsHAHDqM
πŸ‘‰Project https://lnkd.in/dwrHRbRc
πŸ‘‰Repo https://lnkd.in/dy7bvjsK
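πŸ‘‰The idea of mapping pretrained features to dense material fields in a single forward pass can be sketched with a per-pixel MLP head. The feature map, head sizes, and the four material parameters below are illustrative stand-ins, not Pixie's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a dense visual feature map (H x W x D), e.g. CLIP-like.
H, W, D = 8, 8, 16
feats = rng.normal(size=(H, W, D))

# Hypothetical 2-layer MLP head mapping features -> 4 material parameters
# (e.g. density, Young's modulus, Poisson ratio, a material-class logit).
W1, b1 = rng.normal(size=(D, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 4)), np.zeros(4)

def material_head(x):
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ W2 + b2

# Single forward pass over every pixel at once -> dense material field.
material_field = material_head(feats.reshape(-1, D)).reshape(H, W, 4)
print(material_field.shape)  # (8, 8, 4)
```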
❀5πŸ‘2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ«›TMR: Few-Shot Template-matchingπŸ«›

πŸ‘‰POSTECH unveils TMR, a novel and simple template-matching detector for few-shot pattern detection, achieving strong (and SOTA) results on diverse datasets. A new dataset (RPINE) is released; repo soonπŸ’™

πŸ‘‰Review https://t.ly/WWAcL
πŸ‘‰Paper https://lnkd.in/dJbSu5vk
πŸ‘‰Project https://lnkd.in/dwcDnHHQ
πŸ‘‰Repo https://lnkd.in/dp7aw8Cs
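πŸ‘‰As a baseline for what a template-matching detector computes, here is brute-force normalized cross-correlation over a toy grayscale image; TMR's learned matching is far more robust, this is only the classical starting point:

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray):
    """Return the top-left corner where `template` best matches `image`
    under normalized cross-correlation (brute force, grayscale)."""
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t) + 1e-8
    best, best_pos = -np.inf, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            score = (p * t).sum() / ((np.linalg.norm(p) + 1e-8) * tn)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

# Toy image with a 3x3 pattern planted at (5, 7).
img = np.zeros((12, 12))
pattern = np.array([[0., 1., 0.], [1., 2., 1.], [0., 1., 0.]])
img[5:8, 7:10] = pattern

pos, score = match_template(img, pattern)
print(pos, round(score, 3))  # (5, 7) 1.0 -- a perfect match scores 1
```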
πŸ”₯5❀3πŸ‘1
🧬 OpenVision 2 is out! 🧬

πŸ‘‰UCSC releases OpenVision2: a novel family of generative pretrained visual encoders that removes the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/Oma3w
πŸ‘‰Paper https://arxiv.org/pdf/2509.01644
πŸ‘‰Project https://ucsc-vlaa.github.io/OpenVision2/
πŸ‘‰Repo https://github.com/UCSC-VLAA/OpenVision
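πŸ‘‰Caption-only supervision boils down to next-token cross-entropy on the caption decoder's outputs. A minimal NumPy sketch of that loss (toy logits and vocabulary; not the actual training code):

```python
import numpy as np

def caption_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean next-token cross-entropy over a caption.

    logits:  (T, V) decoder outputs, one row per caption position
    targets: (T,)   ground-truth token ids
    """
    z = logits - logits.max(axis=1, keepdims=True)          # stability shift
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

rng = np.random.default_rng(0)
T, V = 5, 100                       # caption length, toy vocab size
logits = rng.normal(size=(T, V))
targets = rng.integers(0, V, size=T)

loss = caption_loss(logits, targets)
print(round(loss, 3))  # near-random logits land around ln(100) ~ 4.6
```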
πŸ”₯7❀1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‰ #DoubleDragon with #AI πŸ‰

πŸ‘‰What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game’s streets into the real world. AUDIO ON. Damn romanticπŸ’™

#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM

πŸ‘‰Post https://t.ly/0IpER
πŸ‘‰Channel https://www.youtube.com/@iaiaoh84
❀5πŸ‘2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🍐 Promptable Human Mesh 🍐

πŸ‘‰PromptHMR is a promptable human pose/shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes β€œside information” readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code releasedπŸ’™

πŸ‘‰Review https://t.ly/zJ7S-
πŸ‘‰Paper arxiv.org/pdf/2504.06397
πŸ‘‰Project yufu-wang.github.io/phmr-page/
πŸ‘‰Repo github.com/yufu-wang/PromptHMR
🀣19❀10πŸ‘1πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ”₯WebEyeTrack: real-time/web eyeπŸ”₯

πŸ‘‰WebEyeTrack is a novel framework that integrates lightweight SOTA gaze estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under MIT licenseπŸ’™

πŸ‘‰Review https://t.ly/Xon9h
πŸ‘‰Paper https://arxiv.org/pdf/2508.19544
πŸ‘‰Project redforestai.github.io/WebEyeTrack/
πŸ‘‰Repo github.com/RedForestAi/WebEyeTrack
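πŸ‘‰Head-pose-aware gaze estimation can be illustrated by feeding the head pose to the regressor alongside eye features. A toy least-squares sketch on synthetic data (a real model like WebEyeTrack's uses a lightweight neural net, not linear regression):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: eye-patch features (8-D) plus head pose (yaw, pitch).
N = 200
eye_feats = rng.normal(size=(N, 8))
head_pose = rng.normal(size=(N, 2))
X = np.hstack([eye_feats, head_pose])      # head pose as explicit input

# Ground-truth linear mapping + small noise -> (gaze yaw, gaze pitch).
true_W = rng.normal(size=(10, 2))
gaze = X @ true_W + 0.01 * rng.normal(size=(N, 2))

# Least-squares gaze regressor fitted on the combined features.
W_hat, *_ = np.linalg.lstsq(X, gaze, rcond=None)
err = float(np.abs(X @ W_hat - gaze).mean())
print(round(err, 4))  # small residual, close to the injected noise level
```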
πŸ”₯8❀3πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
βœ‚οΈ AI Open-Source Annotation βœ‚οΈ

πŸ‘‰VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for Computer Vision tasks like object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/MoMvv
πŸ‘‰Paper https://lnkd.in/dxTncSgv
πŸ‘‰Repo https://lnkd.in/dCWMXp3x
πŸ”₯12🀯4❀3πŸ‘3⚑1
Friends,
I’ve just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me

What about posting stuff about AI on IG? Thoughts?
πŸ‘11🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ–ŒοΈReal-Time Drag-Based EditingπŸ–ŒοΈ

πŸ‘‰The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping/inpainting. Inspired by elastic object deformation. Demo and Code released (unknown license)πŸ’™

πŸ‘‰Review https://t.ly/H5nlR
πŸ‘‰Paper https://arxiv.org/pdf/2509.04582
πŸ‘‰Project https://visual-ai.github.io/inpaint4drag/
πŸ‘‰Repo https://github.com/Visual-AI/Inpaint4Drag
πŸ‘‰Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
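πŸ‘‰The decomposition of a drag edit into pixel-space warping plus an inpainting mask can be sketched as: move the dragged pixels, then record the vacated region as the hole to inpaint. A toy NumPy version with a square patch (the paper's bidirectional warping is more sophisticated):

```python
import numpy as np

def drag_warp(img: np.ndarray, src: tuple, dst: tuple, radius: int = 2):
    """Move a square patch from `src` to `dst`; return the warped image
    plus the mask of now-empty source pixels to hand to an inpainter."""
    out = img.copy()
    hole = np.zeros(img.shape[:2], dtype=bool)
    sy, sx = src
    dy, dx = dst
    patch = img[sy - radius:sy + radius + 1, sx - radius:sx + radius + 1].copy()
    # Vacate the source region and mark it as the inpainting hole.
    hole[sy - radius:sy + radius + 1, sx - radius:sx + radius + 1] = True
    out[sy - radius:sy + radius + 1, sx - radius:sx + radius + 1] = 0
    # Paste the dragged patch at the destination; it needs no inpainting.
    out[dy - radius:dy + radius + 1, dx - radius:dx + radius + 1] = patch
    hole[dy - radius:dy + radius + 1, dx - radius:dx + radius + 1] = False
    return out, hole

img = np.zeros((16, 16))
img[4:9, 4:9] = 1.0                      # a bright 5x5 "object"
warped, hole = drag_warp(img, src=(6, 6), dst=(6, 11), radius=2)
print(warped[6, 11], hole.sum())  # object moved; 25 vacated pixels to fill
```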
πŸ”₯7❀6πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩸Foundation Red Blood Cells🩸

πŸ‘‰RedDino from University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it's the new SOTA in shape classification. Code & Models released under Apache2.0πŸ’™

πŸ‘‰Review https://t.ly/uWAch
πŸ‘‰Paper arxiv.org/pdf/2508.08180
πŸ‘‰Code github.com/Snarci/RedDino
πŸ‘‰Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
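πŸ‘‰A common way to use a frozen foundation model like RedDino is a simple probe on its embeddings. A nearest-centroid sketch on synthetic stand-in embeddings (class names and dimensions are hypothetical; a real run would extract features with the released checkpoints):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: a real pipeline would extract them with RedDino.
n_per_class, dim = 50, 32
classes = ["discocyte", "echinocyte", "schistocyte"]  # hypothetical labels
centers = rng.normal(size=(len(classes), dim)) * 3
emb = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
labels = np.repeat(np.arange(len(classes)), n_per_class)

# Nearest-centroid probe on frozen embeddings (standard foundation-model eval).
centroids = np.stack([emb[labels == k].mean(0) for k in range(len(classes))])
pred = np.argmin(((emb[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
acc = float((pred == labels).mean())
print(round(acc, 3))  # well-separated toy clusters classify near-perfectly
```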
❀16πŸ‘4πŸ”₯2
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ‘» From Skin to Skeleton πŸ‘»

πŸ‘‰This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for researchπŸ’™

πŸ‘‰Review https://t.ly/JsI8M
πŸ‘‰Paper arxiv.org/pdf/2509.06607
πŸ‘‰Project https://skel.is.tue.mpg.de/
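πŸ‘‰The point of biomechanically realistic degrees of freedom can be illustrated with a knee modeled as a 1-DOF hinge instead of a 3-DOF ball joint. A toy 2D forward-kinematics sketch (illustrative only, not the SKEL model):

```python
import numpy as np

def hinge_knee(thigh_len: float, shin_len: float, flexion: float):
    """2D forward kinematics of a hip->knee->ankle chain where the knee is
    a biomechanical 1-DOF hinge (flexion only), thigh hanging straight down."""
    hip = np.array([0.0, 0.0])
    knee = hip + np.array([0.0, -thigh_len])
    # Flexion rotates the shin about the knee in the sagittal plane.
    direction = np.array([np.sin(flexion), -np.cos(flexion)])
    ankle = knee + shin_len * direction
    return knee, ankle

# 90 degrees of knee flexion swings the shin forward, horizontal.
knee, ankle = hinge_knee(0.45, 0.43, flexion=np.deg2rad(90))
print(np.round(knee, 3), np.round(ankle, 3))  # [0, -0.45] and [0.43, -0.45]
```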
❀3πŸ‘2πŸ”₯2πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌱 FoMo4Wheat Foundational Model 🌱

πŸ‘‰PheniX Lab et al. unveil a novel family of foundational models tailored for wheat image tasks, suitable for classification, detection, counting and segmentation. Demo, Dataset, Model & Code under MITπŸ’™

πŸ‘‰Review https://t.ly/UzM-Z
πŸ‘‰Paper arxiv.org/pdf/2509.06907
πŸ‘‰Project fomo4wheat.phenix-lab.com/
πŸ‘‰Repo github.com/PheniX-Lab/FoMo4Wheat
πŸ‘‰Demo fomo4wheat.phenix-lab.com/demos
❀4πŸ‘3πŸ”₯1🍾1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ™Human-Centric Video GenerationπŸ™

πŸ‘‰Tsinghua & #ByteDance unveil HuMo: a unified, human-centric video generation framework designed to produce HQ, fine-grained, and controllable human videos from multimodal inputs: text prompt following, consistent subject preservation, synchronized audio-driven motion. Repo released under Apache2.0πŸ’™

πŸ‘‰Review https://t.ly/3S8Yb
πŸ‘‰Paper https://arxiv.org/pdf/2509.08519
πŸ‘‰Project https://phantom-video.github.io/HuMo/
πŸ‘‰Repo https://github.com/Phantom-video/HuMo