AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh daily updates on #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🧬 OpenVision 2 is out! 🧬

πŸ‘‰UCSC releases OpenVision 2: a novel family of generative pretrained visual encoders that drops the text encoder and contrastive loss, training with caption-only supervision. Fully open, Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/Oma3w
πŸ‘‰Paper https://arxiv.org/pdf/2509.01644
πŸ‘‰Project https://ucsc-vlaa.github.io/OpenVision2/
πŸ‘‰Repo https://github.com/UCSC-VLAA/OpenVision
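
For intuition, a minimal sketch of caption-only generative pretraining under my reading of the recipe (stand-in modules, not the official code): a vision encoder feeds an autoregressive caption decoder, and the only training signal is next-token cross-entropy; no text encoder, no contrastive loss.

```python
import torch
import torch.nn as nn

class CaptionOnlyPretrainer(nn.Module):
    def __init__(self, vocab=32000, dim=512):
        super().__init__()
        self.encoder = nn.Sequential(            # stand-in for a ViT encoder
            nn.Conv2d(3, dim, kernel_size=16, stride=16),
            nn.Flatten(2),
        )
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)

    def forward(self, images, captions):
        mem = self.encoder(images).transpose(1, 2)       # (B, patches, dim)
        tgt = self.embed(captions[:, :-1])               # teacher forcing
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, mem, tgt_mask=mask)
        logits = self.head(out)
        # caption-only supervision: plain next-token cross-entropy,
        # no text tower and no contrastive (CLIP-style) objective
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            captions[:, 1:].reshape(-1),
        )

model = CaptionOnlyPretrainer()
loss = model(torch.randn(2, 3, 224, 224), torch.randint(0, 32000, (2, 17)))
loss.backward()
```
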
πŸ‰ #DoubleDragon with #AI πŸ‰

πŸ‘‰What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game's streets into the real world. AUDIO ON. Damn romanticπŸ’™

#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM

πŸ‘‰Post https://t.ly/0IpER
πŸ‘‰Channel https://www.youtube.com/@iaiaoh84
🍐 Promptable Human Mesh 🍐

πŸ‘‰PromptHMR is a promptable human pose/shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes β€œside information” readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code releasedπŸ’™

πŸ‘‰Review https://t.ly/zJ7S-
πŸ‘‰Paper arxiv.org/pdf/2504.06397
πŸ‘‰Project yufu-wang.github.io/phmr-page/
πŸ‘‰Repo github.com/yufu-wang/PromptHMR
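
A hedged sketch of how prompt conditioning can work in an HPS head (names, dimensions, and fusion choice are mine, not PromptHMR's actual code): a per-person prompt embedding attends over image tokens before a head regresses SMPL pose/shape.

```python
import torch
import torch.nn as nn

class PromptedHPS(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.img_proj = nn.Linear(768, dim)       # patch features -> tokens
        self.prompt_proj = nn.Linear(512, dim)    # text/box embedding -> query
        self.fuse = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.smpl_head = nn.Linear(dim, 72 + 10)  # pose (72) + betas (10)

    def forward(self, patch_feats, prompt_emb):
        img = self.img_proj(patch_feats)               # (B, N, dim)
        q = self.prompt_proj(prompt_emb).unsqueeze(1)  # (B, 1, dim) query
        fused, _ = self.fuse(q, img, img)              # prompt attends to image
        return self.smpl_head(fused.squeeze(1))        # (B, 82) SMPL params

m = PromptedHPS()
params = m(torch.randn(2, 196, 768), torch.randn(2, 512))
print(params.shape)  # torch.Size([2, 82])
```
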
πŸ”₯ WebEyeTrack: Real-Time Eye Tracking on the Web πŸ”₯

πŸ‘‰WebEyeTrack is a novel framework that integrates lightweight SOTA gaze-estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under the MIT licenseπŸ’™

πŸ‘‰Review https://t.ly/Xon9h
πŸ‘‰Paper https://arxiv.org/pdf/2508.19544
πŸ‘‰Project redforestai.github.io/WebEyeTrack/
πŸ‘‰Repo github.com/RedForestAi/WebEyeTrack
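
For scale intuition, a minimal sketch (an assumed architecture, not the official model) of a browser-sized gaze network: a tiny CNN over an eye crop, concatenated with head-pose angles, regressing pitch/yaw. Something this small exports cleanly to ONNX/TF.js for in-browser inference.

```python
import torch
import torch.nn as nn

class TinyGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(       # eye features + 3 head-pose angles
            nn.Linear(32 + 3, 32), nn.ReLU(), nn.Linear(32, 2),
        )

    def forward(self, eye_crop, head_pose):
        return self.head(torch.cat([self.features(eye_crop), head_pose], dim=1))

net = TinyGazeNet()
gaze = net(torch.randn(1, 1, 36, 60), torch.zeros(1, 3))  # (pitch, yaw)
print(gaze.shape)  # torch.Size([1, 2])
```
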
βœ‚οΈ AI Open-Source Annotation βœ‚οΈ

πŸ‘‰VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks such as object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/MoMvv
πŸ‘‰Paper https://lnkd.in/dxTncSgv
πŸ‘‰Repo https://lnkd.in/dCWMXp3x
Friends,
I've just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me!

What about posting stuff about AI on IG? Thoughts?
πŸ‘11❀1🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ–ŒοΈReal-Time Drag-Based EditingπŸ–ŒοΈ

πŸ‘‰The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping/inpainting. Inspired by elastic object deformation. Demo and Code released (unknown license)πŸ’™

πŸ‘‰Review https://t.ly/H5nlR
πŸ‘‰Paper https://arxiv.org/pdf/2509.04582
πŸ‘‰Project https://visual-ai.github.io/inpaint4drag/
πŸ‘‰Repo https://github.com/Visual-AI/Inpaint4Drag
πŸ‘‰Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
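
A toy of the decomposition with stock OpenCV (heavily simplified: a single rigid translation instead of the paper's bidirectional warping; function names are mine): warp the masked pixels along the drag vector, then inpaint the region they vacate.

```python
import cv2
import numpy as np

def drag_edit(img, mask, drag):
    """img: HxWx3 uint8, mask: HxW bool, drag: (dx, dy) in pixels."""
    dx, dy = drag
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    h, w = mask.shape
    moved = cv2.warpAffine(img, M, (w, h))                  # warp whole image
    moved_mask = cv2.warpAffine(mask.astype(np.uint8), M, (w, h)) > 0
    out = img.copy()
    out[moved_mask] = moved[moved_mask]                     # paste dragged pixels
    hole = (mask & ~moved_mask).astype(np.uint8) * 255      # region left behind
    return cv2.inpaint(out, hole, 3, cv2.INPAINT_TELEA)     # fill the hole

img = np.full((128, 128, 3), 200, np.uint8)
mask = np.zeros((128, 128), bool); mask[40:80, 40:80] = True
result = drag_edit(img, mask, drag=(25, 0))
```
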
🩸Foundation Red Blood Cells🩸

πŸ‘‰RedDino from University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it's the new SOTA in shape classification. Code & Models released under Apache2.0πŸ’™

πŸ‘‰Review https://t.ly/uWAch
πŸ‘‰Paper arxiv.org/pdf/2508.08180
πŸ‘‰Code github.com/Snarci/RedDino
πŸ‘‰Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
πŸ‘» From Skin to Skeleton πŸ‘»

πŸ‘‰This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for researchπŸ’™

πŸ‘‰Review https://t.ly/JsI8M
πŸ‘‰Paper arxiv.org/pdf/2509.06607
πŸ‘‰Project https://skel.is.tue.mpg.de/
🌱 FoMo4Wheat Foundational Model 🌱

πŸ‘‰PheniX Lab et al. unveil a novel family of foundation models tailored for wheat image tasks: classification, detection, counting, and segmentation. Demo, dataset, model & code under MITπŸ’™

πŸ‘‰Review https://t.ly/UzM-Z
πŸ‘‰Paper arxiv.org/pdf/2509.06907
πŸ‘‰Project fomo4wheat.phenix-lab.com/
πŸ‘‰Repo github.com/PheniX-Lab/FoMo4Wheat
πŸ‘‰Demo fomo4wheat.phenix-lab.com/demos
πŸ™Human-Centric Video GenerationπŸ™

πŸ‘‰Tsinghua & #ByteDance unveil HuMo: a unified, human-centric video generation framework designed to produce high-quality, fine-grained, controllable human videos from multimodal inputs, supporting text-prompt following, consistent subject preservation, and synchronized audio-driven motion. Repo released under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/3S8Yb
πŸ‘‰Paper https://arxiv.org/pdf/2509.08519
πŸ‘‰Project https://phantom-video.github.io/HuMo/
πŸ‘‰Repo https://github.com/Phantom-video/HuMo
πŸ”₯ 21,000+ Hours Dataset πŸ”₯

πŸ‘‰SpatialVID is a novel large-scale video dataset with explicit spatial annotations, including camera poses, depth maps, structured captions, and serialized motion instructions. Distilled from 21,000+ hours of raw footage, the released set covers 7,089 hours of real-world dynamic scenes. Repo & Dataset released under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/Y9o5k
πŸ‘‰Paper arxiv.org/pdf/2509.09676
πŸ‘‰Project nju-3dv.github.io/projects/SpatialVID/
πŸ‘‰Repo github.com/NJU-3DV/spatialVID
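
To make the annotation types concrete, a hypothetical per-clip record; every field name here is invented for illustration, not the official SpatialVID schema:

```python
import json

clip = {
    "clip_id": "scene_000123",
    "camera_poses": [  # one camera pose per sampled frame
        {"frame": 0, "quat": [1, 0, 0, 0], "trans": [0.0, 0.0, 0.0]},
        {"frame": 8, "quat": [0.99, 0.0, 0.1, 0.0], "trans": [0.3, 0.0, 0.1]},
    ],
    "depth_map": "depth/scene_000123.npz",          # per-frame depth archive
    "caption": {"scene": "street at dusk", "subjects": ["cyclist"]},
    "motion_instructions": ["dolly forward", "pan right"],  # serialized motion
}
print(json.dumps(clip, indent=2))
```
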
🦠 Segment & Track Any Cell 🦠

πŸ‘‰RWTH unveils a novel zero-shot cell tracking framework by integrating Segment Anything 2 (SAM2) into the tracking pipeline. Source Code releasedπŸ’™

πŸ‘‰Review https://t.ly/n_srg
πŸ‘‰Paper https://arxiv.org/pdf/2509.09943
πŸ‘‰Repo https://github.com/zhuchen96/sam4celltracking
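
A minimal sketch of the zero-shot loop using the stock SAM2 video-predictor API from facebookresearch/sam2 (the cell-specific prompting and track management live in the repo above):

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_s.yaml",      # stock SAM2.1 config
    "checkpoints/sam2.1_hiera_small.pt",       # stock SAM2.1 checkpoint
)
with torch.inference_mode():
    state = predictor.init_state(video_path="frames/")  # folder of JPEG frames
    # seed one cell with a single positive click on the first frame
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[120, 85]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )
    # zero-shot propagation: SAM2 tracks the mask through the sequence
    for frame_idx, obj_ids, masks in predictor.propagate_in_video(state):
        cell_mask = (masks[0] > 0).cpu().numpy()  # binary mask for obj_id 1
```
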
πŸ”₯ How We Use ChatGPT πŸ”₯

πŸ‘‰By July 2025, ChatGPT had 700M+ users sending more than 2.5B messages per day, roughly 29,000 messages per second. This paper documents eight important facts about ChatGPT usage over the last three years. 63 pages of impressive statistics. Worth reading.πŸ’™

πŸ‘‰Review https://t.ly/QYHSi
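
The headline rate is simple arithmetic; a quick sanity check assuming the 2.5B messages/day figure:

```python
# Sanity check of the quoted rate (assumes the post's 2.5B messages/day)
messages_per_day = 2.5e9
seconds_per_day = 24 * 60 * 60                              # 86,400
print(f"{messages_per_day / seconds_per_day:,.0f} msg/s")   # 28,935 -> ~29k
```
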
πŸ›‘οΈ3D Prompted Vision-LLMπŸ›‘οΈ

πŸ‘‰#Nvidia unveils SR-3D, a novel 3D-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. It supports flexible region prompting: users can annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & Dataset announcedπŸ’™

πŸ‘‰Review https://t.ly/5Y2c5
πŸ‘‰Paper https://arxiv.org/pdf/2509.13317
πŸ‘‰Project https://www.anjiecheng.me/sr3d
πŸ‘‰Repo TBA
πŸ• Superpixel Anything (SOTA) πŸ•

πŸ‘‰ SuperPixel Anything Model (SPAM): a versatile framework for segmenting images. It extracts image features for superpixel generation and blends them with a large-scale pretrained model for semantic-agnostic segmentation, ensuring superpixels align with masks. Damn romantic. Repo & Dataset availableπŸ’™

πŸ‘‰Review https://t.ly/rpxRh
πŸ‘‰Paper arxiv.org/pdf/2509.12791
πŸ‘‰Repo github.com/waldo-j/spam
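
A toy of the alignment intuition using off-the-shelf SLIC (my simplification, not the SPAM pipeline): snap each superpixel to a semantic-agnostic mask by majority vote, so superpixel boundaries respect the mask.

```python
import numpy as np
from skimage.segmentation import slic

img = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64), int); mask[:, 32:] = 1   # stand-in pretrained mask
sp = slic(img, n_segments=50, compactness=10, channel_axis=-1)

aligned = np.zeros_like(sp)
for s in np.unique(sp):
    region = sp == s
    vote = np.bincount(mask[region]).argmax()      # dominant mask label
    aligned[region] = vote                          # superpixel snaps to mask
print(np.unique(aligned))                           # labels now mask-aligned
```
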
I'm keeping the channel free from interaction to avoid SPAM. The only way to interact is to comment on posts after being accepted into the subchannel. Do you like this setting?
Anonymous Poll
92% βœ… YES, keep this configuration
8% ❌ NO, open the main channel to comment for everyone
πŸ‘½DAM for SAM2 TrackingπŸ‘½

πŸ‘‰From the University of Ljubljana, a novel distractor-aware drop-in memory module for SAM2. It reduces tracking drift toward distractors and improves redetection after object occlusions. DAM4SAM outperforms SAM2.1 and is SOTA on 10 benchmarks. Repo released πŸ’™

πŸ‘‰Review https://t.ly/8aR59
πŸ‘‰Paper https://arxiv.org/pdf/2509.13864
πŸ‘‰Project jovanavidenovic.github.io/dam-4-sam/
πŸ‘‰Repo github.com/jovanavidenovic/DAM4SAM
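
A conceptual toy of what distractor-aware memory selection means (an illustrative rule, not the DAM4SAM implementation): commit a frame to the memory bank only when the target is confident and well separated from the best distractor, instead of a fixed recent-frame memory.

```python
from collections import deque

memory = deque(maxlen=7)  # small memory bank, in the spirit of SAM2's

def maybe_update(frame_id, target_score, best_distractor_score, margin=0.2):
    confident = target_score > 0.8
    separated = (target_score - best_distractor_score) > margin
    if confident and separated:          # skip ambiguous frames -> less drift
        memory.append(frame_id)

for fid, (t, d) in enumerate([(0.9, 0.3), (0.85, 0.8), (0.95, 0.4)]):
    maybe_update(fid, t, d)
print(memory)  # frames 0 and 2 kept; the ambiguous frame 1 was skipped
```
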
πŸ”₯πŸ”₯ It's time to decide whether you want to give LinkedIn your data for AI training or not πŸ”₯πŸ”₯

Poll: https://lnkd.in/p/ddnenZgH

Set here: https://linkedin.com/mypreferences/d/settings/data-for-ai-improvement
🐳 Invariant Saliency Detection 🐳

πŸ‘‰SI-SOD: size-invariant salient object detection for scenarios where multiple salient objects of significantly different sizes appear within a single image. Repo releasedπŸ’™

πŸ‘‰Review https://lnkd.in/p/dZBfbSsf
πŸ‘‰Paper https://arxiv.org/pdf/2509.15573
πŸ‘‰Project https://ferry-li.github.io/SI_SOD/
πŸ‘‰Repo https://github.com/Ferry-Li/SI-SOD
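
A hedged sketch of the size-invariance idea as I read the post (not the official loss): average the error per object, so a tiny salient object weighs as much as a huge one instead of being drowned out by pixel count.

```python
import torch

def size_invariant_loss(pred, gt, instance_ids):
    """pred/gt: (H,W) in [0,1]; instance_ids: (H,W) int, 0 = background."""
    err = torch.nn.functional.binary_cross_entropy(pred, gt, reduction="none")
    losses = [err[instance_ids == 0].mean()]              # background term
    for i in instance_ids.unique():
        if i > 0:
            losses.append(err[instance_ids == i].mean())  # per-object mean
    return torch.stack(losses).mean()                     # objects weigh equally

pred = torch.rand(64, 64)
gt = torch.zeros(64, 64); ids = torch.zeros(64, 64, dtype=torch.long)
gt[2:6, 2:6] = 1; ids[2:6, 2:6] = 1          # small salient object
gt[20:60, 20:60] = 1; ids[20:60, 20:60] = 2  # large salient object
print(size_invariant_loss(pred, gt, ids))
```
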