AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
Friends,
I've just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me

What about posting stuff about AI on IG? Thoughts?
๐Ÿ‘11โค1๐Ÿคฏ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ–Œ๏ธReal-Time Drag-Based Editing๐Ÿ–Œ๏ธ

๐Ÿ‘‰The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping/inpainting. Inspired by elastic object deformation. Demo and Code released (unknown license)๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/H5nlR
๐Ÿ‘‰Paper https://arxiv.org/pdf/2509.04582
๐Ÿ‘‰Project https://visual-ai.github.io/inpaint4drag/
๐Ÿ‘‰Repo https://github.com/Visual-AI/Inpaint4Drag
๐Ÿ‘‰Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
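👉A minimal sketch of the warp-then-inpaint decomposition, assuming OpenCV: a plain translation stands in for the paper's bidirectional warp, and classical inpainting stands in for its learned inpainter.

```python
# Toy version of drag editing as "move pixels, then fill the hole".
# NOT the authors' code: Inpaint4Drag derives a bidirectional warp from the
# user's drag points; here a single (dx, dy) translation plays that role.
import cv2
import numpy as np

def drag_edit(img: np.ndarray, mask: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """img: HxWx3 uint8; mask: HxW uint8, 255 marks the draggable region."""
    h, w = mask.shape
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    warped = cv2.warpAffine(img, M, (w, h))        # move the pixels
    warped_mask = cv2.warpAffine(mask, M, (w, h))  # move the region with them
    out = np.where(warped_mask[..., None] > 0, warped, img)
    hole = cv2.subtract(mask, warped_mask)         # area the region vacated
    return cv2.inpaint(out, hole, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```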
โค7๐Ÿ”ฅ7๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🩸Foundation Red Blood Cells🩸

👉RedDino from the University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it's the new SOTA in shape classification. Code & Models released under Apache 2.0💙

👉Review https://t.ly/uWAch
👉Paper arxiv.org/pdf/2508.08180
👉Code github.com/Snarci/RedDino
👉Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
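👉Hedged usage sketch: the checkpoints are DINO-style ViT backbones, so feature extraction should look like any timm model. The hub id below is a placeholder; check the HF collection for the real checkpoint names.

```python
import timm
import torch
from PIL import Image

# Placeholder hub id -- see the Snarcy HF collection for actual names.
model = timm.create_model("hf_hub:Snarcy/RedDino-small",
                          pretrained=True, num_classes=0)
model.eval()
cfg = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**cfg, is_training=False)

img = Image.open("rbc_patch.png").convert("RGB")
with torch.no_grad():
    emb = model(transform(img).unsqueeze(0))  # (1, feat_dim) cell embedding
```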
โค18๐Ÿ‘4๐Ÿ”ฅ2
This media is not supported in your browser
VIEW IN TELEGRAM
👻 From Skin to Skeleton 👻

👉This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for research💙

👉Review https://t.ly/JsI8M
👉Paper arxiv.org/pdf/2509.06607
👉Project https://skel.is.tue.mpg.de/
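👉To make the degrees-of-freedom claim concrete, a small comparison of the two parameterizations; the SKEL forward call is hypothetical, see the project page for the real loader.

```python
# SMPL poses a body with 24 joints x 3 axis-angle values = 72 rotational DoF.
# SKEL swaps these for biomechanical joints (e.g. a 1-DoF knee hinge), giving
# far fewer, anatomically meaningful parameters (~46 per the paper).
import torch

smpl_pose = torch.zeros(1, 72)   # SMPL rotational parameters
skel_pose = torch.zeros(1, 46)   # SKEL biomechanical DoF
betas = torch.zeros(1, 10)       # body-shape coefficients

# Hypothetical forward pass -- both are differentiable (pose, shape) -> mesh:
# verts, joints, skeleton_mesh = skel_model(poses=skel_pose, betas=betas)
```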
โค6๐Ÿ‘3๐Ÿ”ฅ2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🌱 FoMo4Wheat Foundational Model 🌱

👉PheniX Lab et al. unveil a novel family of foundation models tailored for wheat image tasks, suitable for classification, detection, counting, and segmentation. Demo, Dataset, Model & Code under MIT💙

👉Review https://t.ly/UzM-Z
👉Paper arxiv.org/pdf/2509.06907
👉Project fomo4wheat.phenix-lab.com/
👉Repo github.com/PheniX-Lab/FoMo4Wheat
👉Demo fomo4wheat.phenix-lab.com/demos
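👉The usual adaptation recipe for such a backbone, as a sketch; the timm id below is a generic DINOv2 stand-in, not the FoMo4Wheat checkpoint itself.

```python
import timm
import torch.nn as nn

# Generic DINOv2 ViT as a stand-in for the FoMo4Wheat backbone.
backbone = timm.create_model("vit_base_patch14_dinov2",
                             pretrained=True, num_classes=0)
for p in backbone.parameters():
    p.requires_grad = False                   # linear-probe adaptation

head = nn.Linear(backbone.num_features, 1)    # e.g. wheat-head count regression

def predict_count(x):                         # x: (B, 3, H, W), preprocessed
    return head(backbone(x)).squeeze(-1)
```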
โค9๐Ÿ‘3๐Ÿ”ฅ1๐Ÿพ1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ™Human-Centric Video Generation๐Ÿ™

๐Ÿ‘‰Tsinghua & #ByteDance unveil HuMo: a unified, human-centric video generation framework designed to produce HQ fine-grained, and controllable human videos from multimodal inputs: text prompt following, consistent subject preservation, synchronized audio-driven motion. Repo released under Apache2.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/3S8Yb
๐Ÿ‘‰Paper https://arxiv.org/pdf/2509.08519
๐Ÿ‘‰Project https://phantom-video.github.io/HuMo/
๐Ÿ‘‰Repo https://github.com/Phantom-video/HuMo
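👉A hypothetical call shape for the tri-modal conditioning, just to fix ideas; all names here are illustrative, the real entry point is in the repo.

```python
from dataclasses import dataclass

@dataclass
class HuMoInputs:                 # illustrative container, not the repo's API
    prompt: str                   # text: what the subject should do
    ref_images: list[str]         # references for subject preservation
    audio: str | None = None      # soundtrack driving synchronized motion

inputs = HuMoInputs(prompt="a violinist playing on a rooftop at sunset",
                    ref_images=["subject.png"],
                    audio="violin.wav")
# video = humo_pipeline(inputs, num_frames=97, fps=25)  # hypothetical call
```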
🔥 21,000+ Hours Dataset 🔥

👉SpatialVID is a novel large-scale video dataset with explicit spatial annotations, including camera poses, depth maps, structured captions, and serialized motion instructions. The released dataset consists of 7,089 hours of real-world dynamic scenes distilled from 21,000+ hours of raw video. Repo & Dataset under Apache 2.0💙

👉Review https://t.ly/Y9o5k
👉Paper arxiv.org/pdf/2509.09676
👉Project nju-3dv.github.io/projects/SpatialVID/
👉Repo github.com/NJU-3DV/spatialVID
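👉Illustrative shape of one annotated clip, based on the annotation types listed above; the field names are guesses, the released schema is in the repo.

```python
import numpy as np

clip = {
    "clip_id": "000001",
    "num_frames": 49,
    "camera_poses": np.zeros((49, 4, 4)),  # per-frame camera-to-world matrices
    "depth_file": "000001_depth.npz",      # per-frame depth maps
    "caption": {"scene": "...", "camera": "...", "motion": "..."},
    "motion_instructions": ["dolly forward", "pan left"],  # serialized moves
}
```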
โค11๐Ÿ”ฅ8๐Ÿ‘2๐Ÿคฏ1๐Ÿ˜1
This media is not supported in your browser
VIEW IN TELEGRAM
🦠 Segment & Track Any Cell 🦠

👉RWTH unveils a novel zero-shot cell tracking framework that integrates Segment Anything 2 (SAM2) into the tracking pipeline. Source code released💙

👉Review https://t.ly/n_srg
👉Paper https://arxiv.org/pdf/2509.09943
👉Repo https://github.com/zhuchen96/sam4celltracking
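👉The zero-shot idea in sketch form: prompt SAM2 once per detected cell, then let its video propagation do the tracking. This mirrors the official SAM2 video predictor API; the seed points would come from the paper's detection stage, and the exact integration differs.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("configs/sam2.1/sam2.1_hiera_s.yaml",
                                       "checkpoints/sam2.1_hiera_small.pt")
with torch.inference_mode():
    state = predictor.init_state(video_path="microscopy_frames/")
    # one positive click per detected cell on frame 0 (coords illustrative)
    for obj_id, (x, y) in enumerate([(120, 88), (240, 150)]):
        predictor.add_new_points_or_box(
            state, frame_idx=0, obj_id=obj_id,
            points=np.array([[x, y]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32))
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0)  # boolean mask per cell for this frame
```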
โค4๐Ÿ”ฅ2๐Ÿ‘1
🔥 How We Use ChatGPT 🔥

👉As of July 2025, ChatGPT had 700M+ users sending more than 2.5B messages per day, about 29,000 messages per second. This paper documents eight important facts about ChatGPT usage over the last three years. 63 pages of impressive statistics. A must-read.💙

👉Review https://t.ly/QYHSi
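👉The headline rate checks out:

```python
# 2.5B messages/day spread over 86,400 seconds in a day:
msgs_per_sec = 2.5e9 / 86_400
print(f"{msgs_per_sec:,.0f}")  # ~28,935, i.e. "about 29,000 per second"
```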
๐Ÿ›ก๏ธ3D Prompted Vision-LLM๐Ÿ›ก๏ธ

๐Ÿ‘‰#Nvidia unveils SR-3D, a novel aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. Flexible region prompting, allowing users to annotate regions with bounding boxes, segmentation masks on any frame, or directly in 3D, without the need for exhaustive multi-frame labeling. Code & Dataset announced๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/5Y2c5
๐Ÿ‘‰Paper https://arxiv.org/pdf/2509.13317
๐Ÿ‘‰Project https://www.anjiecheng.me/sr3d
๐Ÿ‘‰Repo TBA
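👉What region prompting could look like from the caller's side; the structure below is purely illustrative, since the code is still TBA.

```python
# The same question can carry a 2D box on one frame, a 2D mask, or a 3D box:
# SR-3D maps all of them into one shared visual token space.
query = {
    "question": "What is the object in <region> used for?",
    "region": {
        "type": "box_2d",                   # or "mask_2d" / "box_3d"
        "frame_idx": 12,                    # needed for 2D prompts only
        "coords": [410, 220, 560, 400],     # x1, y1, x2, y2 in pixels
    },
}
# answer = sr3d.chat(video_frames, query)  # hypothetical entry point
```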
โค6๐Ÿ”ฅ5๐Ÿ‘1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ• Superpixel Anything (SOTA) ๐Ÿ•

๐Ÿ‘‰ SuperPixel Anything Model, a versatile framework for segmenting images. Extracting image features for superpixel generation blended with a large-scale pretrained model for semantic-agnostic segmentation to ensure superpixels alignement with masks. Damn romantic. Repo & Dataset available๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/rpxRh
๐Ÿ‘‰Paper arxiv.org/pdf/2509.12791
๐Ÿ‘‰Repo github.com/waldo-j/spam
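👉A classical stand-in for the alignment goal, using SLIC from scikit-image: split every superpixel that straddles a mask boundary so that none crosses it. The paper achieves this with learned, semantic-agnostic components instead.

```python
import numpy as np
from skimage.segmentation import slic

def mask_aligned_superpixels(img: np.ndarray, mask: np.ndarray, n: int = 200):
    """img: HxWx3; mask: HxW binary. Returns integer superpixel labels."""
    sp = slic(img, n_segments=n, start_label=0)
    # relabel so in-mask and out-of-mask halves of a superpixel differ
    return sp * 2 + (mask > 0).astype(sp.dtype)
```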
โค11๐Ÿ”ฅ5๐Ÿ‘1
I'm keeping the channel free from interaction to avoid SPAM. The only way to interact is commenting on a post after being accepted into the subchannel. Do you like this setting?
Anonymous Poll
92%
✅ YES, keep this configuration
8%
โŒ NO, open the main channel to comment for everyone
โค2๐Ÿ”ฅ1
AI with Papers - Artificial Intelligence & Deep Learning pinned ยซIโ€™m keeping the channel free from interaction to avoid SPAM. The only way to interact is commenting the post after being accepted in the subchannel. Do you like this setting?ยป
This media is not supported in your browser
VIEW IN TELEGRAM
👽DAM for SAM2 Tracking👽

👉From the University of Ljubljana, a novel distractor-aware drop-in memory module for SAM2 that reduces tracking drift toward distractors and improves redetection after object occlusion. DAM4SAM outperforms SAM2.1 and sets SOTA on 10 benchmarks. Repo released💙

👉Review https://t.ly/8aR59
👉Paper https://arxiv.org/pdf/2509.13864
👉Project jovanavidenovic.github.io/dam-4-sam/
👉Repo github.com/jovanavidenovic/DAM4SAM
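👉The distractor-aware intuition, reduced to a toy memory-update rule; the actual criteria and module live in the repo.

```python
def should_store(target_score: float, best_distractor_score: float,
                 conf_thresh: float = 0.7, margin: float = 0.2) -> bool:
    """Commit a frame to tracking memory only if the target is confident
    AND clearly separated from the strongest distractor candidate."""
    return (target_score > conf_thresh and
            target_score - best_distractor_score > margin)
```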
🔥🔥 It's time to decide whether you want to give LinkedIn your data for AI training or not 🔥🔥

Poll: https://lnkd.in/p/ddnenZgH

Set here: https://linkedin.com/mypreferences/d/settings/data-for-ai-improvement
โค8๐Ÿ”ฅ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿณ Invariant Saliency Detection ๐Ÿณ

๐Ÿ‘‰SI-SOD: invariant salient object detection in scenarios when multiple salient objects of significantly different sizes appear within a single image. Repo released๐Ÿ’™

๐Ÿ‘‰Review https://lnkd.in/p/dZBfbSsf
๐Ÿ‘‰Paper https://arxiv.org/pdf/2509.15573
๐Ÿ‘‰Project https://ferry-li.github.io/SI_SOD/
๐Ÿ‘‰Repo https://github.com/Ferry-Li/SI-SOD
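👉One way to read "size-invariant" in code: average the loss per object instead of per pixel, so small salient objects are not drowned out by large ones. A conceptual stand-in for the paper's formulation.

```python
import torch

def size_invariant_loss(pixel_loss: torch.Tensor,
                        instance_masks: torch.Tensor) -> torch.Tensor:
    """pixel_loss: (H, W); instance_masks: (K, H, W) boolean, one per object."""
    per_object = [(pixel_loss * m).sum() / m.sum().clamp(min=1)
                  for m in instance_masks]
    return torch.stack(per_object).mean()  # every object weighs the same
```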
๐Ÿ”ฅ3โค1
This media is not supported in your browser
VIEW IN TELEGRAM
🫓 WINNER of LSVOS Challenge 🫓

👉SaSaSa2VA introduces Segmentation Augmentation to improve global video understanding while remaining efficient, plus Selective Averaging at inference to robustly fuse complementary predictions. The approach achieves SOTA in the 7th LSVOS Challenge (RVOS track). A practical solution with full repo under Apache💙

👉Review https://t.ly/aH4mB
👉Paper https://arxiv.org/pdf/2509.16972
👉Repo https://github.com/magic-research/Sa2VA
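👉Selective Averaging, sketched: fuse the predictions that agree with the consensus and drop outliers. A simplified reading of the inference-time fusion; details are in the paper.

```python
import numpy as np

def selective_average(masks: list[np.ndarray], agree_iou: float = 0.5):
    """masks: list of HxW boolean predictions for the same frame."""
    consensus = np.mean(masks, axis=0) > 0.5
    def iou(a, b):
        return (a & b).sum() / max((a | b).sum(), 1)
    kept = [m for m in masks if iou(m, consensus) >= agree_iou]
    return np.mean(kept or masks, axis=0) > 0.5   # fused final mask
```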
๐Ÿ”ฅ5โค3๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿ†MOSEv2 Challenge Winner๐Ÿ†

๐Ÿ‘‰A practical solution for complex segmentation based on the Segment Concept (SeC), a concept-driven segmentation framework that shifts from conventional feature matching to the progressive construction and utilization of high-level, object-centric representations. Repo under Apache 2.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/2MjNm
๐Ÿ‘‰Paper arxiv.org/pdf/2509.19183
๐Ÿ‘‰Paper (SeC) arxiv.org/pdf/2507.15852
๐Ÿ‘‰Repo github.com/OpenIXCLab/SeC
๐Ÿ‘‰Project rookiexiong7.github.io/projects/SeC/
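👉The concept-driven shift, in spirit: keep a running object-level embedding, update it only on confident frames, and score candidates against the concept instead of raw frame-to-frame features. The real model builds far richer LVLM-based representations.

```python
import torch
import torch.nn.functional as F

class ConceptBank:
    """Toy progressive object concept: an EMA over confident embeddings."""
    def __init__(self, dim: int, momentum: float = 0.9):
        self.concept = torch.zeros(dim)
        self.m = momentum

    def update(self, obj_embedding: torch.Tensor, confidence: float):
        if confidence > 0.8:  # progressive construction: confident frames only
            self.concept = self.m * self.concept + (1 - self.m) * obj_embedding

    def score(self, candidates: torch.Tensor) -> torch.Tensor:
        """candidates: (N, dim) -> cosine similarity to the object concept."""
        return F.cosine_similarity(candidates, self.concept.unsqueeze(0), dim=1)
```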
โค3๐Ÿ‘1๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🌀 CLOPS: Vision-Driven Avatar 🌀

👉CLOPS is the first human avatar that relies solely on egocentric vision to perceive its surroundings and navigate. It moves realistically through a scene, using egocentric vision to find a goal in a closed loop of visual perception and motion. Code announced💙

👉Review https://t.ly/RXp64
👉Paper https://arxiv.org/pdf/2509.19259
👉Project markos-diomataris.github.io/projects/clops/
👉Repo TBA
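👉The perception-motion loop, as pseudocode; every callable here is a placeholder, since the code is announced but not yet out.

```python
def navigate(avatar, scene, goal, max_steps: int = 500) -> bool:
    """Egocentric see -> act loop until the goal is spotted or steps run out."""
    for _ in range(max_steps):
        view = scene.render_egocentric(avatar.head_pose())  # what the avatar sees
        action = avatar.policy(view)                        # next motion from vision
        avatar.step(action)                                 # advance the body
        if avatar.sees(goal, view):
            return True
    return False
```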
โค9๐Ÿ”ฅ7๐Ÿ‘1