Real-Time Drag-Based Editing
The Visual AI Lab unveils Inpaint4Drag, a novel framework, inspired by elastic object deformation, that decomposes drag-based editing into pixel-space bidirectional warping and inpainting. Demo and code released (unknown license). A toy sketch follows the links.
Review: https://t.ly/H5nlR
Paper: https://arxiv.org/pdf/2509.04582
Project: https://visual-ai.github.io/inpaint4drag/
Repo: https://github.com/Visual-AI/Inpaint4Drag
Demo: https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
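A minimal pixel-space sketch of the idea, with a rectangular drag region and classical TELEA inpainting standing in for the paper's inpainting model (filenames, region, and drag vector are hypothetical):

```python
# Toy drag-edit: forward-warp a masked region, then inpaint the vacated
# pixels. cv2.inpaint stands in for Inpaint4Drag's inpainting network.
import cv2
import numpy as np

img = cv2.imread("input.png")                # hypothetical input image
h, w = img.shape[:2]

region = np.zeros((h, w), np.uint8)
region[100:200, 100:200] = 255               # hypothetical drag region
dx, dy = 40, 0                               # drag vector: 40 px right

ys, xs = np.nonzero(region)
tx = np.clip(xs + dx, 0, w - 1)
ty = np.clip(ys + dy, 0, h - 1)

out = img.copy()
out[ty, tx] = img[ys, xs]                    # forward warp the region
hole = region.copy()
hole[ty, tx] = 0                             # vacated, uncovered pixels

out = cv2.inpaint(out, hole, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("dragged.png", out)
```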
Foundation Red Blood Cells
RedDino, from the University of Cagliari, is a self-supervised foundation model for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it sets a new SOTA in shape classification. Code & models released under Apache 2.0. A hedged feature-extraction sketch follows the links.
Review: https://t.ly/uWAch
Paper: arxiv.org/pdf/2508.08180
Code: github.com/Snarci/RedDino
Models: huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
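A sketch of frozen-feature extraction, assuming the checkpoints load through a DINOv2-style `transformers` interface; the model id and loading path are assumptions, so check the repo for the official code:

```python
# Extract a frozen RedDino embedding for one RBC crop (interface assumed
# to mirror DINOv2 via transformers AutoModel; verify against the repo).
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

name = "Snarcy/RedDino-large"                    # hypothetical model id
processor = AutoImageProcessor.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

img = Image.open("rbc_crop.png").convert("RGB")  # hypothetical file
with torch.no_grad():
    batch = processor(images=img, return_tensors="pt")
    cls = model(**batch).last_hidden_state[:, 0]  # CLS token embedding
print(cls.shape)  # (1, D): feed to a linear probe / k-NN for morphology
```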
From Skin to Skeleton
This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for research. A toy joint-DOF sketch follows the links.
Review: https://t.ly/JsI8M
Paper: arxiv.org/pdf/2509.06607
Project: https://skel.is.tue.mpg.de/
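To see why biomechanical joints cut DOF: SMPL gives each joint a full 3-DOF rotation, while a hinge joint such as the knee needs a single angle. A self-contained numpy illustration of that idea (not SKEL's API):

```python
# One scalar (knee flexion) replaces a 3-DOF ball-joint rotation.
import numpy as np

def axis_angle_rotation(theta, axis):
    """Rodrigues' formula: rotation by theta about a fixed unit axis."""
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Hinge knee: flex 60 degrees about a fixed mediolateral axis.
R = axis_angle_rotation(np.deg2rad(60), np.array([1.0, 0.0, 0.0]))
shank = R @ np.array([0.0, -0.4, 0.0])   # 40 cm shank segment
print(shank)                              # posed endpoint of the shank
```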
FoMo4Wheat Foundational Model
PheniX Lab et al. unveil a novel family of foundation models tailored for wheat imagery, suitable for classification, detection, counting, and segmentation. Demo, dataset, models & code under MIT.
Review: https://t.ly/UzM-Z
Paper: arxiv.org/pdf/2509.06907
Project: fomo4wheat.phenix-lab.com/
Repo: github.com/PheniX-Lab/FoMo4Wheat
Demo: fomo4wheat.phenix-lab.com/demos
Human-Centric Video Generation
Tsinghua & #ByteDance unveil HuMo: a unified, human-centric video generation framework designed to produce high-quality, fine-grained, controllable human videos from multimodal inputs, with text-prompt following, consistent subject preservation, and synchronized audio-driven motion. Repo released under Apache 2.0.
Review: https://t.ly/3S8Yb
Paper: https://arxiv.org/pdf/2509.08519
Project: https://phantom-video.github.io/HuMo/
Repo: https://github.com/Phantom-video/HuMo
21,000+ Hours Dataset
SpatialVID is a novel large-scale video dataset with explicit spatial annotations, including camera poses, depth maps, structured captions, and serialized motion instructions. From 21,000+ hours of raw footage, the curated release comprises 7,089 hours of real-world dynamic scenes. Repo & dataset under Apache 2.0. A hypothetical loading sketch follows the links.
Review: https://t.ly/Y9o5k
Paper: arxiv.org/pdf/2509.09676
Project: nju-3dv.github.io/projects/SpatialVID/
Repo: github.com/NJU-3DV/spatialVID
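A sketch of iterating over per-clip annotations; the file layout and every field name below are pure assumptions for illustration, so consult the repo for the real schema:

```python
# Hypothetical walk over SpatialVID-style annotations: all paths and
# keys below are assumed, not the dataset's actual schema.
import json
from pathlib import Path

root = Path("SpatialVID/annotations")        # hypothetical layout
for meta_file in sorted(root.glob("*.json")):
    meta = json.loads(meta_file.read_text())
    poses = meta["camera_poses"]             # assumed: per-frame extrinsics
    caption = meta["structured_caption"]     # assumed: scene description
    motion = meta["motion_instructions"]     # assumed: serialized motions
    print(meta_file.stem, len(poses), caption[:60], len(motion))
```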
Segment & Track Any Cell
RWTH unveils a novel zero-shot cell tracking framework that integrates Segment Anything 2 (SAM2) into the tracking pipeline. Source code released. A generic SAM2 propagation sketch follows the links.
Review: https://t.ly/n_srg
Paper: https://arxiv.org/pdf/2509.09943
Repo: https://github.com/zhuchen96/sam4celltracking
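A sketch of vanilla SAM2 video propagation (the public `sam2` API), not the paper's full tracking pipeline; the click coordinates and paths are hypothetical:

```python
# Seed one cell with a click, then let SAM2 propagate the mask through
# the video. This is plain SAM2 usage, not the full tracking method.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_l.yaml",
    "checkpoints/sam2.1_hiera_large.pt",
)
with torch.inference_mode():
    state = predictor.init_state(video_path="cell_frames/")  # JPEG frames
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[210.0, 350.0]], dtype=np.float32),  # (x, y) click
        labels=np.array([1], dtype=np.int32),                 # 1 = positive
    )
    for fidx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # one mask per object id
```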
How We Use ChatGPT
By July 2025, ChatGPT had 700M+ users sending over 2.5B messages per day, roughly 29,000 messages per second (2.5B / 86,400 s ≈ 28,900). This paper documents eight important facts about ChatGPT usage over the last three years: 63 pages of impressive statistics. Worth reading.
Review: https://t.ly/QYHSi
3D Prompted Vision-LLM
#Nvidia unveils SR-3D, a novel spatially-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. It offers flexible region prompting: users can annotate regions with bounding boxes, with segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announced. A toy region-token sketch follows the links.
Review: https://t.ly/5Y2c5
Paper: https://arxiv.org/pdf/2509.13317
Project: https://www.anjiecheng.me/sr3d
Repo: TBA
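A toy version of region prompting: average-pool the patch tokens under a user mask into one region token. The paper's shared 2D/3D token space is more involved; this only shows the pooling intuition, and all shapes are illustrative:

```python
# Pool ViT patch tokens under a region mask into a single region token.
import torch
import torch.nn.functional as F

def region_token(tokens: torch.Tensor, grid_hw, mask: torch.Tensor):
    """tokens: (N, D) patch tokens on an h x w grid; mask: (H, W) bool."""
    h, w = grid_hw
    m = mask.float()[None, None]                       # (1, 1, H, W)
    m = F.interpolate(m, size=(h, w), mode="nearest").reshape(-1)
    weights = m / m.sum().clamp(min=1.0)               # normalized weights
    return (tokens * weights[:, None]).sum(dim=0)      # (D,) region token

tokens = torch.randn(16 * 16, 768)                     # dummy ViT tokens
mask = torch.zeros(224, 224, dtype=torch.bool)
mask[64:128, 64:128] = True                            # hypothetical box
print(region_token(tokens, (16, 16), mask).shape)      # torch.Size([768])
```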
Superpixel Anything (SOTA)
SuperPixel Anything Model (SPAM): a versatile framework for segmenting images. It extracts image features for superpixel generation and blends them with a large-scale pretrained model for semantics-agnostic segmentation, ensuring superpixels align with masks. Damn romantic. Repo & dataset available. A stand-in alignment sketch follows the links.
Review: https://t.ly/rpxRh
Paper: arxiv.org/pdf/2509.12791
Repo: github.com/waldo-j/spam
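A stand-in sketch of the alignment idea, using SLIC superpixels (not SPAM itself) and a per-superpixel majority vote to snap superpixels onto a binary mask:

```python
# Snap superpixels to a binary mask: each superpixel adopts the mask's
# majority label inside it. SLIC is a stand-in for SPAM's superpixels.
import numpy as np
from skimage.io import imread
from skimage.segmentation import slic

img = imread("image.png")[..., :3]            # hypothetical image
sp = slic(img, n_segments=300, compactness=10)
mask = imread("mask.png") > 0                 # hypothetical binary mask

aligned = np.zeros(mask.shape, dtype=bool)
for label in np.unique(sp):
    inside = sp == label
    aligned[inside] = mask[inside].mean() > 0.5   # majority vote
```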
I'm keeping the channel free from interaction to avoid SPAM. The only way to interact is commenting on posts after being accepted into the subchannel. Do you like this setting?
Anonymous Poll
92%: YES, keep this configuration
8%: NO, open the main channel to comment for everyone
DAM for SAM2 Tracking
From the University of Ljubljana, a novel distractor-aware drop-in memory module for SAM2. It reduces tracking drift toward distractors and improves redetection after object occlusions. DAM4SAM outperforms SAM2.1 and is SOTA on 10 benchmarks. Repo released. A toy memory-selection sketch follows the links.
Review: https://t.ly/8aR59
Paper: https://arxiv.org/pdf/2509.13864
Project: jovanavidenovic.github.io/dam-4-sam/
Repo: github.com/jovanavidenovic/DAM4SAM
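The actual selection criteria live in the paper; as a toy of the "distractor-aware" idea, only commit a frame to memory when the target mask is confident and overlaps little with detected distractors. All thresholds and inputs below are hypothetical:

```python
# Toy distractor-aware memory gate: skip frames where a distractor mask
# overlaps the target too much or confidence is low (thresholds assumed).
from collections import deque
import numpy as np

memory = deque(maxlen=8)                      # bounded memory bank

def iou(a: np.ndarray, b: np.ndarray) -> float:
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 0.0

def maybe_store(frame_feat, target_mask, distractor_mask, score):
    # Memorize only confident frames well separated from distractors.
    if score > 0.8 and iou(target_mask, distractor_mask) < 0.1:
        memory.append((frame_feat, target_mask))
```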
It's time to decide whether you want to give LinkedIn your data for AI training or not.
Poll: https://lnkd.in/p/ddnenZgH
Set here: https://linkedin.com/mypreferences/d/settings/data-for-ai-improvement
Invariant Saliency Detection
SI-SOD: size-invariant salient object detection for scenarios where multiple salient objects of significantly different sizes appear within a single image. Repo released.
Review: https://lnkd.in/p/dZBfbSsf
Paper: https://arxiv.org/pdf/2509.15573
Project: https://ferry-li.github.io/SI_SOD/
Repo: https://github.com/Ferry-Li/SI-SOD
WINNER of LSVOS Challenge
SaSaSa2VA introduces Segmentation Augmentation to improve global video understanding while remaining efficient, and employs Selective Averaging at inference to robustly fuse complementary predictions. This approach achieves SOTA on the 7th LSVOS Challenge (RVOS track). A practical solution with full repo under Apache. A hedged fusion sketch follows the links.
Review: https://t.ly/aH4mB
Paper: https://arxiv.org/pdf/2509.16972
Repo: https://github.com/magic-research/Sa2VA
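A sketch of the fusion step: average the predictions, but keep only those closest to the consensus before the final vote. The exact selection rule is the paper's; this is one plausible reading, with keep-ratio and scoring assumed:

```python
# Selective averaging: fuse K mask probability maps, dropping outliers
# that disagree with the consensus before the final average.
import numpy as np

def selective_average(prob_maps, keep_ratio=0.75):
    probs = np.stack(prob_maps)                  # (K, H, W), values in [0,1]
    consensus = probs.mean(axis=0)
    scores = -np.abs(probs - consensus).mean(axis=(1, 2))  # agreement score
    k = max(1, int(round(len(probs) * keep_ratio)))
    keep = np.argsort(scores)[-k:]               # most agreeing predictions
    return probs[keep].mean(axis=0) > 0.5        # fused binary mask
```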
MOSEv2 Challenge Winner
A practical solution for complex segmentation based on Segment Concept (SeC), a concept-driven framework that shifts from conventional feature matching to the progressive construction and use of high-level, object-centric representations. Repo under Apache 2.0.
Review: https://t.ly/2MjNm
Paper: arxiv.org/pdf/2509.19183
Paper (SeC): arxiv.org/pdf/2507.15852
Repo: github.com/OpenIXCLab/SeC
Project: rookiexiong7.github.io/projects/SeC/
CLOPS: Vision-Driven Avatar
CLOPS is the first human avatar that relies solely on egocentric vision to perceive its surroundings and navigate. It moves realistically through a scene, using egocentric vision to find a goal in a closed loop of visual perception and motion. Code announced.
Review: https://t.ly/RXp64
Paper: https://arxiv.org/pdf/2509.19259
Project: markos-diomataris.github.io/projects/clops/
Repo: TBA