✂️ AI Open-Source Annotation ✂️
👉VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for Computer Vision tasks like object detection, oriented BBs, and segmentation. Source code released under Apache 2.0💙
👉Review https://t.ly/MoMvv
👉Paper https://lnkd.in/dxTncSgv
👉Repo https://lnkd.in/dCWMXp3x
Friends,
I’ve just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me
What about posting stuff about AI on IG? Thoughts?
🖌️Real-Time Drag-Based Editing🖌️
👉The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping and inpainting, inspired by elastic object deformation. Demo and code released (unknown license)💙 A toy warp-then-inpaint sketch follows the links.
👉Review https://t.ly/H5nlR
👉Paper https://arxiv.org/pdf/2509.04582
👉Project https://visual-ai.github.io/inpaint4drag/
👉Repo https://github.com/Visual-AI/Inpaint4Drag
👉Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
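Not the paper's pipeline, just a toy OpenCV illustration of the warp-then-inpaint idea: translate a user-selected region along the drag vector, then inpaint the pixels it vacated. The image path, region, and drag vector are made-up placeholders; the real method warps non-rigidly and bidirectionally.
```python
# Toy warp-then-inpaint sketch (illustrative only, not Inpaint4Drag itself).
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                       # placeholder test image
h, w = img.shape[:2]

# Hypothetical drag: a circular region the user "grabs" plus a drag vector.
region = np.zeros((h, w), np.uint8)
cv2.circle(region, (w // 3, h // 2), 60, 255, -1)
dx, dy = 80, 0                                       # handle -> target shift

# Forward-warp the region by translating it (real methods deform elastically).
M = np.float32([[1, 0, dx], [0, 1, dy]])
warped_img = cv2.warpAffine(img, M, (w, h))
warped_mask = cv2.warpAffine(region, M, (w, h))

out = img.copy()
out[warped_mask > 0] = warped_img[warped_mask > 0]   # paste the moved content

# Pixels uncovered by the move must be synthesized -> build an inpainting mask.
holes = cv2.bitwise_and(region, cv2.bitwise_not(warped_mask))
out = cv2.inpaint(out, holes, 3, cv2.INPAINT_TELEA)
cv2.imwrite("dragged.jpg", out)
```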
🩸Foundation Red Blood Cells🩸
👉RedDino, from the University of Cagliari, is a self-supervised foundation model for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it sets the new SOTA in shape classification. Code & Models released under Apache 2.0💙 A hedged feature-extraction sketch follows the links.
👉Review https://t.ly/uWAch
👉Paper arxiv.org/pdf/2508.08180
👉Code github.com/Snarci/RedDino
👉Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
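The exact loading API for the released RedDino checkpoints is documented in the repo and Hugging Face collection; as a hedged stand-in, here is a minimal DINO-style feature-extraction plus linear-probe sketch that uses the public DINOv2 ViT-S/14 backbone via torch.hub. File names and labels are hypothetical placeholders.
```python
# Minimal sketch: DINO-style embeddings + linear probe for RBC shape classes.
# Uses the public DINOv2 backbone as a stand-in for the RedDino weights.
import torch
from PIL import Image
from torchvision import transforms
from sklearn.linear_model import LogisticRegression

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Return one 384-d CLS embedding per image path."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return backbone(batch).numpy()

# Hypothetical RBC crops and shape labels, purely illustrative.
train_paths, train_labels = ["rbc_0.png", "rbc_1.png"], [0, 1]
test_paths = ["rbc_2.png"]

clf = LogisticRegression(max_iter=1000).fit(embed(train_paths), train_labels)
print(clf.predict(embed(test_paths)))
```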
👻 From Skin to Skeleton 👻
👉This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for research💙
👉Review https://t.ly/JsI8M
👉Paper arxiv.org/pdf/2509.06607
👉Project https://skel.is.tue.mpg.de/
🌱 FoMo4Wheat Foundational Model 🌱
👉PheniX Lab et al. unveil a novel family of foundational models tailored for wheat image tasks, suitable for classification, detection, counting and segmentation. Demo, Dataset, Model & Code under MIT💙
👉Review https://t.ly/UzM-Z
👉Paper arxiv.org/pdf/2509.06907
👉Project fomo4wheat.phenix-lab.com/
👉Repo github.com/PheniX-Lab/FoMo4Wheat
👉Demo fomo4wheat.phenix-lab.com/demos
🐙Human-Centric Video Generation🐙
👉Tsinghua & #ByteDance unveil HuMo: a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs: text-prompt following, consistent subject preservation, and synchronized audio-driven motion. Repo released under Apache 2.0💙
👉Review https://t.ly/3S8Yb
👉Paper https://arxiv.org/pdf/2509.08519
👉Project https://phantom-video.github.io/HuMo/
👉Repo https://github.com/Phantom-video/HuMo
🔥 21,000+ Hours Dataset 🔥
👉SpatialVID is a novel large-scale video dataset with explicit spatial annotations: camera poses, depth maps, structured captions, and serialized motion instructions. The annotated set covers 7,089 hours of real-world dynamic scenes curated from 21,000+ hours of raw video. Repo & Dataset under Apache 2.0💙 A hypothetical per-clip record sketch follows the links.
👉Review https://t.ly/Y9o5k
👉Paper arxiv.org/pdf/2509.09676
👉Project nju-3dv.github.io/projects/SpatialVID/
👉Repo github.com/NJU-3DV/spatialVID
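The actual file layout is defined in the repo; purely as a mental model, here is a hypothetical per-clip record carrying the annotation types listed above. Field names and shapes are illustrative, not the dataset's real schema.
```python
# Hypothetical per-clip record for a spatially annotated video dataset.
from dataclasses import dataclass
import numpy as np

@dataclass
class SpatialClip:
    frames: np.ndarray              # (T, H, W, 3) uint8 RGB frames
    camera_poses: np.ndarray        # (T, 4, 4) per-frame camera extrinsics
    depth_maps: np.ndarray          # (T, H, W) per-frame depth
    caption: str                    # structured scene/motion caption
    motion_instructions: list[str]  # serialized camera-motion commands

clip = SpatialClip(
    frames=np.zeros((8, 480, 640, 3), np.uint8),
    camera_poses=np.tile(np.eye(4), (8, 1, 1)),
    depth_maps=np.ones((8, 480, 640), np.float32),
    caption="a cyclist rides through a rainy street at dusk",
    motion_instructions=["dolly forward", "pan left"],
)
```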
🦠 Segment & Track Any Cell 🦠
👉RWTH unveils a novel zero-shot cell-tracking framework that integrates Segment Anything 2 (SAM2) into the tracking pipeline. Source code released💙 A minimal SAM2 video-predictor sketch follows the links.
👉Review https://t.ly/n_srg
👉Paper https://arxiv.org/pdf/2509.09943
👉Repo https://github.com/zhuchen96/sam4celltracking
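The cell-tracking-specific logic lives in the linked repo; below is a minimal sketch of prompting SAM2's video predictor on a microscopy clip, following the usage pattern documented in the facebookresearch/sam2 README. Checkpoint, config, video path, and click coordinates are placeholders.
```python
# Minimal sketch: click once on a cell in frame 0, propagate its mask forward.
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "checkpoints/sam2.1_hiera_large.pt"       # downloaded separately
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state("cells.mp4")          # or a folder of frames

    # One positive click on a cell (label 1 = foreground) in frame 0.
    predictor.add_new_points_or_box(
        inference_state=state, frame_idx=0, obj_id=1,
        points=np.array([[120.0, 96.0]], np.float32),
        labels=np.array([1], np.int32),
    )

    # Propagate the prompt to get a mask for that cell in every frame.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()      # boolean masks per object
```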
🔥 How We Use ChatGPT 🔥
👉By July 2025, ChatGPT had 700M+ users sending more than 2.5B messages per day, roughly 29,000 messages per second. This paper documents eight important facts about ChatGPT usage over the last three years. 63 pages of impressive statistics. Worth a read.💙 A quick arithmetic check follows the link.
👉Review https://t.ly/QYHSi
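A one-line sanity check that the per-second figure matches the per-day one (pure arithmetic, no assumptions beyond the numbers quoted above):
```python
# 2.5B messages/day expressed in messages/second.
messages_per_day = 2.5e9
print(round(messages_per_day / 86_400))  # -> 28935, i.e. roughly 29,000 msg/s
```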
🛡️3D Prompted Vision-LLM🛡️
👉#Nvidia unveils SR-3D, a novel spatially-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. It offers flexible region prompting, allowing users to annotate regions with bounding boxes or segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & Dataset announced💙
👉Review https://t.ly/5Y2c5
👉Paper https://arxiv.org/pdf/2509.13317
👉Project https://www.anjiecheng.me/sr3d
👉Repo TBA
🍕 Superpixel Anything (SOTA) 🍕
👉 SuperPixel Anything Model (SPAM): a versatile framework for segmenting images. It extracts image features for superpixel generation and blends them with a large-scale pretrained model for semantic-agnostic segmentation, ensuring superpixels align with masks. Damn romantic. Repo & Dataset available💙 A classic superpixel stand-in follows the links.
👉Review https://t.ly/rpxRh
👉Paper arxiv.org/pdf/2509.12791
👉Repo github.com/waldo-j/spam
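SPAM itself lives in the repo above; purely for intuition about what boundary-aligned superpixels look like, here is a classic SLIC baseline via scikit-image, a conceptual stand-in rather than the paper's method. The image path is a placeholder.
```python
# Classic SLIC superpixels as a conceptual stand-in (not SPAM itself).
from skimage import io, segmentation
import matplotlib.pyplot as plt

img = io.imread("scene.jpg")                              # placeholder RGB image
labels = segmentation.slic(img, n_segments=200, compactness=10, start_label=1)
plt.imshow(segmentation.mark_boundaries(img, labels))     # overlay boundaries
plt.axis("off")
plt.show()
```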
I’m keeping the channel free from interaction to avoid SPAM. The only way to interact is by commenting on posts after being accepted in the subchannel. Do you like this setting?
Anonymous Poll
92%
✅ YES, keep this configuration
8%
❌ NO, open the main channel to comment for everyone
👽DAM for SAM2 Tracking👽
👉From the University of Ljubljana, a novel distractor-aware drop-in memory module for SAM2. It reduces tracking drift toward distractors and improves redetection after object occlusion. DAM4SAM outperforms SAM2.1 and sets SOTA on 10 benchmarks. Repo released 💙
👉Review https://t.ly/8aR59
👉Paper https://arxiv.org/pdf/2509.13864
👉Project jovanavidenovic.github.io/dam-4-sam/
👉Repo github.com/jovanavidenovic/DAM4SAM
🔥🔥 It's time to decide whether you want to give LinkedIn your data for AI training or not 🔥🔥
Poll: https://lnkd.in/p/ddnenZgH
Set here: https://linkedin.com/mypreferences/d/settings/data-for-ai-improvement
🐳 Invariant Saliency Detection 🐳
👉SI-SOD: invariant salient object detection for scenarios in which multiple salient objects of significantly different sizes appear within a single image. Repo released💙
👉Review https://lnkd.in/p/dZBfbSsf
👉Paper https://arxiv.org/pdf/2509.15573
👉Project https://ferry-li.github.io/SI_SOD/
👉Repo https://github.com/Ferry-Li/SI-SOD
🫓 WINNER of LSVOS Challenge 🫓
👉SaSaSa2VA introduces Segmentation Augmentation to improve global video understanding while remaining efficient, and employs Selective Averaging at inference to robustly fuse complementary predictions. The approach achieves SOTA on the 7th LSVOS Challenge (RVOS track). A practical solution with the full repo under Apache💙 A toy fusion-by-averaging sketch follows the links.
👉Review https://t.ly/aH4mB
👉Paper https://arxiv.org/pdf/2509.16972
👉Repo https://github.com/magic-research/Sa2VA
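The paper's Selective Averaging additionally decides which predictions to fuse; as a toy illustration of the averaging part only, here is a sketch that fuses K mask probability maps by averaging and thresholding. The random maps are placeholders for real model outputs.
```python
# Toy fusion of complementary mask predictions by averaging (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
preds = rng.random((3, 64, 64))      # stand-in for K=3 mask probability maps
fused = preds.mean(axis=0) > 0.5     # average, then threshold to a binary mask
print(fused.sum(), "foreground pixels in the fused mask")
```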
🏆MOSEv2 Challenge Winner🏆
👉A practical solution for complex video object segmentation based on Segment Concept (SeC), a concept-driven framework that shifts from conventional feature matching to the progressive construction and use of high-level, object-centric representations. Repo under Apache 2.0💙
👉Review https://t.ly/2MjNm
👉Paper arxiv.org/pdf/2509.19183
👉Paper (SeC) arxiv.org/pdf/2507.15852
👉Repo github.com/OpenIXCLab/SeC
👉Project rookiexiong7.github.io/projects/SeC/
🌀 CLOPS: Vision-Driven Avatar 🌀
👉CLOPS is the first human avatar that relies solely on egocentric vision to perceive its surroundings and navigate. It moves realistically through a scene and uses egocentric vision to find a goal in a closed loop of visual perception and motion. Code announced💙
👉Review https://t.ly/RXp64
👉Paper https://arxiv.org/pdf/2509.19259
👉Project markos-diomataris.github.io/projects/clops/
👉Repo TBA