#DoubleDragon with #AI
What would Double Dragon look like in real life? Each character has been transformed with #AI to capture their style, fighting spirit, and charisma, as if they had stepped right out of the game's streets into the real world. AUDIO ON. Damn romantic!
#artificialintelligence #machinelearning #ml #AI #deeplearning #computervision #AIwithPapers #metaverse #LLM
Post https://t.ly/0IpER
Channel https://www.youtube.com/@iaiaoh84
Promptable Human Mesh
PromptHMR is a promptable human pose and shape (HPS) estimation method that processes images with spatial or semantic prompts. It takes "side information" readily available from vision-language models or user input to improve the accuracy and robustness of 3D HPS. Code released!
Review https://t.ly/zJ7S-
Paper arxiv.org/pdf/2504.06397
Project yufu-wang.github.io/phmr-page/
Repo github.com/yufu-wang/PromptHMR
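For a rough intuition of what "promptable" means here, a toy sketch of feeding a bounding-box prompt to a transformer alongside image patch tokens. Everything below (the random-Fourier box encoder, the token sizes) is a hypothetical stand-in, not PromptHMR's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def box_prompt_token(box, img_size, dim=256):
    """Embed a normalized (x1, y1, x2, y2) box as one prompt token via a
    fixed random Fourier projection (toy stand-in for a learned encoder)."""
    x1, y1, x2, y2 = box
    w, h = img_size
    norm = np.array([x1 / w, y1 / h, x2 / w, y2 / h])
    feats = norm @ rng.standard_normal((4, dim // 2))
    return np.concatenate([np.sin(feats), np.cos(feats)])

# 196 image patch tokens (e.g. a 14x14 grid), width 256
patch_tokens = rng.standard_normal((196, 256))
prompt = box_prompt_token((50, 40, 180, 300), img_size=(384, 512))

# A prompt-conditioned transformer would attend over both jointly.
tokens = np.vstack([patch_tokens, prompt[None, :]])  # shape (197, 256)
```

The point of the sketch: a spatial prompt becomes just another token in the sequence, so the same backbone can run with or without user guidance.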
WebEyeTrack: real-time gaze tracking on the web
WebEyeTrack is a novel framework that integrates lightweight SOTA gaze estimation models directly in the browser, bringing deep-learning gaze estimation to the web while explicitly accounting for head pose. Source code released under the MIT license!
Review https://t.ly/Xon9h
Paper https://arxiv.org/pdf/2508.19544
Project redforestai.github.io/WebEyeTrack/
Repo github.com/RedForestAi/WebEyeTrack
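WebEyeTrack itself runs in the browser, but the head-pose idea can be sketched in a few lines of numpy: a gaze direction predicted in the head's own frame has to be rotated by the estimated head pose before it means anything in camera coordinates. Illustrative geometry only, not the library's API:

```python
import numpy as np

def head_pose_rotation(yaw, pitch):
    """Rotation matrix from head yaw/pitch (radians), yaw about the
    vertical axis, pitch about the horizontal axis."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Ry @ Rx

# "Straight ahead" in the head frame is not straight ahead in the
# camera frame once the head is turned 30 degrees.
gaze_head = np.array([0.0, 0.0, -1.0])
R = head_pose_rotation(yaw=np.deg2rad(30), pitch=0.0)
gaze_cam = R @ gaze_head
```

This is why a gaze model that ignores head pose degrades as soon as the user stops facing the webcam squarely.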
AI Open-Source Annotation
VisioFirm by TOELT is a fully open-source, AI-powered image annotation tool designed to accelerate labeling for computer vision tasks such as object detection, oriented bounding boxes, and segmentation. Source code released under Apache 2.0!
Review https://t.ly/MoMvv
Paper https://lnkd.in/dxTncSgv
Repo https://lnkd.in/dCWMXp3x
Friends,
I've just opened my IG account: https://www.instagram.com/aleferra.ig | Feel free to add me.
What about posting AI content on IG? Thoughts?
Real-Time Drag-Based Editing
The Visual AI Lab unveils Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping and inpainting, inspired by elastic object deformation. Demo and code released (unknown license)!
Review https://t.ly/H5nlR
Paper https://arxiv.org/pdf/2509.04582
Project https://visual-ai.github.io/inpaint4drag/
Repo https://github.com/Visual-AI/Inpaint4Drag
Demo https://colab.research.google.com/drive/1fzoyNzcJNZjM1_08FE9V2V20EQxGf4PH
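A toy numpy sketch of the pixel-space decomposition: forward-warp a masked region along the drag vector, and the vacated pixels become the hole an inpainting model must fill. This illustrates the split into warp + inpaint, not the paper's actual bidirectional warping algorithm:

```python
import numpy as np

def drag_warp(img, region_mask, drag):
    """Translate the masked region by an integer (dy, dx) drag vector.
    Returns the warped image and the hole mask left behind."""
    dy, dx = drag
    out = img.copy()
    hole = region_mask.copy()  # source pixels start out as holes
    ys, xs = np.nonzero(region_mask)
    ty, tx = ys + dy, xs + dx
    ok = (ty >= 0) & (ty < img.shape[0]) & (tx >= 0) & (tx < img.shape[1])
    out[ty[ok], tx[ok]] = img[ys[ok], xs[ok]]   # forward warp
    hole[ty[ok], tx[ok]] = False                # covered pixels are not holes
    return out, hole

img = np.arange(25.0).reshape(5, 5)
mask = np.zeros((5, 5), dtype=bool)
mask[1:3, 1:3] = True
warped, hole = drag_warp(img, mask, drag=(2, 2))
```

After the warp, `hole` marks exactly the region the object left behind; that is the only part an inpainting network has to synthesize, which is what makes the decomposition fast.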
Foundation Model for Red Blood Cells
RedDino from the University of Cagliari is a self-supervised foundation model designed for red blood cell (RBC) morphology analysis. Trained on 1.25M RBC images, it is the new SOTA in shape classification. Code & models released under Apache 2.0!
Review https://t.ly/uWAch
Paper arxiv.org/pdf/2508.08180
Code github.com/Snarci/RedDino
Models huggingface.co/collections/Snarcy/reddino-689a13e29241d2e5690202fc
From Skin to Skeleton
This paper unifies the SMPL body model with BSM, a new Biomechanical Skeleton Model. The resulting SKEL model is animatable like SMPL but with fewer, biomechanically realistic degrees of freedom. Model, code, and data available for research!
Review https://t.ly/JsI8M
Paper arxiv.org/pdf/2509.06607
Project https://skel.is.tue.mpg.de/
FoMo4Wheat Foundation Models
PheniX Lab et al. unveil a novel family of foundation models tailored for wheat image tasks, suitable for classification, detection, counting, and segmentation. Demo, dataset, model & code under MIT!
Review https://t.ly/UzM-Z
Paper arxiv.org/pdf/2509.06907
Project fomo4wheat.phenix-lab.com/
Repo github.com/PheniX-Lab/FoMo4Wheat
Demo fomo4wheat.phenix-lab.com/demos
Human-Centric Video Generation
Tsinghua & #ByteDance unveil HuMo, a unified human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs: text prompt following, consistent subject preservation, and synchronized audio-driven motion. Repo released under Apache 2.0!
Review https://t.ly/3S8Yb
Paper https://arxiv.org/pdf/2509.08519
Project https://phantom-video.github.io/HuMo/
Repo https://github.com/Phantom-video/HuMo
21,000+ Hours Dataset
SpatialVID is a novel large-scale video dataset with explicit spatial annotations, including camera poses, depth maps, structured captions, and serialized motion instructions. The dataset consists of 7,089 hours of real-world dynamic scenes. Repo & dataset under Apache 2.0!
Review https://t.ly/Y9o5k
Paper arxiv.org/pdf/2509.09676
Project nju-3dv.github.io/projects/SpatialVID/
Repo github.com/NJU-3DV/spatialVID
Segment & Track Any Cell
RWTH unveils a novel zero-shot cell tracking framework that integrates Segment Anything 2 (SAM2) into the tracking pipeline. Source code released!
Review https://t.ly/n_srg
Paper https://arxiv.org/pdf/2509.09943
Repo https://github.com/zhuchen96/sam4celltracking
How We Use ChatGPT
By July 2025, ChatGPT had 700M+ users sending more than 2.5B messages per day, about 29,000 messages per second. This paper documents eight important facts about ChatGPT usage over the last three years: 63 pages of impressive statistics. Worth a read!
Review https://t.ly/QYHSi
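The quoted rate checks out with quick arithmetic:

```python
messages_per_day = 2.5e9
seconds_per_day = 24 * 60 * 60          # 86,400
per_second = messages_per_day / seconds_per_day
print(round(per_second))                # 28935, i.e. roughly 29,000/sec
```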
3D Prompted Vision-LLM
#Nvidia unveils SR-3D, a novel region-aware vision-language model that connects single-view 2D images and multi-view 3D data through a shared visual token space. Flexible region prompting lets users annotate regions with bounding boxes or segmentation masks on any frame, or directly in 3D, without exhaustive multi-frame labeling. Code & dataset announced!
Review https://t.ly/5Y2c5
Paper https://arxiv.org/pdf/2509.13317
Project https://www.anjiecheng.me/sr3d
Repo TBA
Superpixel Anything (SOTA)
SuperPixel Anything Model, a versatile framework for segmenting images: it extracts image features for superpixel generation and blends them with a large-scale pretrained model for semantic-agnostic segmentation, ensuring superpixels align with masks. Damn romantic. Repo & dataset available!
Review https://t.ly/rpxRh
Paper arxiv.org/pdf/2509.12791
Repo github.com/waldo-j/spam
I'm keeping the channel free from interaction to avoid SPAM. The only way to interact is commenting on posts after being accepted into the subchannel. Do you like this setting?
Anonymous Poll
92%: YES, keep this configuration
8%: NO, open the main channel to comments for everyone
AI with Papers - Artificial Intelligence & Deep Learning pinned «I'm keeping the channel free from interaction to avoid SPAM. The only way to interact is commenting on posts after being accepted into the subchannel. Do you like this setting?»
DAM for SAM2 Tracking
From the University of Ljubljana, a novel distractor-aware drop-in memory module for SAM2 that reduces tracking drift toward distractors and improves redetection after object occlusions. DAM4SAM outperforms SAM2.1 and is SOTA on 10 benchmarks. Repo released!
Review https://t.ly/8aR59
Paper https://arxiv.org/pdf/2509.13864
Project jovanavidenovic.github.io/dam-4-sam/
Repo github.com/jovanavidenovic/DAM4SAM
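As a loose intuition only (DAM4SAM's actual module is learned and operates on SAM2's memory features), a distractor-aware memory policy can be caricatured as admitting into memory only frames whose target mask barely overlaps known distractor masks:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def select_memory(target_masks, distractor_masks, max_overlap=0.2):
    """Toy policy: a frame's target mask enters memory only if its IoU
    with every distractor mask stays below `max_overlap`, so memory is
    not polluted by frames where target and distractor are confused."""
    return [m for m in target_masks
            if all(iou(m, d) <= max_overlap for d in distractor_masks)]

grid = np.zeros((8, 8), dtype=bool)
clean = grid.copy();     clean[0:2, 0:2] = True      # far from distractor
confused = grid.copy();  confused[5:8, 5:8] = True   # sits on the distractor
distractor = grid.copy(); distractor[5:8, 5:8] = True
memory = select_memory([clean, confused], [distractor])
```

Only the clean frame is admitted; keeping distractor-contaminated frames out of memory is what curbs the drift the post describes.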
It's time to decide whether you want to give LinkedIn your data for AI training or not
Poll: https://lnkd.in/p/ddnenZgH
Set here: https://linkedin.com/mypreferences/d/settings/data-for-ai-improvement
Size-Invariant Saliency Detection
SI-SOD: size-invariant salient object detection for scenarios where multiple salient objects of significantly different sizes appear within a single image. Repo released!
Review https://lnkd.in/p/dZBfbSsf
Paper https://arxiv.org/pdf/2509.15573
Project https://ferry-li.github.io/SI_SOD/
Repo https://github.com/Ferry-Li/SI-SOD