AI with Papers - Artificial Intelligence & Deep Learning
15.5K subscribers
145 photos
256 videos
14 files
1.35K links
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
⌚ Multi-Shot Video Segmentation ⌚

πŸ‘‰Fudan tackles the underexplored task of multi-shot video object segmentation (MVOS). Benchmark and repo (built as an extension of SAM) available under Apache 2.0πŸ’™

πŸ‘‰Review https://t.ly/WBW00
πŸ‘‰Paper https://arxiv.org/pdf/2511.13715
πŸ‘‰Project https://henghuiding.com/SAAS/
πŸ‘‰Repo https://github.com/FudanCVL/SAAS
πŸ”₯ SAM 3/3D are OUT!! πŸ”₯

πŸ‘‰#META released SAM 3, a unified model for detection, segmentation, and tracking of objects in images & videos using text, exemplar, and visual prompts. Repo/Models under proprietary licenseπŸ’™

πŸ‘‰Review https://t.ly/lnRZN
πŸ‘‰Paper https://t.ly/5tq9N
πŸ‘‰Project https://ai.meta.com/sam3/
πŸ‘‰Demo https://segment-anything.com
πŸ‘‰Repo https://github.com/facebookresearch/sam3
🍯Unwrapping of 3D Meshes🍯

πŸ‘‰PartUV is a novel part-based UV unwrapping method for 3D meshes; it combines learned part priors with geometric cues to generate a compact set of part-aligned charts. Repo released; a toy unwrapping sketch follows the linksπŸ’™

πŸ‘‰Review https://t.ly/8dNIY
πŸ‘‰Paper arxiv.org/pdf/2511.16659
πŸ‘‰Project www.zhaoningwang.com/PartUV/
πŸ‘‰Repo github.com/EricWang12/PartUV
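πŸ‘‰For intuition only, a hedged Python sketch of part-based unwrapping, not the PartUV method: mesh connected components stand in for the learned part priors, and the trimesh/xatlas packages (assumed installed) handle the per-part parametrization.

```python
import numpy as np
import trimesh
import xatlas

def unwrap_per_part(mesh_path):
    """Unwrap each connected component of a mesh into its own UV charts."""
    mesh = trimesh.load(mesh_path, force="mesh")
    charts = []
    # Connected components stand in for "parts"; PartUV instead derives
    # parts from learned priors combined with geometric cues.
    for part in mesh.split(only_watertight=False):
        positions = np.asarray(part.vertices, dtype=np.float32)
        indices = np.asarray(part.faces, dtype=np.uint32)
        # xatlas returns a vertex remapping, re-indexed faces, and per-vertex UVs.
        vmapping, new_faces, uvs = xatlas.parametrize(positions, indices)
        charts.append((positions[vmapping], new_faces, uvs))
    return charts  # one (vertices, faces, uvs) tuple per part

if __name__ == "__main__":
    for i, (v, f, uv) in enumerate(unwrap_per_part("model.obj")):  # hypothetical path
        print(f"part {i}: {len(v)} verts, {len(f)} faces")
```

The point is only the structure: unwrapping each part separately yields charts aligned to parts rather than one global atlas.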
πŸ• Upsample Anything πŸ•

πŸ‘‰Upsample Anything is a novel, universal, training-free upsampler based on lightweight test-time optimization. No code yet, but it's a relevant paper; a sketch of the core idea follows the linksπŸ’™

πŸ‘‰Review https://t.ly/7LE6G
πŸ‘‰Paper https://lnkd.in/dsUfdtih
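πŸ‘‰A minimal PyTorch sketch of the general recipe (test-time optimization of an upsampled feature map), assuming a fidelity term to the low-res features plus an edge-aware smoothness prior from the guidance image; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def upsample_features(lr_feats, image, scale, steps=200, lr=0.1, lam=0.1):
    """Upsample (1, C, h, w) features to (1, C, h*scale, w*scale) via test-time optimization.

    Assumes `image` (1, 3, H, W) is the guidance frame already at the target
    resolution, i.e. H == h * scale and W == w * scale.
    """
    _, _, h, w = lr_feats.shape
    hr = F.interpolate(lr_feats.detach(), scale_factor=scale, mode="bilinear",
                       align_corners=False).clone().requires_grad_(True)
    opt = torch.optim.Adam([hr], lr=lr)
    # Edge-aware weights: relax the smoothness penalty across image edges.
    gx = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    gy = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    wx, wy = torch.exp(-10.0 * gx), torch.exp(-10.0 * gy)
    for _ in range(steps):
        opt.zero_grad()
        recon = F.adaptive_avg_pool2d(hr, (h, w))      # pool back to low-res
        data = F.mse_loss(recon, lr_feats)             # fidelity to the given features
        tv = (wx * (hr[..., :, 1:] - hr[..., :, :-1]).abs()).mean() \
           + (wy * (hr[..., 1:, :] - hr[..., :-1, :]).abs()).mean()
        (data + lam * tv).backward()
        opt.step()
    return hr.detach()
```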
🦞Single Synthetic Image per Class🦞

πŸ‘‰MIT unveils Linear Gradient Matching (H/T Torralba), a novel distillation method that trains linear classifiers (and more) from a single synthetic image per class. Repo available; a gradient-matching sketch follows the linksπŸ’™

πŸ‘‰Review https://t.ly/dD3un
πŸ‘‰Paper arxiv.org/pdf/2511.16674
πŸ‘‰Project linear-gradient-matching.github.io/
πŸ‘‰Repo github.com/GeorgeCazenavette/linear-gradient-matching
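πŸ‘‰A hedged PyTorch sketch of gradient matching for distillation, not the official code: the single synthetic image per class is optimized so that the linear-head gradient on synthetic data matches the gradient on real batches; `encoder` is assumed to be a frozen feature extractor.

```python
import torch
import torch.nn.functional as F

def distill_one_image_per_class(real_loader, encoder, num_classes, feat_dim,
                                img_shape=(3, 32, 32), steps=1000, lr=0.1,
                                device="cuda"):
    """Learn one synthetic image per class by matching linear-head gradients.

    `encoder` is assumed frozen and to return features of shape (B, feat_dim).
    """
    syn_x = torch.randn(num_classes, *img_shape, device=device, requires_grad=True)
    syn_y = torch.arange(num_classes, device=device)
    opt = torch.optim.Adam([syn_x], lr=lr)
    for _, (x, y) in zip(range(steps), real_loader):
        x, y = x.to(device), y.to(device)
        # A fresh random linear head each step, as in gradient-matching distillation.
        W = torch.randn(num_classes, feat_dim, device=device, requires_grad=True)
        g_real = torch.autograd.grad(
            F.cross_entropy(encoder(x) @ W.t(), y), W)[0]
        g_syn = torch.autograd.grad(
            F.cross_entropy(encoder(syn_x) @ W.t(), syn_y), W, create_graph=True)[0]
        # Align gradient directions row-by-row (one row per class).
        loss = (1.0 - F.cosine_similarity(g_real.detach(), g_syn, dim=1)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn_x.detach()
```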
πŸ§ͺ EfficientSAM3 is out πŸ§ͺ

πŸ‘‰Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation, which transfers capability from SAM3 to lightweight students. Code coming (in sync with the SAM3 release); a generic distillation sketch follows the linksπŸ’™

πŸ‘‰Review https://t.ly/bfXP2
πŸ‘‰Paper arxiv.org/pdf/2511.15833
πŸ‘‰Project simonzeng7108.github.io/efficientsam3/
πŸ‘‰Repo github.com/SimonZeng7108/efficientsam3
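πŸ‘‰A generic teacher-to-student distillation step in PyTorch, only a stand-in for the paper's Progressive Hierarchical Distillation; the (features, mask logits) teacher/student outputs and the `proj` adapter are assumptions.

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, proj, images, optimizer, alpha=0.5):
    """One distillation step on an unlabeled batch; the teacher stays frozen.

    Assumes teacher(images) and student(images) each return (image features,
    per-pixel mask logits) with matching spatial shapes, and `proj` maps the
    student feature width to the teacher's.
    """
    teacher.eval()
    with torch.no_grad():
        t_feats, t_logits = teacher(images)
    s_feats, s_logits = student(images)
    feat_loss = F.mse_loss(proj(s_feats), t_feats)          # feature matching
    logit_loss = F.binary_cross_entropy_with_logits(
        s_logits, torch.sigmoid(t_logits))                  # soft mask targets
    loss = alpha * feat_loss + (1.0 - alpha) * logit_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```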
🌩️ Cloud4D in time 🌩️

πŸ‘‰Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial and 5 s temporal resolution. Repo coming, data releasedπŸ’™

πŸ‘‰Review https://t.ly/w7Zly
πŸ‘‰Paper arxiv.org/pdf/2511.19431
πŸ‘‰Project cloud4d.jacob-lin.com/
πŸ‘‰Data https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
πŸ‘‰Repo TBA
πŸ“MotionV2V: Editing Motion in VideoπŸ“

πŸ‘‰Google unveils "motion edits", a new diffusion-based approach for editing videos by controlling the change in motion from the original to the edited video. Impressive results. Repo to be released soonπŸ’™

πŸ‘‰Review https://t.ly/s0sIT
πŸ‘‰Paper https://arxiv.org/pdf/2511.20640
πŸ‘‰Project https://ryanndagreat.github.io/MotionV2V/
πŸ‘‰Repo https://github.com/RyannDaGreat/MotionV2V
πŸ”₯ Smell Like Vision Spirit πŸ”₯

πŸ‘‰New York Smells is a novel large-scale dataset of paired vision and olfaction captured in the wild, enabling the new task of cross-modal learning between smell and sight. With the lights out, it's less dangerous. Dataset available; a contrastive-learning sketch follows the linksπŸ’™

πŸ‘‰Review https://t.ly/Ycn_B
πŸ‘‰Paper arxiv.org/pdf/2511.20544
πŸ‘‰Project smell.cs.columbia.edu/
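πŸ‘‰A hedged illustration of the cross-modal task with a standard CLIP-style symmetric InfoNCE loss between smell and image embeddings; `smell_enc` and `image_enc` are placeholder encoders, and this is not necessarily the paper's training setup.

```python
import torch
import torch.nn.functional as F

def smell_image_contrastive_loss(smell_vecs, images, smell_enc, image_enc, tau=0.07):
    """Symmetric InfoNCE over a batch of paired (smell reading, image) samples."""
    s = F.normalize(smell_enc(smell_vecs), dim=-1)       # (B, D) smell embeddings
    v = F.normalize(image_enc(images), dim=-1)           # (B, D) image embeddings
    logits = s @ v.t() / tau                             # pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)   # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```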
πŸ•ΆοΈ Seeing without Pixels πŸ•ΆοΈ

πŸ‘‰Is it possible to perceive a video's content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question; a toy setup follows the linksπŸ’™

πŸ‘‰Review https://t.ly/Ymd1c
πŸ‘‰Paper arxiv.org/pdf/2511.21681
πŸ‘‰Project sites.google.com/view/seeing-without-pixels
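πŸ‘‰A toy PyTorch setup for the task (not DeepMind's model): predict a video-level label from the camera trajectory alone, encoded as per-frame 6-DoF poses fed to a small Transformer.

```python
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    """Predict a video-level label from camera poses only (no pixels)."""
    def __init__(self, num_classes, d_model=128, layers=4):
        super().__init__()
        self.embed = nn.Linear(7, d_model)   # per-frame pose: xyz + quaternion
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, poses):                # poses: (B, T, 7)
        h = self.encoder(self.embed(poses))  # (B, T, d_model)
        return self.head(h.mean(dim=1))      # temporal average pooling -> logits

# Example: 8 trajectories of 120 frames each, 10 hypothetical content classes.
logits = TrajectoryClassifier(num_classes=10)(torch.randn(8, 120, 7))
```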
🌡Instance-Level Video Generation🌡

πŸ‘‰InstanceV is the first video generation framework designed, at the architectural level, specifically for instance-level control. Code & data announcedπŸ’™

πŸ‘‰Review https://t.ly/y_TBT
πŸ‘‰Paper arxiv.org/pdf/2511.23146
πŸ‘‰Project aliothchen.github.io/projects/InstanceV/
πŸ‘‰Repo TBA
πŸ₯­3D Point Motion EditingπŸ₯­

πŸ‘‰Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced, but a relevant paperπŸ’™

πŸ‘‰Review https://t.ly/GJHJ5
πŸ‘‰Paper arxiv.org/pdf/2512.02015
πŸ‘‰Project edit-by-track.github.io/
πŸ¦„ Native Unified Multimodal πŸ¦„

πŸ‘‰META unveils a novel unified multimodal model (UMM) that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This shared representation space enables SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review; a structural sketch follows the linksπŸ’™

πŸ‘‰Review https://t.ly/7wmKP
πŸ‘‰Paper https://lnkd.in/djT4WGEU
πŸ‘‰Project https://tuna-ai.org/
πŸ‘‰Repo github.com/wren93/tuna
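πŸ‘‰A structural sketch in PyTorch of the cascade described above, with placeholder modules rather than META's released architecture: a stand-in VAE encoder produces continuous latents, and a representation encoder on top yields one token sequence shared by understanding and generation heads.

```python
import torch
import torch.nn as nn

class CascadedVisualEncoder(nn.Module):
    """Stand-in cascade: VAE-style encoder -> representation encoder -> unified tokens."""
    def __init__(self, latent_ch=16, d_model=768, layers=4):
        super().__init__()
        # Stand-in VAE encoder: image -> continuous latent grid (no quantization).
        self.vae_enc = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_ch, 4, stride=2, padding=1))
        # Stand-in representation encoder operating on latent "patches".
        self.to_tokens = nn.Conv2d(latent_ch, d_model, 2, stride=2)
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.rep_enc = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, images):                                  # (B, 3, H, W)
        z = self.vae_enc(images)                                # continuous latents
        tokens = self.to_tokens(z).flatten(2).transpose(1, 2)   # (B, N, d_model)
        return self.rep_enc(tokens)  # shared by understanding & generation heads

tokens = CascadedVisualEncoder()(torch.randn(1, 3, 256, 256))
print(tokens.shape)  # torch.Size([1, 256, 768])
```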
✌️SOTA Generative SLP✌️

πŸ‘‰Stable Signer is a new generative model for sign language production (SLP). It redefines SLP as an end-to-end hierarchical generation task that only includes text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with dataπŸ’™

πŸ‘‰Review https://t.ly/yKZhn
πŸ‘‰Paper arxiv.org/pdf/2512.04048
πŸ‘‰Project stablesigner.github.io/
πŸ‘‰Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025