Multi-Shot Video Segmentation
Fudan tackles the underexplored task of multi-shot video object segmentation (MVOS), where masks must survive hard shot cuts (toy sketch below). Benchmark and repo (an extension of SAM) available under Apache 2.0.
Review https://t.ly/WBW00
Paper https://arxiv.org/pdf/2511.13715
Project https://henghuiding.com/SAAS/
Repo https://github.com/FudanCVL/SAAS
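A hedged toy sketch of what the multi-shot setting adds over plain propagation. The cut detector, appearance descriptor, and re-association rule below are all illustrative assumptions, not the SAAS method:

```python
import numpy as np

def shot_boundaries(frames, thresh=0.4):
    # Crude cut detector: a large mean absolute frame difference marks a new shot.
    cuts = [0]
    for i in range(1, len(frames)):
        if np.abs(frames[i] - frames[i - 1]).mean() > thresh:
            cuts.append(i)
    return cuts + [len(frames)]

def embed(frame, mask):
    # Stand-in appearance descriptor: mean color inside the mask.
    return frame[mask].mean(axis=0) if mask.any() else np.zeros(frame.shape[-1])

def track_multi_shot(frames, first_mask, propagate, detect_candidates):
    # propagate(frame, prev_mask) -> mask: within-shot propagation (e.g., a SAM-style tracker).
    # detect_candidates(frame) -> [masks]: proposals in the first frame of a new shot.
    bounds = shot_boundaries(frames)
    masks, mask, ref = [], first_mask, embed(frames[0], first_mask)
    for s, e in zip(bounds[:-1], bounds[1:]):
        if s > 0:  # hard cut: re-identify the target among proposals
            mask = min(detect_candidates(frames[s]),
                       key=lambda m: np.linalg.norm(embed(frames[s], m) - ref))
        for t in range(s, e):
            mask = propagate(frames[t], mask)
            masks.append(mask)
        ref = embed(frames[e - 1], mask)  # refresh the appearance reference
    return masks

# Trivial run with stub components on a synthetic two-shot "video".
frames = np.concatenate([np.zeros((5, 8, 8, 3)), np.ones((5, 8, 8, 3))])
m0 = np.zeros((8, 8), bool); m0[2:5, 2:5] = True
print(len(track_multi_shot(frames, m0, lambda f, m: m, lambda f: [m0])))  # 10
```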
SAM 3 / SAM 3D are out!
#META releases SAM 3, a unified model for detection, segmentation, and tracking of objects in images & video using text, exemplar, and visual prompts. Repo/models under a proprietary license.
Review https://t.ly/lnRZN
Paper https://t.ly/5tq9N
Project https://ai.meta.com/sam3/
Demo https://segment-anything.com
Repo https://github.com/facebookresearch/sam3
Unwrapping of 3D Meshes
PartUV is a novel part-based UV unwrapping method for 3D meshes: it combines learned part priors with geometric cues to generate a compact set of part-aligned charts (toy sketch below). Repo released.
Review https://t.ly/8dNIY
Paper arxiv.org/pdf/2511.16659
Project www.zhaoningwang.com/PartUV/
Repo github.com/EricWang12/PartUV
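Illustration only (not the PartUV method): the "geometric cues -> charts" half of the idea, here reduced to k-means on face normals. PartUV additionally uses learned part priors, which this toy omits:

```python
import numpy as np

def face_normals(verts, faces):
    # verts: (V, 3) float, faces: (F, 3) int vertex indices.
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    n = np.cross(b - a, c - a)
    return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-12)

def cluster_faces(verts, faces, k=6, iters=20, seed=0):
    # Assign each face to one of k "charts" by clustering face normals.
    n = face_normals(verts, faces)
    rng = np.random.default_rng(seed)
    centers = n[rng.choice(len(n), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(n @ centers.T, axis=1)  # cosine similarity
        for j in range(k):
            m = labels == j
            if m.any():
                c = n[m].mean(axis=0)
                centers[j] = c / (np.linalg.norm(c) + 1e-12)
    return labels  # (F,) chart id per face

# Usage on a unit cube: ideally one cluster per cube face (init permitting).
verts = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
faces = np.array([[0,1,3],[0,3,2],[4,6,7],[4,7,5],[0,4,5],[0,5,1],
                  [2,3,7],[2,7,6],[0,2,6],[0,6,4],[1,5,7],[1,7,3]])
print(cluster_faces(verts, faces, k=6))
```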
Upsample Anything
Upsample Anything is a universal, training-free upsampler built on lightweight test-time optimization (toy sketch below). No code, but a relevant paper.
Review https://t.ly/7LE6G
Paper https://lnkd.in/dsUfdtih
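A hedged sketch of the test-time-optimization pattern, not the paper's exact objective or parameterization: a tiny guided module is optimized per image, supervised only by re-downsampling consistency against the input features.

```python
import torch
import torch.nn.functional as F

def tto_upsample(feat_lr, guide_hr, steps=100, lr=1e-2):
    # feat_lr: (1, C, h, w) low-res features; guide_hr: (1, 3, H, W) image.
    H, W = guide_hr.shape[-2:]
    head = torch.nn.Conv2d(feat_lr.shape[1] + 3, feat_lr.shape[1], 3, padding=1)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        up = F.interpolate(feat_lr, size=(H, W), mode="bilinear", align_corners=False)
        out = up + head(torch.cat([up, guide_hr], dim=1))  # image-guided residual
        # Consistency loss: the downsampled output must match the input features.
        down = F.interpolate(out, size=feat_lr.shape[-2:], mode="bilinear", align_corners=False)
        loss = F.mse_loss(down, feat_lr)
        opt.zero_grad(); loss.backward(); opt.step()
    return out.detach()

# Usage: upsample 32x32 features to 256x256, guided by the RGB image.
feat = torch.randn(1, 64, 32, 32)
img = torch.rand(1, 3, 256, 256)
print(tto_upsample(feat, img, steps=10).shape)  # torch.Size([1, 64, 256, 256])
```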
Single Synthetic Image per Class
MIT unveils Linear Gradient Matching (H/T Torralba), a distillation method that optimizes a single synthetic image per class for training linear classifiers (and more); a toy sketch follows the links. Repo available.
Review https://t.ly/dD3un
Paper arxiv.org/pdf/2511.16674
Project linear-gradient-matching.github.io/
Repo github.com/GeorgeCazenavette/linear-gradient-matching
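Sketch of the general gradient-matching technique the name refers to (details here are assumptions, not the paper's recipe): optimize one synthetic example per class so the gradient it induces on a linear classifier matches the gradient induced by real data.

```python
import torch
import torch.nn.functional as F

def gradient_matching_step(syn, real_x, real_y, w, opt):
    # syn: (K, D) one learnable synthetic example per class (requires_grad).
    # real_x: (N, D) real features, real_y: (N,) labels, w: (K, D) linear weights.
    syn_y = torch.arange(syn.shape[0])
    g_real = torch.autograd.grad(
        F.cross_entropy(real_x @ w.T, real_y), w)[0]
    g_syn = torch.autograd.grad(
        F.cross_entropy(syn @ w.T, syn_y), w, create_graph=True)[0]
    # Match gradient directions (cosine), a common gradient-matching loss.
    loss = 1 - F.cosine_similarity(g_syn.flatten(), g_real.flatten(), dim=0)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

K, D, N = 10, 128, 512
syn = torch.randn(K, D, requires_grad=True)
w = torch.randn(K, D, requires_grad=True)
real_x, real_y = torch.randn(N, D), torch.randint(0, K, (N,))
opt = torch.optim.Adam([syn], lr=1e-2)
for _ in range(5):
    print(gradient_matching_step(syn, real_x, real_y, w, opt))
```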
EfficientSAM3 is out
Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation, which transfers capability from SAM 3 to lightweight students (generic distillation sketch below). Code coming, in sync with the SAM 3 release.
Review https://t.ly/bfXP2
Paper arxiv.org/pdf/2511.15833
Project simonzeng7108.github.io/efficientsam3/
Repo github.com/SimonZeng7108/efficientsam3
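A minimal teacher-to-student distillation sketch, generic rather than the paper's Progressive Hierarchical Distillation recipe: the student mimics the teacher's intermediate features and outputs, with a projection bridging the width mismatch.

```python
import torch
import torch.nn.functional as F

class Student(torch.nn.Module):
    def __init__(self, c_in=3, c_feat=64, c_out=256):
        super().__init__()
        self.backbone = torch.nn.Conv2d(c_in, c_feat, 3, padding=1)
        self.head = torch.nn.Conv2d(c_feat, c_out, 1)
    def forward(self, x):
        f = F.relu(self.backbone(x))
        return f, self.head(f)

def distill_loss(s_feat, s_out, t_feat, t_out, proj):
    feat_l = F.mse_loss(proj(s_feat), t_feat)  # feature imitation
    out_l = F.mse_loss(s_out, t_out)           # output imitation
    return feat_l + out_l

student = Student()
proj = torch.nn.Conv2d(64, 128, 1)             # student width -> teacher width
x = torch.rand(2, 3, 64, 64)
with torch.no_grad():                          # frozen-teacher stand-ins
    t_feat, t_out = torch.randn(2, 128, 64, 64), torch.randn(2, 256, 64, 64)
s_feat, s_out = student(x)
print(distill_loss(s_feat, s_out, t_feat, t_out, proj).item())
```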
Cloud4D in Time
Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial and 5 s temporal resolution. Repo coming; data released.
Review https://t.ly/w7Zly
Paper arxiv.org/pdf/2511.19431
Project cloud4d.jacob-lin.com/
Data https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
Repo TBA
MotionV2V: Editing Motion in Video
Google unveils motion edits, a new approach to editing videos by controlling the change in motion between the original and the edited video using diffusion models. Impressive results. Repo to be released soon.
Review https://t.ly/s0sIT
Paper https://arxiv.org/pdf/2511.20640
Project https://ryanndagreat.github.io/MotionV2V/
Repo https://github.com/RyannDaGreat/MotionV2V
Smells Like Vision Spirit
New York Smells is a novel large-scale dataset of paired vision and olfaction captured in the wild, enabling the new task of cross-modal learning between smell and sight (contrastive-learning sketch below). With the lights out, it's less dangerous. Dataset available.
Review https://t.ly/Ycn_B
Paper arxiv.org/pdf/2511.20544
Project smell.cs.columbia.edu/
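A hedged sketch of cross-modal contrastive learning, CLIP-style InfoNCE between paired smell readings and images; the encoders and dimensions are placeholders, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

def infonce(z_smell, z_img, temp=0.07):
    # z_*: (B, D) embeddings; row i of each tensor forms a positive pair.
    z_smell = F.normalize(z_smell, dim=1)
    z_img = F.normalize(z_img, dim=1)
    logits = z_smell @ z_img.T / temp    # (B, B) pairwise similarities
    targets = torch.arange(len(logits))  # matching pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

smell_enc = torch.nn.Linear(32, 128)   # e.g., gas-sensor vector -> embedding
img_enc = torch.nn.Linear(512, 128)    # e.g., pooled image feature -> embedding
smell = torch.randn(8, 32)
img_feat = torch.randn(8, 512)
print(infonce(smell_enc(smell), img_enc(img_feat)).item())
```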
Seeing without Pixels
Is it possible to perceive a video's content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question (toy setup sketch below).
Review https://t.ly/Ymd1c
Paper arxiv.org/pdf/2511.21681
Project sites.google.com/view/seeing-without-pixels
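A toy sketch of the setup only (the paper's probe models are assumptions here): classify what a video shows from the camera trajectory alone, encoding a sequence of 6-DoF poses with a small transformer.

```python
import torch

class TrajectoryClassifier(torch.nn.Module):
    def __init__(self, n_classes=10, d=64):
        super().__init__()
        self.embed = torch.nn.Linear(6, d)   # per-frame pose: xyz + rotation (e.g., Euler)
        layer = torch.nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.head = torch.nn.Linear(d, n_classes)
    def forward(self, poses):                # poses: (B, T, 6)
        h = self.encoder(self.embed(poses))  # (B, T, d)
        return self.head(h.mean(dim=1))      # pool over time -> class logits

model = TrajectoryClassifier()
poses = torch.randn(2, 120, 6)              # two 120-frame camera trajectories
print(model(poses).shape)                   # torch.Size([2, 10])
```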
Instance-Level Video Generation
InstanceV is the first video generation framework designed specifically for instance-level control at the architectural level. Code & data announced.
Review https://t.ly/y_TBT
Paper arxiv.org/pdf/2511.23146
Project aliothchen.github.io/projects/InstanceV/
Repo TBA
3D Point Motion Editing
Edit-by-Track enables precise video motion editing via 3D point tracks: by specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced, but relevant.
Review https://t.ly/GJHJ5
Paper arxiv.org/pdf/2512.02015
Project edit-by-track.github.io/
Native Unified Multimodal
META unveils TUNA, a novel UMM that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder (toy sketch of the cascade below). This unified representation space enables SOTA end-to-end processing of images/videos for both understanding and generation. Code under legal review.
Review https://t.ly/7wmKP
Paper https://lnkd.in/djT4WGEU
Project https://tuna-ai.org/
Repo github.com/wren93/tuna
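A minimal sketch of the cascade described above; shapes and modules are illustrative assumptions, not TUNA's actual architecture. A VAE-style encoder maps pixels to a continuous latent grid, and a representation encoder refines those latents into tokens shared by understanding and generation.

```python
import torch

class CascadedVisualEncoder(torch.nn.Module):
    def __init__(self, d=256):
        super().__init__()
        # Stage 1: VAE-style encoder, pixels -> continuous latent grid.
        self.vae_enc = torch.nn.Sequential(
            torch.nn.Conv2d(3, 64, 4, stride=4), torch.nn.SiLU(),
            torch.nn.Conv2d(64, 16, 4, stride=4))      # (B, 16, H/16, W/16)
        # Stage 2: representation encoder over flattened latent tokens.
        self.proj = torch.nn.Linear(16, d)
        layer = torch.nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.repr_enc = torch.nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, x):                               # x: (B, 3, H, W)
        z = self.vae_enc(x)                             # continuous latents
        tokens = z.flatten(2).transpose(1, 2)           # (B, N, 16)
        return self.repr_enc(self.proj(tokens))         # unified tokens (B, N, d)

enc = CascadedVisualEncoder()
print(enc(torch.rand(1, 3, 256, 256)).shape)            # torch.Size([1, 256, 256])
```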
SOTA Generative SLP
Stable Signer is a new sign language production (SLP) generative model. It reframes SLP as a hierarchical, end-to-end generation task consisting only of text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid (pipeline sketch below). Repo with data.
Review https://t.ly/yKZhn
Paper arxiv.org/pdf/2512.04048
Project stablesigner.github.io/
Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
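A pipeline sketch of the hierarchical decomposition named above; every stage is a placeholder stub (including an intermediate gloss-to-pose step, assumed here), not the paper's models:

```python
from typing import List
import numpy as np

def prompt2gloss(prompt: str) -> List[str]:
    # Stub: a real model maps free-form text to a gloss sequence.
    return prompt.upper().split()

def gloss2pose(glosses: List[str], joints: int = 21) -> np.ndarray:
    # Stub: one pose frame per gloss; real models emit dense pose sequences.
    return np.zeros((len(glosses), joints, 3))

def pose2vid(poses: np.ndarray, hw: int = 64) -> np.ndarray:
    # Stub: a real renderer/diffusion model turns poses into frames.
    return np.zeros((poses.shape[0], hw, hw, 3))

video = pose2vid(gloss2pose(prompt2gloss("hello world")))
print(video.shape)  # (2, 64, 64, 3)
```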