SAM 3/3D are OUT!!
#META released SAM 3, a unified model for detection, segmentation, and tracking of objects in images & video using text, exemplar, and visual prompts. Repo/models under a proprietary license.
Review: https://t.ly/lnRZN
Paper: https://t.ly/5tq9N
Project: https://ai.meta.com/sam3/
Demo: https://segment-anything.com
Repo: https://github.com/facebookresearch/sam3
Unwrapping of 3D Meshes
PartUV is a novel part-based UV unwrapping method for 3D meshes; it combines learned part priors with geometric cues to generate a compact set of part-aligned charts. Repo released. A toy sketch of part-aligned charts follows the links.
Review: https://t.ly/8dNIY
Paper: arxiv.org/pdf/2511.16659
Project: www.zhaoningwang.com/PartUV/
Repo: github.com/EricWang12/PartUV
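For intuition only, a toy sketch of what a part-aligned chart means, assuming the mesh is already segmented into parts: flatten each part onto its PCA best-fit plane to get one chart per part. This is not PartUV's method (no learned priors, distortion handling, or packing), just the underlying idea.

```python
import numpy as np

def chart_from_part(vertices: np.ndarray) -> np.ndarray:
    """Flatten one (N, 3) part onto its best-fit plane -> (N, 2) UV chart."""
    centered = vertices - vertices.mean(axis=0)
    # The top-2 right singular vectors span the least-squares fit plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    uv = centered @ vt[:2].T
    uv -= uv.min(axis=0)              # shift chart into the positive quadrant
    return uv / max(uv.max(), 1e-8)   # normalize, ready for chart packing

# Stand-in "parts"; in practice these come from the learned segmentation.
parts = [np.random.rand(50 + 10 * i, 3) for i in range(4)]
charts = [chart_from_part(p) for p in parts]
print([c.shape for c in charts])      # one compact UV chart per part
```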
Upsample Anything
Upsample Anything is a novel universal, training-free upsampler built on lightweight test-time optimization. No code, but it's a relevant paper. A minimal sketch of the idea follows the links.
Review: https://t.ly/7LE6G
Paper: https://lnkd.in/dsUfdtih
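A minimal sketch of the general test-time-optimization recipe, as I read it (the head, loss, and hyperparameters below are assumptions, not the paper's formulation): fit a tiny per-image network so its high-res output, guided by the RGB image, downsamples back to the low-res input.

```python
import torch
import torch.nn.functional as F

def upsample_tto(lr, guide, steps=200):
    """lr: (1,C,h,w) low-res map; guide: (1,3,H,W) high-res RGB guidance."""
    H, W = guide.shape[-2:]
    head = torch.nn.Conv2d(3 + lr.shape[1], lr.shape[1], 3, padding=1)
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    lr_up = F.interpolate(lr, size=(H, W), mode="bilinear", align_corners=False)
    for _ in range(steps):
        hr = head(torch.cat([guide, lr_up], dim=1)) + lr_up  # guided residual
        # Self-consistency: the HR prediction must downsample back to the input.
        down = F.interpolate(hr, size=lr.shape[-2:], mode="bilinear",
                             align_corners=False)
        loss = F.mse_loss(down, lr)
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return head(torch.cat([guide, lr_up], dim=1)) + lr_up

lr = torch.rand(1, 8, 16, 16)          # e.g. a coarse feature map
guide = torch.rand(1, 3, 128, 128)     # the high-res image it belongs to
print(upsample_tto(lr, guide).shape)   # -> torch.Size([1, 8, 128, 128])
```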
Single Synthetic Image per Class
MIT unveils Linear Gradient Matching (H/T Torralba), a novel distillation method for training linear classifiers (and more) from a single synthetic image per class. Repo available. A sketch of the gradient-matching idea follows the links.
Review: https://t.ly/dD3un
Paper: arxiv.org/pdf/2511.16674
Project: linear-gradient-matching.github.io/
Repo: github.com/GeorgeCazenavette/linear-gradient-matching
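For flavor, the gradient-matching objective this line of work builds on, sketched in feature space with one synthetic vector per class (the paper optimizes actual images through a frozen encoder, and its exact recipe differs): tune the synthetic set so a linear classifier's gradient on it matches its gradient on real data.

```python
import torch
import torch.nn.functional as F

C, D = 10, 512                                 # classes, frozen feature dim
syn = torch.randn(C, D, requires_grad=True)    # one synthetic sample per class
syn_y = torch.arange(C)
opt = torch.optim.Adam([syn], lr=1e-2)

for step in range(100):
    w = torch.randn(C, D, requires_grad=True)  # random linear probe this step
    real_x = torch.randn(256, D)               # stand-in for real features
    real_y = torch.randint(0, C, (256,))
    g_real = torch.autograd.grad(F.cross_entropy(real_x @ w.T, real_y), w)[0]
    g_syn = torch.autograd.grad(F.cross_entropy(syn @ w.T, syn_y), w,
                                create_graph=True)[0]
    # Align gradient directions; backprop flows into the synthetic set.
    loss = 1 - F.cosine_similarity(g_real.flatten(), g_syn.flatten(), dim=0)
    opt.zero_grad(); loss.backward(); opt.step()
```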
EfficientSAM3 is out
Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation, which transfers capability from SAM 3 to lightweight students. Code coming (in sync with the SAM 3 release). A minimal distillation sketch follows the links.
Review: https://t.ly/bfXP2
Paper: arxiv.org/pdf/2511.15833
Project: simonzeng7108.github.io/efficientsam3/
Repo: github.com/SimonZeng7108/efficientsam3
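Not the paper's Progressive Hierarchical Distillation itself (which staggers distillation across stages of the model), but a reminder of the core mechanism it builds on, with made-up modules: a lightweight student regressing a frozen teacher's features.

```python
import torch
import torch.nn.functional as F

teacher = torch.nn.Conv2d(3, 256, 3, padding=1).eval()      # stand-in "SAM3"
student = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3, padding=1),
                              torch.nn.Conv2d(64, 256, 1))  # light + projector
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

images = torch.rand(4, 3, 64, 64)           # stand-in batch
with torch.no_grad():
    t_feat = teacher(images)                # frozen teacher features
loss = F.mse_loss(student(images), t_feat)  # student mimics the teacher
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```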
Cloud4D in Time
Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial and 5 s temporal resolution. Repo coming; data released.
Review: https://t.ly/w7Zly
Paper: arxiv.org/pdf/2511.19431
Project: cloud4d.jacob-lin.com/
Data: https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
Repo: TBA
MotionV2V: Editing Motion in Video
Google unveils motion edits, a new approach to editing videos by controlling the change in motion from the original to the edited video using diffusion models. Impressive results. Repo to be released soon.
Review: https://t.ly/s0sIT
Paper: https://arxiv.org/pdf/2511.20640
Project: https://ryanndagreat.github.io/MotionV2V/
Repo: https://github.com/RyannDaGreat/MotionV2V
Smell Like Vision Spirit
New York Smells is a novel large-scale dataset of paired vision and olfaction captured in the wild, enabling the new task of cross-modal learning between smell and sight. With the lights out, it's less dangerous. Dataset available. A CLIP-style sketch of the task follows the links.
Review: https://t.ly/Ycn_B
Paper: arxiv.org/pdf/2511.20544
Project: smell.cs.columbia.edu/
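What cross-modal learning between smell and sight typically looks like, as a CLIP-style contrastive sketch; the encoders, dimensions, and temperature below are placeholders, not the paper's model:

```python
import torch
import torch.nn.functional as F

smell_enc = torch.nn.Linear(64, 128)    # e.g. gas-sensor vector -> embedding
image_enc = torch.nn.Linear(512, 128)   # e.g. frozen ViT feature -> embedding

smell = torch.randn(32, 64)             # paired batch: row i of each modality
image = torch.randn(32, 512)            # was captured at the same scene
s = F.normalize(smell_enc(smell), dim=-1)
v = F.normalize(image_enc(image), dim=-1)
logits = s @ v.T / 0.07                 # temperature-scaled similarities
labels = torch.arange(32)               # matching pairs lie on the diagonal
loss = (F.cross_entropy(logits, labels) +
        F.cross_entropy(logits.T, labels)) / 2
print(loss.item())
```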
Seeing without Pixels
Is it possible to perceive a video's content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question. A sketch of such a probe follows the links.
Review: https://t.ly/Ymd1c
Paper: arxiv.org/pdf/2511.21681
Project: sites.google.com/view/seeing-without-pixels
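A guess at the minimal form such a probe can take (the architecture is my assumption, not the paper's): encode the camera trajectory alone, e.g. per-frame 6-DoF pose deltas, and classify the video's content from it.

```python
import torch

class TrajectoryClassifier(torch.nn.Module):
    """Predict video content from camera motion only -- no pixels."""
    def __init__(self, n_classes: int, hidden: int = 128):
        super().__init__()
        self.rnn = torch.nn.GRU(input_size=6, hidden_size=hidden,
                                batch_first=True)
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, poses):            # poses: (B, T, 6) = (tx,ty,tz,rx,ry,rz)
        _, h = self.rnn(poses)           # final hidden state summarizes motion
        return self.head(h[-1])          # logits over video categories

model = TrajectoryClassifier(n_classes=10)
print(model(torch.randn(2, 300, 6)).shape)  # -> torch.Size([2, 10])
```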
Instance-Level Video Generation
InstanceV is the first video generation framework designed specifically for instance-level control at the architectural level. Code & data announced.
Review: https://t.ly/y_TBT
Paper: arxiv.org/pdf/2511.23146
Project: aliothchen.github.io/projects/InstanceV/
Repo: TBA
3D Point Motion Editing
Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced, but a relevant read.
Review: https://t.ly/GJHJ5
Paper: arxiv.org/pdf/2512.02015
Project: edit-by-track.github.io/
Native Unified Multimodal
META unveils TUNA, a novel unified multimodal model that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space enables SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review. A toy sketch of the cascade follows the links.
Review: https://t.ly/7wmKP
Paper: https://lnkd.in/djT4WGEU
Project: https://tuna-ai.org/
Repo: github.com/wren93/tuna
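A toy rendering of the cascade described above, with made-up shapes and modules (not TUNA's actual architecture): a VAE-style encoder maps pixels to a continuous latent grid, and a representation encoder lifts those latents into the unified token space shared by understanding and generation heads.

```python
import torch

class CascadedVisualEncoder(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in VAE encoder: pixels -> continuous latent grid.
        self.vae_enc = torch.nn.Conv2d(3, 16, kernel_size=8, stride=8)
        # Representation encoder over the latent tokens.
        layer = torch.nn.TransformerEncoderLayer(d_model=16, nhead=4,
                                                 batch_first=True)
        self.rep_enc = torch.nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, images):                 # images: (B, 3, H, W)
        z = self.vae_enc(images)               # (B, 16, H/8, W/8)
        tokens = z.flatten(2).transpose(1, 2)  # (B, N, 16) latent tokens
        return self.rep_enc(tokens)            # unified representation

enc = CascadedVisualEncoder()
print(enc(torch.rand(2, 3, 64, 64)).shape)     # -> torch.Size([2, 64, 16])
```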
SOTA Generative SLP
Stable Signer is a new sign language production (SLP) generative model. It redefines SLP as a hierarchical, end-to-end generation task comprising only text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data.
Review: https://t.ly/yKZhn
Paper: arxiv.org/pdf/2512.04048
Project: stablesigner.github.io/
Data: github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
Test-Time Spatial Control for 3D Generation
SpaceControl is the new SOTA training-free, test-time method for explicit spatial control of 3D generation. Repo announced.
Review: https://t.ly/1zrah
Paper: https://lnkd.in/dEWh3vep
Project: https://lnkd.in/dScftUmm
Repo: TBA
Layered PSD Diffusion
OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in design tools. Online demo. A toy compositing sketch follows the links.
Review: https://t.ly/YNRAC
Paper: arxiv.org/pdf/2512.09247
Project: showlab.github.io/OmniPSD/
Demo: https://www.lovart.ai/it
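Why layered RGBA output is directly editable: the flat image is just the back-to-front alpha composite of the layers, so any layer can be swapped and the result re-rendered. Standard Porter-Duff "over" accumulated in premultiplied alpha; nothing OmniPSD-specific:

```python
import numpy as np

def composite(layers):
    """layers: list of (H, W, 4) float RGBA in [0, 1], back to front."""
    h, w, _ = layers[0].shape
    out_rgb = np.zeros((h, w, 3))     # accumulated premultiplied color
    out_a = np.zeros((h, w, 1))       # accumulated coverage
    for layer in layers:
        rgb, a = layer[..., :3], layer[..., 3:4]
        out_rgb = rgb * a + out_rgb * (1 - a)   # "over" in premultiplied alpha
        out_a = a + out_a * (1 - a)
    return out_rgb, out_a

bg = np.ones((64, 64, 4)); bg[..., :3] = 0.2        # opaque dark background
fg = np.zeros((64, 64, 4)); fg[16:48, 16:48] = 0.9  # translucent square layer
rgb, alpha = composite([bg, fg])                    # edit fg, re-run, done
print(rgb.shape, float(alpha.max()))
```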
Pixel Art Volumetric Rendering
Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel-art supervision. Repo announced.
Review: https://t.ly/qPyNl
Paper: https://lnkd.in/du5ikJGN
Project: https://lnkd.in/dpiAjj5m
Repo: TBA
MoCapAnything is out
MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced. A toy IK-fitting sketch follows the links.
Review: https://t.ly/_Tw6t
Paper: arxiv.org/pdf/2512.10881
Project: animotionlab.github.io/MoCapAnything
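A toy version of the second stage as I read it (simplified and assumed, not the paper's solver): recover rotations by optimizing a kinematic chain so its forward kinematics matches the predicted joint positions, i.e. IK as gradient-based fitting. Planar 2-link chain for brevity; a soft regularizer stands in for asset-specific constraints.

```python
import torch

lengths = torch.tensor([1.0, 0.8])               # bone lengths of the chain
target = torch.tensor([[0.8, 0.6], [1.2, 1.3]])  # "predicted" joint positions
angles = torch.zeros(2, requires_grad=True)      # rotations to recover
opt = torch.optim.Adam([angles], lr=0.05)

def fk(angles):
    """Forward kinematics: cumulative joint rotations -> joint positions."""
    acc = torch.cumsum(angles, dim=0)
    steps = torch.stack([torch.cos(acc), torch.sin(acc)], -1) * lengths[:, None]
    return torch.cumsum(steps, dim=0)            # (2, 2): elbow, wrist

for _ in range(300):
    loss = ((fk(angles) - target) ** 2).sum()    # match predicted trajectory
    loss = loss + 0.01 * (angles ** 2).sum()     # soft "constraint" term
    opt.zero_grad(); loss.backward(); opt.step()

print(fk(angles).detach(), angles.detach())      # fitted pose and rotations
```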
MatAnyone 2 is out!
MatAnyone 2 is the most advanced human video matting framework; it preserves fine details by avoiding segmentation-like boundaries and shows enhanced robustness under challenging real-world conditions. Repo & dataset announced.
Review: https://t.ly/vxOBO
Paper: arxiv.org/pdf/2512.11782
Project: pq-yang.github.io/projects/MatAnyone2
Repo: github.com/pq-yang/MatAnyone2
SOTA Zero-Shot Stereo Matching
Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieves, for the first time, strong zero-shot generalization at real-time frame rates via divide-&-conquer acceleration. Code & data announced.
Review: https://t.ly/XD6pO
Paper: https://lnkd.in/d9_YKW2A
Project: https://lnkd.in/dKDxm7EX
Repo: https://lnkd.in/dR4-PdsW
DriverGaze360: Driver-Attention SOTA
DriverGaze360 is a large-scale 360° field-of-view driver attention dataset containing ~1M gaze-labeled frames. Code & dataset announced.
Review: https://t.ly/ZcoUw
Paper: arxiv.org/pdf/2512.14266
Project: av.dfki.de/drivergaze360/
Repo: github.com/dfki-av/drivergaze360
Data: av.dfki.de/drivergaze360/dataset