AI with Papers - Artificial Intelligence & Deep Learning
15.6K subscribers
All the AI, with papers. Fresh daily updates on #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🔥 SAM 3/3D are OUT!! 🔥

👉#META released SAM 3, a unified model for detection, segmentation, and tracking of objects in images & video using text, exemplar & visual prompts. Repo & models released under a proprietary license💙

👉Review https://t.ly/lnRZN
👉Paper https://t.ly/5tq9N
👉Project https://ai.meta.com/sam3/
👉Demo https://segment-anything.com
👉Repo https://github.com/facebookresearch/sam3
Unwrapping of 3D Meshes

👉PartUV is a novel part-based UV unwrapping method for 3D meshes; it combines learned part priors with geometric cues to generate a compact set of part-aligned charts. Repo released💙

👉Review https://t.ly/8dNIY
👉Paper arxiv.org/pdf/2511.16659
👉Project www.zhaoningwang.com/PartUV/
👉Repo github.com/EricWang12/PartUV
Upsample Anything

👉Upsample Anything is a novel, universal, training-free upsampler based on lightweight test-time optimization. No code yet, but it's a relevant paper💙

👉Review https://t.ly/7LE6G
👉Paper https://lnkd.in/dsUfdtih
🦞Single Synthetic Image per Class🦞

👉MIT unveils Linear Gradient Matching (H/T Torralba), a novel distillation method that uses a single synthetic image per class to train linear classifiers (and more). Repo available💙

👉Review https://t.ly/dD3un
👉Paper arxiv.org/pdf/2511.16674
👉Project linear-gradient-matching.github.io/
👉Repo github.com/GeorgeCazenavette/linear-gradient-matching
🧪 EfficientSAM3 is out 🧪

👉Bristol announces EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation, which transfers capability from SAM3 to lightweight students. Code coming (in sync with the SAM3 release)💙

👉Review https://t.ly/bfXP2
👉Paper arxiv.org/pdf/2511.15833
👉Project simonzeng7108.github.io/efficientsam3/
👉Repo github.com/SimonZeng7108/efficientsam3
🌩️ Cloud4D in time 🌩️

👉Cloud4D reconstructs physically realistic 3D cloud fields from ground-based cameras at 25 m spatial resolution and 5 s temporal resolution. Repo coming, data released💙

👉Review https://t.ly/w7Zly
👉Paper arxiv.org/pdf/2511.19431
👉Project cloud4d.jacob-lin.com/
👉Data https://drive.google.com/drive/folders/1QU_0kIUXIVt8h3uqygBeaF3Gvr_L5SdX?usp=drive_link
👉Repo TBA
MotionV2V: Editing Motion in Video

👉Google unveils motion edits, a new approach to editing videos by controlling the change in motion from the original to the edited video using diffusion models. Impressive results. Repo to be released soon💙

👉Review https://t.ly/s0sIT
👉Paper https://arxiv.org/pdf/2511.20640
👉Project https://ryanndagreat.github.io/MotionV2V/
👉Repo https://github.com/RyannDaGreat/MotionV2V
🔥 Smell Like Vision Spirit 🔥

👉New York Smells is a novel large-scale dataset of paired vision and olfaction captured in-the-wild, enabling the new task of cross-modal learning between smell and sight. With the lights out, it's less dangerous. Dataset available💙

👉Review https://t.ly/Ycn_B
👉Paper arxiv.org/pdf/2511.20544
👉Project smell.cs.columbia.edu/
🕶️ Seeing without Pixels 🕶️

👉Is it possible to perceive a video's content without seeing its pixels, just from the camera trajectory? DeepMind (+ UTexas) is the first to systematically investigate this seemingly implausible question💙

👉Review https://t.ly/Ymd1c
👉Paper arxiv.org/pdf/2511.21681
👉Project sites.google.com/view/seeing-without-pixels
🌵Instance-Level Video Generation🌵

👉InstanceV is the first video generation framework designed specifically for instance-level control at the architectural level. Code & data announced💙

👉Review https://t.ly/y_TBT
👉Paper arxiv.org/pdf/2511.23146
👉Project aliothchen.github.io/projects/InstanceV/
👉Repo TBA
🥭3D Point Motion Editing🥭

👉Edit-by-Track enables precise video motion editing via 3D point tracks. By specifying desired 3D trajectories, users can seamlessly control joint camera and object motion, remove objects, and transfer motion between videos. No code announced, but relevant💙

👉Review https://t.ly/GJHJ5
👉Paper arxiv.org/pdf/2512.02015
👉Project edit-by-track.github.io/
🦄 Native Unified Multimodal 🦄

👉META unveils a novel unified multimodal model (UMM) that builds a unified continuous visual representation by cascading a VAE encoder with a representation encoder. This unified representation space allows SOTA end-to-end processing of images and videos for both understanding and generation. Code under legal review💙

👉Review https://t.ly/7wmKP
👉Paper https://lnkd.in/djT4WGEU
👉Project https://tuna-ai.org/
👉Repo github.com/wren93/tuna
✌️SOTA Generative SLP✌️

👉Stable Signer is a new generative model for sign language production (SLP). It redefines SLP as a hierarchical, end-to-end generation task that includes only text understanding (Prompt2Gloss, Text2Gloss) and Pose2Vid. Repo with data💙

👉Review https://t.ly/yKZhn
👉Paper arxiv.org/pdf/2512.04048
👉Project stablesigner.github.io/
👉Data github.com/SignLLM/Prompt2Sign/tree/main/tools-new-2025
TTSC for 3D Generation

👉SpaceControl is the new SOTA training-free, test-time method for explicit spatial control of 3D generation. Repo announced💙

👉Review https://t.ly/1zrah
👉Paper https://lnkd.in/dEWh3vep
👉Project https://lnkd.in/dScftUmm
👉Repo TBA
🎷Layered PSD Diffusion🎷

👉OmniPSD produces layered PSD files with transparent alpha channels, separating text, foreground elements, and background into clean RGBA layers that can be edited directly in image-editing tools. Online demo💙

👉Review https://t.ly/YNRAC
👉Paper arxiv.org/pdf/2512.09247
👉Project showlab.github.io/OmniPSD/
👉Demo https://www.lovart.ai/it
🧱Pixel Art Volumetric Rendering🧱

👉Voxify3D is a novel differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Repo announced💙

👉Review https://t.ly/qPyNl
👉Paper https://lnkd.in/du5ikJGN
👉Project https://lnkd.in/dpiAjj5m
👉Repo TBA
🫎 MoCapAnything is out 🫎

👉MoCapAnything is a novel reference-guided, factorized framework that first predicts 3D joint trajectories and then recovers asset-specific rotations via constraint-aware IK fitting. No code announced 🥲

👉Review https://t.ly/_Tw6t
👉Paper arxiv.org/pdf/2512.10881
👉Project animotionlab.github.io/MoCapAnything
💚 MatAnyone 2 is out! 💚

👉MatAnyone 2 is the most advanced human video matting framework: it preserves fine details by avoiding segmentation-like boundaries, while also showing enhanced robustness under challenging real-world conditions. Repo & dataset announced💙

👉Review https://t.ly/vxOBO
👉Paper arxiv.org/pdf/2512.11782
👉Project pq-yang.github.io/projects/MatAnyone2
👉Repo github.com/pq-yang/MatAnyone2
💷 SOTA Zero-Shot Stereo Matching 💷

👉Fast-FoundationStereo by #Nvidia is a novel family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rates via divide-&-conquer acceleration. Code & data announced💙

👉Review https://t.ly/XD6pO
👉Paper https://lnkd.in/d9_YKW2A
👉Project https://lnkd.in/dKDxm7EX
👉Repo https://lnkd.in/dR4-PdsW
👀DriverGaze360: Driver SOTA👀

👉DriverGaze360 is a large-scale 360° field-of-view driver attention dataset, containing ~1M gaze-labeled frames. Code & dataset announced💙

👉Review https://t.ly/ZcoUw
👉Paper arxiv.org/pdf/2512.14266
👉Project av.dfki.de/drivergaze360/
👉Repo github.com/dfki-av/drivergaze360
👉Data av.dfki.de/drivergaze360/dataset