AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates about #DeepLearning, #MachineLearning, LLMs and #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#artificialintelligence #machinelearning #ml #AI
🌳MVSA Zero-Shot Multi-View🌳

πŸ‘‰Niantic unveils MVSA, a novel multi-view stereo architecture built to work anywhere, generalizing across diverse domains & depth ranges for highly accurate & 3D-consistent depths. Code & models announcedπŸ’™ A generic cost-volume sketch follows the links below.

πŸ‘‰Review https://t.ly/LvuTh
πŸ‘‰Paper https://arxiv.org/pdf/2503.22430
πŸ‘‰Project https://nianticlabs.github.io/mvsanywhere/
πŸ‘‰Repo https://lnkd.in/ddQz9eps
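πŸ‘‰For intuition only, a minimal generic plane-sweep cost-volume sketch — the classic multi-view stereo building block. Names and shapes are illustrative assumptions, not MVSA's actual code:

```python
# Generic plane-sweep cost volume: warp source-view features onto a set of
# fronto-parallel depth hypotheses and correlate them with the reference view.
# Purely illustrative; not Niantic's MVSA implementation.
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(ref_feat, src_feat, K, src_T_ref, depths):
    """ref_feat, src_feat: (1, C, H, W); K: (3, 3) intrinsics;
    src_T_ref: (4, 4) relative pose; depths: iterable of depth hypotheses."""
    _, C, H, W = ref_feat.shape
    device = ref_feat.device
    ys, xs = torch.meshgrid(torch.arange(H, device=device),
                            torch.arange(W, device=device), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().view(3, -1)
    rays = K.inverse() @ pix                              # per-pixel camera rays
    volume = []
    for d in depths:
        pts = rays * d                                    # back-project at depth d
        pts_h = torch.cat([pts, torch.ones(1, pts.shape[1], device=device)], 0)
        pts_src = (src_T_ref @ pts_h)[:3]                 # move into the source camera
        uv = K @ pts_src
        uv = uv[:2] / uv[2:].clamp(min=1e-6)              # perspective divide
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,      # normalize for grid_sample
                            uv[1] / (H - 1) * 2 - 1], -1).view(1, H, W, 2)
        warped = F.grid_sample(src_feat, grid, align_corners=True)
        volume.append((ref_feat * warped).mean(1))        # correlation score per pixel
    return torch.stack(volume, dim=1)                     # (1, D, H, W)
```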
🐟Segment Any Motion in Video🐟

πŸ‘‰From CVPR 2025, a novel approach for moving object segmentation that combines DINO-based semantic features with SAM2. Code under MIT licenseπŸ’™ A feature-extraction sketch follows the links below.

πŸ‘‰Review https://t.ly/4aYjJ
πŸ‘‰Paper arxiv.org/pdf/2503.22268
πŸ‘‰Project motion-seg.github.io/
πŸ‘‰Repo github.com/nnanhuang/SegAnyMo
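πŸ‘‰A hedged sketch of the DINO-feature side of such a pipeline — the model name and hub call follow the public DINOv2 repo, and this is not the authors' code:

```python
# Extract DINOv2 patch features for one frame; these semantic features are the
# kind of cue the paper pairs with SAM2 masks. Not the authors' pipeline.
import torch
import torch.nn.functional as F

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

@torch.no_grad()
def patch_features(frame_bchw):
    """frame_bchw: (1, 3, H, W), ImageNet-normalized, H and W multiples of 14."""
    tokens = model.get_intermediate_layers(frame_bchw, n=1)[0]  # (1, N_patches, C)
    h, w = frame_bchw.shape[-2] // 14, frame_bchw.shape[-1] // 14
    feats = tokens.permute(0, 2, 1).reshape(1, -1, h, w)        # (1, C, h, w)
    return F.normalize(feats, dim=1)                            # unit-norm per location

print(patch_features(torch.randn(1, 3, 224, 224)).shape)        # torch.Size([1, 384, 16, 16])
```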
πŸ’ƒ Video Motion Graphs πŸ’ƒ

πŸ‘‰#Adobe unveils a novel system designed to generate realistic human motion videos. Using a reference video and conditional signals such as music or motion tags, the system synthesizes amazing new videos. Code & Models to be releasedπŸ’™

πŸ‘‰Review https://t.ly/r4EGF
πŸ‘‰Paper https://lnkd.in/dK_tHyzh
πŸ‘‰Project https://lnkd.in/dE6c_KYZ
πŸ‘‰Repo TBA
🌳 Compose Anything is out 🌳

πŸ‘‰Skywork AI unveils SkyReels-A2, a controllable video generation framework capable of assembling arbitrary visual elements (e.g., characters, objects, backgrounds) into synthesized videos based on textual prompts. Code, models, & evaluation benchmark releasedπŸ’™

πŸ‘‰Review https://t.ly/MEjzL
πŸ‘‰Paper https://arxiv.org/pdf/2504.02436
πŸ‘‰Project skyworkai.github.io/skyreels-a2.github.io/
πŸ‘‰Repo github.com/SkyworkAI/SkyReels-A2
πŸ€—Models https://huggingface.co/Skywork/SkyReels-A2
β›½ VoRA: Vision as LoRA β›½

πŸ‘‰#ByteDance unveils Vision as LoRA (VoRA), a novel paradigm that converts LLMs into Multimodal Large Language Models (MLLMs) by integrating vision-specific LoRA layers. All training data, code, and model weights availableπŸ’™ A minimal LoRA-layer sketch follows the links below.

πŸ‘‰Review https://t.ly/guNVN
πŸ‘‰Paper arxiv.org/pdf/2503.20680
πŸ‘‰Repo github.com/Hon-Wong/VoRA
πŸ‘‰Project georgeluimmortal.github.io/vora-homepage.github.io/
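πŸ‘‰The core idea in a few lines, heavily hedged: a frozen LLM linear layer plus a trainable, vision-specific low-rank branch. Names and sizes are invented for illustration; see the repo for the real VoRA:

```python
# A frozen LLM projection plus a trainable vision-specific LoRA branch.
import torch
import torch.nn as nn

class VisionLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # keep the LLM weights frozen
            p.requires_grad = False
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)              # start as a no-op on top of the base layer
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# Wrap the attention/MLP projections of the decoder, feed patchified image
# embeddings as extra tokens, and only the LoRA branches learn the vision side.
layer = VisionLoRALinear(nn.Linear(4096, 4096))
tokens = torch.randn(1, 64, 4096)                   # pretend text + vision tokens
print(layer(tokens).shape)                          # torch.Size([1, 64, 4096])
```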
πŸ‘15❀7🀯4πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🐈 TTT Long Video Generation🐈

πŸ‘‰A novel architecture for video generation that adapts the CogVideoX 5B model by incorporating Test-Time Training (TTT) layers: adding TTT layers to a pre-trained Transformer yields one-minute clips from text storyboards. Videos, code & annotations releasedπŸ’™ A toy TTT-layer sketch follows the links below.

πŸ‘‰Review https://t.ly/mhlTN
πŸ‘‰Paper arxiv.org/pdf/2504.05298
πŸ‘‰Project test-time-training.github.io/video-dit/
πŸ‘‰Repo github.com/test-time-training/ttt-video-dit
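πŸ‘‰A toy, heavily hedged illustration of a test-time-training layer — the layer's hidden state is a tiny model that takes gradient steps on a self-supervised loss as the sequence streams through it. Spirit only, not the paper's CogVideoX variant:

```python
# Toy TTT layer: the "hidden state" is a tiny linear model that takes one
# gradient step on a self-supervised reconstruction loss per token.
import torch

def ttt_layer(x, lr=0.1):
    """x: (T, D) token sequence. Returns (T, D) outputs."""
    T, D = x.shape
    W = torch.zeros(D, D, requires_grad=True)       # inner-loop "fast weights"
    outs = []
    for t in range(T):
        xt = x[t]
        loss = ((W @ xt - xt) ** 2).mean()          # self-supervised inner loss
        (grad,) = torch.autograd.grad(loss, W)
        with torch.no_grad():
            W -= lr * grad                          # update the fast weights at test time
        outs.append((W @ xt).detach())              # emit the token through the updated state
    return torch.stack(outs)

print(ttt_layer(torch.randn(8, 16)).shape)          # torch.Size([8, 16])
```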
πŸ’› Unified Scalable SVG Generator πŸ’›

πŸ‘‰OmniSVG is the first family of end-to-end multimodal generators that leverage pre-trained VLMs to create detailed SVGs. Code, models & dataset to be released under MITπŸ’™

πŸ‘‰Review https://t.ly/JcR3I
πŸ‘‰Paper https://arxiv.org/pdf/2504.06263
πŸ‘‰Project https://omnisvg.github.io/
πŸ‘‰Repo github.com/OmniSVG/OmniSVG
πŸ‘‰Dataset https://huggingface.co/OmniSVG
🧊BoxDreamer Object Pose🧊

πŸ‘‰BoxDreamer is a generalizable RGB-based approach for #3D object pose estimation in the wild, specifically designed to address challenges in sparse-view settings. Code coming, demo releasedπŸ’™

πŸ‘‰Review https://t.ly/e-vX9
πŸ‘‰Paper arxiv.org/pdf/2504.07955
πŸ‘‰Project https://lnkd.in/djz8jqn9
πŸ‘‰Repo https://lnkd.in/dfuEawSA
πŸ€—Demo https://lnkd.in/dVYaWGcS
πŸ₯Š Pose in Combat Sports πŸ₯Š

πŸ‘‰A novel SOTA framework for accurate physics-based #3D human pose estimation in combat sports with a sparse multi-camera setup. Dataset to be released soonπŸ’™

πŸ‘‰Review https://t.ly/EfcGL
πŸ‘‰Paper https://lnkd.in/deMMrKcA
πŸ‘‰Project https://lnkd.in/dkMS_UrH
πŸ‘13πŸ”₯4❀3🀯2
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ’₯Geo4D: VideoGen 4D SceneπŸ’₯

πŸ‘‰The Oxford VGG unveils Geo4D: video diffusion for monocular 4D reconstruction. Trained only on synthetic data, yet with strong generalization to the real world: point maps, depth & ray maps for the new SOTA in dynamic reconstruction. Code releasedπŸ’™ A point-map geometry sketch follows the links below.

πŸ‘‰Review https://t.ly/X55Uj
πŸ‘‰Paper arxiv.org/pdf/2504.07961
πŸ‘‰Project geo4d.github.io/
πŸ‘‰Code github.com/jzr99/Geo4D
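πŸ‘‰A hedged geometry sketch of how a per-pixel point map (the kind of representation predicted here) relates to a depth map and a ray map. Shapes and names are illustrative, not the paper's code:

```python
# Convert a camera-frame point map into a depth map and a unit-length ray map.
import numpy as np

def pointmap_to_depth_and_rays(points_cam, K):
    """points_cam: (H, W, 3) 3D points in camera coords; K: (3, 3) intrinsics."""
    depth = points_cam[..., 2]                            # z in the camera frame
    H, W, _ = points_cam.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                       # unprojected directions
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)  # unit-length ray map
    return depth, rays

depth, rays = pointmap_to_depth_and_rays(np.random.rand(48, 64, 3),
                                         np.diag([60.0, 60.0, 1.0]))
print(depth.shape, rays.shape)                            # (48, 64) (48, 64, 3)
```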
πŸ„ 4D Mocap Human-Object πŸ„

πŸ‘‰#Adobe unveils HUMOTO, HQ dataset of human-object interactions for motion generation, computer vision, and robotics: 700+ sequences (7,875 seconds @ 30FPS), interactions with 63 precisely modeled objects and 72 articulated parts

πŸ‘‰Review https://t.ly/lCof3
πŸ‘‰Paper https://lnkd.in/dVVBDd_c
πŸ‘‰Project https://lnkd.in/dwBcseDf
🍏PartField #3D Part Segmentation🍏

πŸ‘‰#Nvidia unveils PartField, a feed-forward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/fGb2O
πŸ‘‰Paper https://lnkd.in/dGeyKSzG
πŸ‘‰Code https://lnkd.in/dbe57XGH
πŸ‘‰Project https://lnkd.in/dhEgf7X2
🐯UniAnimate-DiT: Human Animation🐯

πŸ‘‰UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. LoRAs finetune the model parameters, reducing memory while maintaining the original model's generative skills. Training and inference code releasedπŸ’™ A LoRA-config sketch follows the links below.

πŸ‘‰Review https://t.ly/1I50N
πŸ‘‰Paper https://arxiv.org/pdf/2504.11289
πŸ‘‰Repo https://github.com/ali-vilab/UniAnimate-DiT
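πŸ‘‰A hedged sketch of the LoRA-finetuning pattern via the PEFT library on a toy stand-in module. The module and target names are placeholders, not Wan2.1 or the authors' training code, and it assumes PEFT's custom-model support:

```python
# LoRA finetuning pattern: freeze the big backbone, train only low-rank adapters.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

backbone = nn.Sequential()                           # stand-in for a large DiT
backbone.add_module("to_q", nn.Linear(512, 512))
backbone.add_module("to_k", nn.Linear(512, 512))
backbone.add_module("to_v", nn.Linear(512, 512))

config = LoraConfig(r=16, lora_alpha=32, target_modules=["to_q", "to_k", "to_v"])
model = get_peft_model(backbone, config)
model.print_trainable_parameters()                   # only the low-rank adapters train
```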
πŸ”₯General attention-based 3D object detectionπŸ”₯

πŸ‘‰GATE3D is a novel framework designed for generalized monocular 3D object detection via weak supervision. GATE3D bridges domain gaps by employing consistency losses between 2D and 3D predictions. A generic consistency-loss sketch follows the links below.

πŸ‘‰Review https://t.ly/O7wqH
πŸ‘‰Paper https://lnkd.in/dc5VTUj9
πŸ‘‰Project https://lnkd.in/dzrt-qQV
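πŸ‘‰A hedged sketch of a 2D/3D consistency term in the spirit of the post: project the predicted 3D box center into the image and pull it toward the predicted 2D box center. A generic weak-supervision loss, not GATE3D's exact formulation:

```python
# Generic 2D/3D center-consistency loss.
import torch
import torch.nn.functional as F

def center_consistency_loss(center_3d, box_2d, K):
    """center_3d: (N, 3) camera-frame centers; box_2d: (N, 4) as (x1, y1, x2, y2); K: (3, 3)."""
    proj = (K @ center_3d.T).T                        # (N, 3) projected homogeneous points
    proj = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)  # pixel coordinates
    target = 0.5 * (box_2d[:, :2] + box_2d[:, 2:])    # 2D box centers
    return F.smooth_l1_loss(proj, target)

K = torch.tensor([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
loss = center_consistency_loss(torch.tensor([[1.0, 0.2, 10.0]]),
                               torch.tensor([[350.0, 230.0, 400.0, 280.0]]), K)
print(loss)
```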
πŸ”Event Blurry Super-ResolutionπŸ”

πŸ‘‰USTC unveils Ev-DeblurVSR, a novel event-enhanced network that feeds event signals into Blurry Video Super-Resolution (BVSR), the task of generating HR videos from low-resolution, blurry inputs. Pretrained models and test code released under ApacheπŸ’™ A generic event-voxelization sketch follows the links below.

πŸ‘‰Review https://t.ly/x6hRs
πŸ‘‰Paper https://lnkd.in/dzbkCJMh
πŸ‘‰Repo https://lnkd.in/dmvsc-yS
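πŸ‘‰A hedged sketch of packing an event stream (x, y, t, polarity) into a voxel grid, a common way to feed event signals to a CNN. Generic representation code, not necessarily the exact encoding Ev-DeblurVSR uses:

```python
# Accumulate signed events into a (num_bins, H, W) voxel grid.
import numpy as np

def events_to_voxel_grid(events, num_bins, H, W):
    """events: (N, 4) array of (x, y, t, p) with p in {-1, +1}."""
    grid = np.zeros((num_bins, H, W), dtype=np.float32)
    t = events[:, 2]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)      # rescale time to [0, 1]
    bins = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    np.add.at(grid, (bins, y, x), events[:, 3])                # signed accumulation
    return grid

ev = np.array([[10, 5, 0.00, +1], [10, 5, 0.01, -1], [3, 2, 0.02, +1]])
print(events_to_voxel_grid(ev, num_bins=5, H=8, W=16).shape)   # (5, 8, 16)
```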
πŸ”₯ #Apple Co-Motion is out! πŸ”₯

πŸ‘‰Apple unveils a novel approach for detecting & tracking detailed 3D poses of multiple people from a single monocular stream. Temporally coherent predictions in crowded scenes with hard poses & occlusions. New SOTA, 10x faster! Code & models released for research use onlyπŸ’™

πŸ‘‰Review https://t.ly/-86CO
πŸ‘‰Paper https://lnkd.in/dQsVGY7q
πŸ‘‰Repo https://lnkd.in/dh7j7N89
πŸ‘7🀣6❀5πŸ”₯2😍1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊TAP in Persistent 3D Geometry🧊

πŸ‘‰TAPIP3D is the novel SOTA for long-term 3D point tracking in mono-RGB/RGB-D. It represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth & motion to lift 2D video features into a 3D world space where camera motion is effectively canceled. Code under ApacheπŸ’™ An unprojection sketch follows the links below.

πŸ‘‰Review https://t.ly/oooMy
πŸ‘‰Paper https://lnkd.in/d8uqjdE4
πŸ‘‰Project https://tapip3d.github.io/
πŸ‘‰Repo https://lnkd.in/dsvHP_8u
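πŸ‘‰A hedged sketch of the "lift to a stabilized 3D space" idea: unproject a tracked 2D point with its depth, then move it into world coordinates with the camera pose so camera motion drops out. Plain geometry, not TAPIP3D's actual code:

```python
# Back-project a pixel with depth into the camera frame, then into world coords.
import numpy as np

def lift_track_point(uv, depth, K, cam_to_world):
    """uv: (2,) pixel; depth: scalar; K: (3, 3); cam_to_world: (4, 4)."""
    pix = np.array([uv[0], uv[1], 1.0])
    p_cam = depth * (np.linalg.inv(K) @ pix)          # back-project into the camera frame
    p_world = cam_to_world @ np.append(p_cam, 1.0)    # cancel camera motion
    return p_world[:3]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
print(lift_track_point((330.0, 250.0), 2.0, K, np.eye(4)))
```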
🦧 #Nvidia Describe Anything 🦧

πŸ‘‰Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available and live demo on πŸ€— A region-prompt sketch follows the links below.

πŸ‘‰Review https://t.ly/la4JD
πŸ‘‰Paper https://lnkd.in/dZh82xtV
πŸ‘‰Project https://lnkd.in/dcv9V2ZF
πŸ‘‰Repo https://lnkd.in/dJB9Ehtb
πŸ€—Demo https://lnkd.in/dXDb2MWU
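πŸ‘‰A hedged sketch of turning the different region prompts mentioned above (points, boxes, scribbles, masks) into one binary-mask format a model could consume. Illustrative pre-processing only, not DAM's actual interface:

```python
# Normalize point / box / scribble / mask prompts into a single binary mask.
import numpy as np

def prompt_to_mask(prompt, H, W, kind):
    mask = np.zeros((H, W), dtype=bool)
    if kind == "box":                         # (x1, y1, x2, y2)
        x1, y1, x2, y2 = map(int, prompt)
        mask[y1:y2, x1:x2] = True
    elif kind in ("point", "scribble"):       # iterable of (x, y) pixels
        for x, y in prompt:
            mask[int(y), int(x)] = True
    elif kind == "mask":                      # already a (H, W) array
        mask = np.asarray(prompt, dtype=bool)
    return mask

print(prompt_to_mask((4, 2, 10, 8), 16, 16, "box").sum())   # 36 pixels inside the box
```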
πŸ“Moving Points -> DepthπŸ“

πŸ‘‰KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of 2D trajectories tracked with off-the-shelf point-tracking models. Repo & demo to be releasedπŸ’™ A toy motion-parallax sketch follows the links below.

πŸ‘‰Review https://t.ly/qA2P5
πŸ‘‰Paper https://lnkd.in/dpXDaQtM
πŸ‘‰Project https://lnkd.in/d9qWYsjP
πŸ‘‰Repo https://lnkd.in/dZEMDiJh
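πŸ‘‰A toy, heavily hedged illustration (NOT the paper's method) of why 2D track trajectories carry depth: under a sideways-translating camera, nearer points sweep across the image faster, so displacement orders points by depth up to scale:

```python
# Motion-parallax intuition: image displacement is inversely proportional to depth.
import numpy as np

fx, tx = 500.0, 0.05                                   # focal length, camera shift per frame
depths = np.array([1.0, 2.0, 5.0, 10.0])               # ground-truth point depths
disp = fx * tx / depths                                # per-frame image displacement
relative_depth = 1.0 / disp                            # recoverable only up to scale
print(np.argsort(relative_depth))                      # [0 1 2 3] -> ordering matches true depth
```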
🌼SOTA Textured 3D-Guided VTON🌼

πŸ‘‰#ALIBABA unveils 3DV-TON, a novel diffusion model for HQ and temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be releasedπŸ’™

πŸ‘‰Review https://t.ly/0tjdC
πŸ‘‰Paper https://lnkd.in/dFseYSXz
πŸ‘‰Project https://lnkd.in/djtqzrzs
πŸ‘‰Repo TBA
🍏#Nvidia Dynamic Pose 🍏

πŸ‘‰Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia licenseπŸ’™

πŸ‘‰Review https://t.ly/wrcb0
πŸ‘‰Paper https://lnkd.in/dycGjAyy
πŸ‘‰Project https://lnkd.in/dDZ2Ej_Q
πŸ€—Data https://lnkd.in/d8yUSB7m