ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

📝 Summary:
VidEoMT is a video segmentation model that eliminates complex tracking modules by using a Vision Transformer encoder with query propagation and fusion. This enables efficient temporal modeling, achieving competitive accuracy and 5-10x faster processing speeds.

🔹 Publication Date: Published on Feb 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.17807
• PDF: https://arxiv.org/pdf/2602.17807
• Project Page: https://www.tue-mps.org/videomt/
• Github: https://github.com/tue-mps/videomt
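A minimal sketch of the query propagation and fusion idea described above: queries from the previous frame are blended with a learned per-frame query set so the encoder can track instances over time. The function names and the simple linear fusion rule are illustrative assumptions, not the paper's exact design.

```python
# Toy query propagation with fusion across video frames (illustrative only).

def fuse_queries(propagated, learned, alpha=0.7):
    """Blend queries propagated from the previous frame with learned ones."""
    return [alpha * p + (1 - alpha) * l for p, l in zip(propagated, learned)]

def run_video(frames, learned_queries, segment_frame):
    """Process frames sequentially, carrying queries across time."""
    queries = list(learned_queries)  # first frame starts from learned queries
    masks = []
    for frame in frames:
        masks.append(segment_frame(frame, queries))
        # propagate the current queries forward and fuse with the learned set
        queries = fuse_queries(queries, learned_queries)
    return masks
```

Because the temporal state is just the query list, no separate tracking module is needed, which is where the claimed speedup comes from.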

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoSegmentation #VisionTransformers #ComputerVision #DeepLearning #AIResearch
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive enables real-time, expressive portrait animation for live streaming. It uses hybrid implicit signals, appearance distillation, and autoregressive streaming generation to achieve low-latency, stable results with up to 22x speedup.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

🔹 Models citing this paper:
https://huggingface.co/huaichang/PersonaLive

🔹 Spaces citing this paper:
https://huggingface.co/spaces/seawolf2357/personalive
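The autoregressive streaming generation mentioned in the summary can be sketched as a chunk-by-chunk loop: each short chunk is generated conditioned on the tail of the previous one, so latency stays low while the stream stays coherent. The function names and tail length here are hypothetical, not PersonaLive's actual API.

```python
# Toy autoregressive streaming loop: emit short chunks, each conditioned on
# the last few frames of the previous chunk (illustrative sketch).

def stream_generate(generate_chunk, num_chunks, context=None, tail=2):
    frames = []
    for _ in range(num_chunks):
        chunk = generate_chunk(context)   # produce the next short chunk
        frames.extend(chunk)
        context = chunk[-tail:]           # condition the next chunk on recent frames
    return frames
```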

==================================


#PortraitAnimation #LiveStreaming #RealtimeAI #ComputerVision #GenerativeAI
4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

📝 Summary:
4RC introduces a unified feed-forward framework for 4D reconstruction from monocular video. It learns holistic scene geometry and motion dynamics using a novel transformer-based 'encode-once, query-anywhere and anytime' approach. This method significantly outperforms prior 4D reconstruction techniques.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10094
• PDF: https://arxiv.org/pdf/2602.10094
• Project Page: https://yihangluo.com/projects/4RC/
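The 'encode-once, query-anywhere and anytime' pattern can be sketched as follows: the heavy encoder runs once over the video, and arbitrary space-time queries are then decoded cheaply against the cached latent. Class and method names are assumptions for illustration; the real model decodes queries with transformer attention.

```python
# Sketch of the encode-once, query-anywhere/anytime pattern.

class EncodeOnceQueryAnywhere:
    def __init__(self, encoder, decoder):
        self.encoder, self.decoder = encoder, decoder
        self._latent = None

    def encode(self, video):
        # run the expensive encoder a single time and cache the result
        self._latent = self.encoder(video)

    def query(self, point, time):
        # each query is a cheap decode against the shared latent
        return self.decoder(self._latent, point, time)
```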

==================================


#4DReconstruction #ComputerVision #DeepLearning #NeuralNetworks #MonocularVideo
Spanning the Visual Analogy Space with a Weight Basis of LoRAs

📝 Summary:
LoRWeB improves visual analogy learning by dynamically composing a basis of LoRA modules. It uses an encoder to select and weigh multiple LoRAs at inference time, rather than a single fixed module. This achieves state-of-the-art performance and significantly better generalization for image manipulation.

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15727
• PDF: https://arxiv.org/pdf/2602.15727
• Project Page: https://research.nvidia.com/labs/par/lorweb/
• Github: https://github.com/NVlabs/LoRWeB

🔹 Datasets citing this paper:
https://huggingface.co/datasets/hilamanor/LoRWeB_evalset
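The weighted LoRA-basis composition reduces to adding a mixture of low-rank updates to the base weight: W_eff = W0 + Σ_k w_k · (B_k A_k), where the mixing weights come from the encoder at inference time. A minimal sketch, with shapes and the mixing rule as illustrative assumptions:

```python
import numpy as np

# Compose a weight basis of LoRAs: base weight plus a weighted sum of
# low-rank updates, W_eff = W0 + sum_k mix[k] * (B_k @ A_k).

def compose_lora_basis(W0, As, Bs, mix):
    """W0: (d, d); A_k: (r, d); B_k: (d, r); mix: per-module weights."""
    delta = sum(w * (B @ A) for w, A, B in zip(mix, As, Bs))
    return W0 + delta
```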

==================================


#LoRA #VisualAnalogies #DeepLearning #AI #ComputerVision
Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

📝 Summary:
This paper presents a conditional binary segmentation framework for robust cross-view object correspondence. It uses cycle-consistency training to create view-invariant representations without ground-truth annotations. This approach achieves state-of-the-art performance on relevant benchmarks.

🔹 Publication Date: Published on Feb 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18996
• PDF: https://arxiv.org/pdf/2602.18996
• Github: https://github.com/shannany0606/CCMP
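The cycle-consistency training signal can be sketched directly: predict the object's mask in view B from view A, predict back into view A, and penalize disagreement with the starting mask, so no cross-view ground truth is needed. `predict_mask` below is a hypothetical stand-in for the segmentation model.

```python
# Cycle-consistency objective sketch for cross-view object correspondence.

def cycle_consistency_loss(mask_a, view_a, view_b, predict_mask):
    mask_b = predict_mask(view_a, mask_a, view_b)          # view A -> view B
    mask_a_cycled = predict_mask(view_b, mask_b, view_a)   # view B -> view A
    # mean absolute disagreement between the start and cycled masks
    return sum(abs(x - y) for x, y in zip(mask_a, mask_a_cycled)) / len(mask_a)
```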

==================================


#ComputerVision #MachineLearning #ObjectCorrespondence #ImageSegmentation #SelfSupervisedLearning
VLANeXt: Recipes for Building Strong VLA Models

📝 Summary:
This paper systematically analyzes Vision-Language-Action (VLA) models through a unified framework, distilling 12 key design principles. The resulting VLANeXt model achieves superior performance on benchmarks and strong real-world generalization.

🔹 Publication Date: Published on Feb 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18532
• PDF: https://arxiv.org/pdf/2602.18532
• Project Page: https://dravenalg.github.io/VLANeXt/
• Github: https://github.com/DravenALG/awesome-vla

🔹 Models citing this paper:
https://huggingface.co/DravenALG/VLANeXt

==================================


#VLANeXt #VLAModels #ComputerVision #Robotics #AIResearch
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

📝 Summary:
ImplicitRDP is an end-to-end visual-force diffusion policy that integrates asynchronous vision and force sensing using structural slow-fast learning and virtual-target regularization. It improves reactivity and success in contact-rich manipulation tasks.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10946
• PDF: https://arxiv.org/pdf/2512.10946
• Project Page: https://implicit-rdp.github.io
• Github: https://github.com/Chen-Wendi/ImplicitRDP

🔹 Models citing this paper:
https://huggingface.co/WendiChen/ImplicitRDP_model

🔹 Datasets citing this paper:
https://huggingface.co/datasets/WendiChen/ImplicitRDP_dataset
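The structural slow-fast idea can be sketched as two update rates: vision features refresh at a low rate while force readings arrive every control step, and the policy always acts on the freshest of each. The update period and the fusion interface are assumptions for illustration.

```python
# Slow-fast rollout sketch: slow visual branch, fast force branch.

def slow_fast_rollout(vision_stream, force_stream, policy, vision_period=3):
    actions, vision_feat = [], None
    for step, force in enumerate(force_stream):
        if step % vision_period == 0:
            # slow branch: refresh the vision feature only periodically
            vision_feat = vision_stream[step // vision_period]
        # fast branch: act on every force reading
        actions.append(policy(vision_feat, force))
    return actions
```

Keeping the force branch reactive at every step is what improves responsiveness in contact-rich manipulation.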

==================================


#Robotics #DiffusionPolicy #MachineLearning #Manipulation #ComputerVision
LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency

📝 Summary:
LaS-Comp is a zero-shot 3D shape completion method that leverages 3D foundation models. It uses a two-stage approach for faithful reconstruction and seamless boundary refinement. This training-free framework outperforms prior state-of-the-art methods.

🔹 Publication Date: Published on Feb 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18735
• PDF: https://arxiv.org/pdf/2602.18735
• Github: https://github.com/DavidYan2001/LaS-Comp

==================================


#3DCompletion #ZeroShotLearning #FoundationModels #ComputerVision #AI
Communication-Inspired Tokenization for Structured Image Representations

📝 Summary:
COMiT introduces a framework for learning structured, object-centric visual tokens through iterative encoding and flow-matching decoding. This single-transformer approach improves compositional generalization and relational reasoning by creating interpretable token structures.

🔹 Publication Date: Published on Feb 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20731
• PDF: https://arxiv.org/pdf/2602.20731
• Project Page: https://araachie.github.io/comit/
• Github: https://github.com/araachie/comit

🔹 Models citing this paper:
https://huggingface.co/cvg-unibe/comit-xl
https://huggingface.co/cvg-unibe/comit-l
https://huggingface.co/cvg-unibe/comit-b

==================================


#ComputerVision #Transformers #ImageRecognition #RepresentationLearning #AIResearch
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

📝 Summary:
PhysicEdit addresses physically implausible image editing by modeling edits as predictive physical state transitions. It uses a dual-thinking diffusion framework guided by a vision-language model, greatly enhancing physical realism.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21778
• PDF: https://arxiv.org/pdf/2602.21778
• Project Page: https://liangbingzhao.github.io/statics2dynamics/
• Github: https://github.com/liangbingzhao/PhysicEdit

🔹 Datasets citing this paper:
https://huggingface.co/datasets/metazlb/PhysicTran38K

==================================


#ImageEditing #DiffusionModels #ComputerVision #PhysicsAI #AIResearch
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

📝 Summary:
EmbodMocap is a dual-iPhone system for in-the-wild 4D human-scene reconstruction. It unifies human and scene data in a metric world frame, improving accuracy. This supports embodied AI tasks like animation and robot control.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23205
• PDF: https://arxiv.org/pdf/2602.23205
• Project Page: https://wenjiawang0312.github.io/projects/embodmocap/
• Github: https://github.com/WenjiaWang0312/EmbodMocap

==================================


#EmbodiedAI #4DReconstruction #ComputerVision #Robotics #Animation
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

📝 Summary:
MIGM-Shortcut accelerates masked image generation by learning a lightweight model to predict feature evolution velocity from previous features and sampled tokens. This achieves over 4x speedup with maintained quality on state-of-the-art models.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23996
• PDF: https://arxiv.org/pdf/2602.23996
• Github: https://github.com/Kaiwen-Zhu/MIGM-Shortcut
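The shortcut idea, predicting the velocity of the feature trajectory and extrapolating several steps ahead instead of running the full model at every unmasking step, can be sketched as a simple loop. The linear extrapolation rule and the stride parameter are illustrative assumptions.

```python
# Shortcut rollout sketch: a lightweight head predicts feature velocity,
# and features jump several generation steps at a time.

def shortcut_rollout(f0, predict_velocity, num_steps, stride=4):
    """Advance features `num_steps` steps, `stride` steps per jump."""
    f = f0
    for _ in range(num_steps // stride):
        v = predict_velocity(f)   # cheap velocity estimate
        f = f + v * stride        # extrapolate `stride` steps along the trajectory
    return f
```

Replacing full forward passes with cheap velocity predictions at most steps is the source of the reported speedup.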

==================================


#ImageGeneration #DeepLearning #GenerativeAI #ComputerVision #AI
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

📝 Summary:
SenCache accelerates diffusion model inference by dynamically selecting cache timesteps based on model output sensitivity to input perturbations. This principled framework improves visual quality over existing heuristic methods within similar computational budgets.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24208
• PDF: https://arxiv.org/pdf/2602.24208
• Github: https://github.com/vita-epfl/SenCache
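The scheduling step behind sensitivity-aware caching can be sketched as: score each timestep by how much the output moves under input perturbation, recompute only at the most sensitive timesteps, and reuse cached features elsewhere. The scoring interface and budget logic here are assumptions.

```python
# Sensitivity-aware cache scheduling sketch: pick which timesteps to
# recompute given a per-step sensitivity score and a compute budget.

def select_recompute_steps(sensitivities, budget):
    """Return indices of the `budget` most sensitive timesteps, in order."""
    ranked = sorted(range(len(sensitivities)),
                    key=lambda i: sensitivities[i], reverse=True)
    return sorted(ranked[:budget])
```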

==================================


#DiffusionModels #AI #MachineLearning #InferenceAcceleration #ComputerVision
Mode Seeking meets Mean Seeking for Fast Long Video Generation

📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fidelity gap.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/

==================================


#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision
How to Take a Memorable Picture? Empowering Users with Actionable Feedback

📝 Summary:
This paper introduces Memorability Feedback (MemFeed), a new task providing actionable natural-language guidance to improve photo memorability. Their method, MemCoach, uses MLLMs and a teacher-student strategy, demonstrating that memorability can be taught and instructed.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21877
• PDF: https://arxiv.org/pdf/2602.21877
• Project Page: https://laitifranz.github.io/MemCoach/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/laitifranz/MemBench-InternVL3.5-Eval
https://huggingface.co/datasets/laitifranz/MemBench

==================================


#PhotoMemorability #MLLMs #ComputerVision #AIResearch #HumanComputerInteraction
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

📝 Summary:
WorldStereo integrates camera-guided video generation and 3D reconstruction using geometric memory modules. These provide camera control and structural priors for multi-view consistent videos, enabling high-quality 3D scene reconstruction.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02049
• PDF: https://arxiv.org/pdf/2603.02049
• Project Page: https://3d.hunyuan.tencent.com/sceneTo3D
• Github: https://github.com/FuchengSu/WorldStereo

==================================


#VideoGeneration #3DReconstruction #ComputerVision #DeepLearning #NeuralRendering
Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

📝 Summary:
This paper introduces a novel 3D body measurement system for Saanen goats. It uses a new parametric shape model and a multi-view RGBD dataset to enable accurate single-view 3D reconstruction and automated measurement of key body dimensions, improving precision livestock farming.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.19896
• PDF: https://arxiv.org/pdf/2602.19896
• Github: https://github.com/bojin-nwafu/Female-Saanen-Goats
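A parametric shape model of the kind described, a template mesh deformed by a linear combination of shape blendshapes, can be written as mesh = T + Σ_i β_i · S_i; body measurements then reduce to reading distances off the fitted mesh. The shapes and names below are generic assumptions, not the paper's specific goat model.

```python
import numpy as np

# Linear parametric shape model sketch: mesh = template + sum_i beta_i * S_i.

def shape_model(template, blendshapes, betas):
    """template: (V, 3) vertices; blendshapes: (K, V, 3); betas: (K,)."""
    # contract the shape coefficients against the blendshape stack
    return template + np.tensordot(betas, blendshapes, axes=1)
```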

==================================


#3DReconstruction #ComputerVision #PrecisionLivestock #AnimalScience #AgriTech
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

📝 Summary:
A large video scene graph dataset, SVG2, and a new model, TRaSER, are introduced. TRaSER generates spatio-temporal scene graphs, significantly improving relation, object, and attribute prediction, and boosting video question answering accuracy.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23543
• PDF: https://arxiv.org/pdf/2602.23543
• Project Page: https://uwgzq.github.io/papers/SVG2/

🔹 Models citing this paper:
https://huggingface.co/UWGZQ/TRASER

🔹 Datasets citing this paper:
https://huggingface.co/datasets/UWGZQ/Synthetic_Visual_Genome2
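A spatio-temporal scene graph of the kind SVG2 contains is, at its core, a set of (subject, predicate, object) triples indexed by frame and queryable over time. A minimal generic structure (not TRaSER's exact format):

```python
from collections import defaultdict

# Minimal spatio-temporal scene graph: per-frame relation triples.

class SpatioTemporalSceneGraph:
    def __init__(self):
        self.frames = defaultdict(list)   # frame index -> list of triples

    def add(self, frame, subj, pred, obj):
        self.frames[frame].append((subj, pred, obj))

    def relations_of(self, subj):
        """All (frame, predicate, object) rows for a subject, time-ordered."""
        return sorted((t, p, o) for t, triples in self.frames.items()
                      for s, p, o in triples if s == subj)
```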

==================================


#VideoSceneGraphs #SpatioTemporal #ComputerVision #VideoQA #DeepLearning