ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

📝 Summary:
VidEoMT is a video segmentation model that eliminates complex tracking modules by using a Vision Transformer encoder with query propagation and fusion. This enables efficient temporal modeling, achieving competitive accuracy and 5-10x faster processing speeds.

🔹 Publication Date: Published on Feb 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.17807
• PDF: https://arxiv.org/pdf/2602.17807
• Project Page: https://www.tue-mps.org/videomt/
• Github: https://github.com/tue-mps/videomt
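A minimal sketch of the query propagation and fusion idea described above: queries from the previous frame are blended with a learned per-frame query set so the encoder can track instances over time. The function names and the simple linear fusion rule are illustrative assumptions, not the paper's exact design.

```python
# Toy query propagation with fusion across video frames (illustrative only).

def fuse_queries(propagated, learned, alpha=0.7):
    """Blend queries propagated from the previous frame with learned ones."""
    return [alpha * p + (1 - alpha) * l for p, l in zip(propagated, learned)]

def run_video(frames, learned_queries, segment_frame):
    """Process frames sequentially, carrying queries across time."""
    queries = list(learned_queries)  # first frame starts from learned queries
    masks = []
    for frame in frames:
        masks.append(segment_frame(frame, queries))
        # propagate the current queries forward and fuse with the learned set
        queries = fuse_queries(queries, learned_queries)
    return masks
```

Because the temporal state is just the query list, no separate tracking module is needed, which is where the claimed speedup comes from.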

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoSegmentation #VisionTransformers #ComputerVision #DeepLearning #AIResearch
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive enables real-time, expressive portrait animation for live streaming. It uses hybrid implicit signals, appearance distillation, and autoregressive streaming generation to achieve low-latency, stable results with up to 22x speedup.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

🔹 Models citing this paper:
https://huggingface.co/huaichang/PersonaLive

🔹 Spaces citing this paper:
https://huggingface.co/spaces/seawolf2357/personalive
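The autoregressive streaming generation mentioned in the summary can be sketched as a chunk-by-chunk loop: each short chunk is generated conditioned on the tail of the previous one, so latency stays low while the stream stays coherent. The function names and tail length here are hypothetical, not PersonaLive's actual API.

```python
# Toy autoregressive streaming loop: emit short chunks, each conditioned on
# the last few frames of the previous chunk (illustrative sketch).

def stream_generate(generate_chunk, num_chunks, context=None, tail=2):
    frames = []
    for _ in range(num_chunks):
        chunk = generate_chunk(context)   # produce the next short chunk
        frames.extend(chunk)
        context = chunk[-tail:]           # condition the next chunk on recent frames
    return frames
```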

==================================


#PortraitAnimation #LiveStreaming #RealtimeAI #ComputerVision #GenerativeAI
4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

📝 Summary:
4RC introduces a unified feed-forward framework for 4D reconstruction from monocular video. It learns holistic scene geometry and motion dynamics using a novel transformer-based 'encode-once, query-anywhere and anytime' approach. This method significantly outperforms prior 4D reconstruction techniques.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10094
• PDF: https://arxiv.org/pdf/2602.10094
• Project Page: https://yihangluo.com/projects/4RC/
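The 'encode-once, query-anywhere and anytime' pattern can be sketched as follows: the heavy encoder runs once over the video, and arbitrary space-time queries are then decoded cheaply against the cached latent. Class and method names are assumptions for illustration; the real model decodes queries with transformer attention.

```python
# Sketch of the encode-once, query-anywhere/anytime pattern.

class EncodeOnceQueryAnywhere:
    def __init__(self, encoder, decoder):
        self.encoder, self.decoder = encoder, decoder
        self._latent = None

    def encode(self, video):
        # run the expensive encoder a single time and cache the result
        self._latent = self.encoder(video)

    def query(self, point, time):
        # each query is a cheap decode against the shared latent
        return self.decoder(self._latent, point, time)
```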

==================================


#4DReconstruction #ComputerVision #DeepLearning #NeuralNetworks #MonocularVideo
Spanning the Visual Analogy Space with a Weight Basis of LoRAs

📝 Summary:
LoRWeB improves visual analogy learning by dynamically composing a basis of LoRA modules. It uses an encoder to select and weigh multiple LoRAs at inference time, rather than a single fixed module. This achieves state-of-the-art performance and significantly better generalization for image manipulation.

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15727
• PDF: https://arxiv.org/pdf/2602.15727
• Project Page: https://research.nvidia.com/labs/par/lorweb/
• Github: https://github.com/NVlabs/LoRWeB

🔹 Datasets citing this paper:
https://huggingface.co/datasets/hilamanor/LoRWeB_evalset
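The weighted LoRA-basis composition reduces to adding a mixture of low-rank updates to the base weight: W_eff = W0 + Σ_k w_k · (B_k A_k), where the mixing weights come from the encoder at inference time. A minimal sketch, with shapes and the mixing rule as illustrative assumptions:

```python
import numpy as np

# Compose a weight basis of LoRAs: base weight plus a weighted sum of
# low-rank updates, W_eff = W0 + sum_k mix[k] * (B_k @ A_k).

def compose_lora_basis(W0, As, Bs, mix):
    """W0: (d, d); A_k: (r, d); B_k: (d, r); mix: per-module weights."""
    delta = sum(w * (B @ A) for w, A, B in zip(mix, As, Bs))
    return W0 + delta
```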

==================================


#LoRA #VisualAnalogies #DeepLearning #AI #ComputerVision
Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

📝 Summary:
This paper presents a conditional binary segmentation framework for robust cross-view object correspondence. It uses cycle-consistency training to create view-invariant representations without ground-truth annotations. This approach achieves state-of-the-art performance on relevant benchmarks.

🔹 Publication Date: Published on Feb 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18996
• PDF: https://arxiv.org/pdf/2602.18996
• Github: https://github.com/shannany0606/CCMP
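The cycle-consistency training signal can be sketched directly: predict the object's mask in view B from view A, predict back into view A, and penalize disagreement with the starting mask, so no cross-view ground truth is needed. `predict_mask` below is a hypothetical stand-in for the segmentation model.

```python
# Cycle-consistency objective sketch for cross-view object correspondence.

def cycle_consistency_loss(mask_a, view_a, view_b, predict_mask):
    mask_b = predict_mask(view_a, mask_a, view_b)          # view A -> view B
    mask_a_cycled = predict_mask(view_b, mask_b, view_a)   # view B -> view A
    # mean absolute disagreement between the start and cycled masks
    return sum(abs(x - y) for x, y in zip(mask_a, mask_a_cycled)) / len(mask_a)
```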

==================================


#ComputerVision #MachineLearning #ObjectCorrespondence #ImageSegmentation #SelfSupervisedLearning
VLANeXt: Recipes for Building Strong VLA Models

📝 Summary:
This paper systematically analyzes Vision-Language-Action (VLA) models through a unified framework, distilling 12 key design principles. The resulting VLANeXt model achieves superior performance on benchmarks and strong real-world generalization.

🔹 Publication Date: Published on Feb 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18532
• PDF: https://arxiv.org/pdf/2602.18532
• Project Page: https://dravenalg.github.io/VLANeXt/
• Github: https://github.com/DravenALG/awesome-vla

🔹 Models citing this paper:
https://huggingface.co/DravenALG/VLANeXt

==================================


#VLANeXt #VLAModels #ComputerVision #Robotics #AIResearch
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

📝 Summary:
ImplicitRDP is an end-to-end visual-force diffusion policy that integrates asynchronous vision and force sensing using structural slow-fast learning and virtual-target regularization. It improves reactivity and success in contact-rich manipulation tasks.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10946
• PDF: https://arxiv.org/pdf/2512.10946
• Project Page: https://implicit-rdp.github.io
• Github: https://github.com/Chen-Wendi/ImplicitRDP

🔹 Models citing this paper:
https://huggingface.co/WendiChen/ImplicitRDP_model

🔹 Datasets citing this paper:
https://huggingface.co/datasets/WendiChen/ImplicitRDP_dataset
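The structural slow-fast idea can be sketched as two update rates: vision features refresh at a low rate while force readings arrive every control step, and the policy always acts on the freshest of each. The update period and the fusion interface are assumptions for illustration.

```python
# Slow-fast rollout sketch: slow visual branch, fast force branch.

def slow_fast_rollout(vision_stream, force_stream, policy, vision_period=3):
    actions, vision_feat = [], None
    for step, force in enumerate(force_stream):
        if step % vision_period == 0:
            # slow branch: refresh the vision feature only periodically
            vision_feat = vision_stream[step // vision_period]
        # fast branch: act on every force reading
        actions.append(policy(vision_feat, force))
    return actions
```

Keeping the force branch reactive at every step is what improves responsiveness in contact-rich manipulation.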

==================================


#Robotics #DiffusionPolicy #MachineLearning #Manipulation #ComputerVision
LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency

📝 Summary:
LaS-Comp is a zero-shot 3D shape completion method that leverages 3D foundation models. It uses a two-stage approach for faithful reconstruction and seamless boundary refinement. This training-free framework outperforms prior state-of-the-art methods.

🔹 Publication Date: Published on Feb 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18735
• PDF: https://arxiv.org/pdf/2602.18735
• Github: https://github.com/DavidYan2001/LaS-Comp

==================================


#3DCompletion #ZeroShotLearning #FoundationModels #ComputerVision #AI
Communication-Inspired Tokenization for Structured Image Representations

📝 Summary:
COMiT introduces a framework for learning structured, object-centric visual tokens through iterative encoding and flow-matching decoding. This single-transformer approach improves compositional generalization and relational reasoning by creating interpretable token structures.

🔹 Publication Date: Published on Feb 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.20731
• PDF: https://arxiv.org/pdf/2602.20731
• Project Page: https://araachie.github.io/comit/
• Github: https://github.com/araachie/comit

🔹 Models citing this paper:
https://huggingface.co/cvg-unibe/comit-xl
https://huggingface.co/cvg-unibe/comit-l
https://huggingface.co/cvg-unibe/comit-b

==================================


#ComputerVision #Transformers #ImageRecognition #RepresentationLearning #AIResearch
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

📝 Summary:
PhysicEdit addresses physically implausible image editing by modeling edits as predictive physical state transitions. It uses a dual-thinking diffusion framework guided by a vision-language model, greatly enhancing physical realism.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21778
• PDF: https://arxiv.org/pdf/2602.21778
• Project Page: https://liangbingzhao.github.io/statics2dynamics/
• Github: https://github.com/liangbingzhao/PhysicEdit

🔹 Datasets citing this paper:
https://huggingface.co/datasets/metazlb/PhysicTran38K

==================================


#ImageEditing #DiffusionModels #ComputerVision #PhysicsAI #AIResearch
EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

📝 Summary:
EmbodMocap is a dual-iPhone system for in-the-wild 4D human-scene reconstruction. It unifies human and scene data in a metric world frame, improving accuracy. This supports embodied AI tasks like animation and robot control.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23205
• PDF: https://arxiv.org/pdf/2602.23205
• Project Page: https://wenjiawang0312.github.io/projects/embodmocap/
• Github: https://github.com/WenjiaWang0312/EmbodMocap

==================================


#EmbodiedAI #4DReconstruction #ComputerVision #Robotics #Animation
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics

📝 Summary:
MIGM-Shortcut accelerates masked image generation by learning a lightweight model to predict feature evolution velocity from previous features and sampled tokens. This achieves over 4x speedup with maintained quality on state-of-the-art models.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23996
• PDF: https://arxiv.org/pdf/2602.23996
• Github: https://github.com/Kaiwen-Zhu/MIGM-Shortcut
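The shortcut idea, predicting the velocity of the feature trajectory and extrapolating several steps ahead instead of running the full model at every unmasking step, can be sketched as a simple loop. The linear extrapolation rule and the stride parameter are illustrative assumptions.

```python
# Shortcut rollout sketch: a lightweight head predicts feature velocity,
# and features jump several generation steps at a time.

def shortcut_rollout(f0, predict_velocity, num_steps, stride=4):
    """Advance features `num_steps` steps, `stride` steps per jump."""
    f = f0
    for _ in range(num_steps // stride):
        v = predict_velocity(f)   # cheap velocity estimate
        f = f + v * stride        # extrapolate `stride` steps along the trajectory
    return f
```

Replacing full forward passes with cheap velocity predictions at most steps is the source of the reported speedup.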

==================================


#ImageGeneration #DeepLearning #GenerativeAI #ComputerVision #AI
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching

📝 Summary:
SenCache accelerates diffusion model inference by dynamically selecting cache timesteps based on model output sensitivity to input perturbations. This principled framework improves visual quality over existing heuristic methods within similar computational budgets.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24208
• PDF: https://arxiv.org/pdf/2602.24208
• Github: https://github.com/vita-epfl/SenCache
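The scheduling step behind sensitivity-aware caching can be sketched as: score each timestep by how much the output moves under input perturbation, recompute only at the most sensitive timesteps, and reuse cached features elsewhere. The scoring interface and budget logic here are assumptions.

```python
# Sensitivity-aware cache scheduling sketch: pick which timesteps to
# recompute given a per-step sensitivity score and a compute budget.

def select_recompute_steps(sensitivities, budget):
    """Return indices of the `budget` most sensitive timesteps, in order."""
    ranked = sorted(range(len(sensitivities)),
                    key=lambda i: sensitivities[i], reverse=True)
    return sorted(ranked[:budget])
```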

==================================


#DiffusionModels #AI #MachineLearning #InferenceAcceleration #ComputerVision
Mode Seeking meets Mean Seeking for Fast Long Video Generation

📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fidelity gap.

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/

==================================


#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision
How to Take a Memorable Picture? Empowering Users with Actionable Feedback

📝 Summary:
This paper introduces Memorability Feedback (MemFeed), a new task providing actionable natural-language guidance to improve photo memorability. Their method, MemCoach, uses MLLMs and a teacher-student strategy, demonstrating that memorability can be taught and instructed.

🔹 Publication Date: Published on Feb 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.21877
• PDF: https://arxiv.org/pdf/2602.21877
• Project Page: https://laitifranz.github.io/MemCoach/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/laitifranz/MemBench-InternVL3.5-Eval
https://huggingface.co/datasets/laitifranz/MemBench

==================================


#PhotoMemorability #MLLMs #ComputerVision #AIResearch #HumanComputerInteraction
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories

📝 Summary:
WorldStereo integrates camera-guided video generation and 3D reconstruction using geometric memory modules. These provide camera control and structural priors for multi-view consistent videos, enabling high-quality 3D scene reconstruction.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02049
• PDF: https://arxiv.org/pdf/2603.02049
• Project Page: https://3d.hunyuan.tencent.com/sceneTo3D
• Github: https://github.com/FuchengSu/WorldStereo

==================================


#VideoGeneration #3DReconstruction #ComputerVision #DeepLearning #NeuralRendering
Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

📝 Summary:
This paper introduces a novel 3D body measurement system for Saanen goats. It uses a new parametric shape model and a multi-view RGBD dataset to enable accurate single-view 3D reconstruction and automated measurement of key body dimensions, improving precision livestock farming.

🔹 Publication Date: Published on Feb 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.19896
• PDF: https://arxiv.org/pdf/2602.19896
• Github: https://github.com/bojin-nwafu/Female-Saanen-Goats
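A parametric shape model of the kind described, a template mesh deformed by a linear combination of shape blendshapes, can be written as mesh = T + Σ_i β_i · S_i; body measurements then reduce to reading distances off the fitted mesh. The shapes and names below are generic assumptions, not the paper's specific goat model.

```python
import numpy as np

# Linear parametric shape model sketch: mesh = template + sum_i beta_i * S_i.

def shape_model(template, blendshapes, betas):
    """template: (V, 3) vertices; blendshapes: (K, V, 3); betas: (K,)."""
    # contract the shape coefficients against the blendshape stack
    return template + np.tensordot(betas, blendshapes, axes=1)
```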

==================================


#3DReconstruction #ComputerVision #PrecisionLivestock #AnimalScience #AgriTech
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos

📝 Summary:
A large video scene graph dataset, SVG2, and a new model, TRaSER, are introduced. TRaSER generates spatio-temporal scene graphs, significantly improving relation, object, and attribute prediction, and boosting video question answering accuracy.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23543
• PDF: https://arxiv.org/pdf/2602.23543
• Project Page: https://uwgzq.github.io/papers/SVG2/

🔹 Models citing this paper:
https://huggingface.co/UWGZQ/TRASER

🔹 Datasets citing this paper:
https://huggingface.co/datasets/UWGZQ/Synthetic_Visual_Genome2
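A spatio-temporal scene graph of the kind SVG2 contains is, at its core, a set of (subject, predicate, object) triples indexed by frame and queryable over time. A minimal generic structure (not TRaSER's exact format):

```python
from collections import defaultdict

# Minimal spatio-temporal scene graph: per-frame relation triples.

class SpatioTemporalSceneGraph:
    def __init__(self):
        self.frames = defaultdict(list)   # frame index -> list of triples

    def add(self, frame, subj, pred, obj):
        self.frames[frame].append((subj, pred, obj))

    def relations_of(self, subj):
        """All (frame, predicate, object) rows for a subject, time-ordered."""
        return sorted((t, p, o) for t, triples in self.frames.items()
                      for s, p, o in triples if s == subj)
```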

==================================


#VideoSceneGraphs #SpatioTemporal #ComputerVision #VideoQA #DeepLearning