ML Research Hub
32.8K subscribers
4.39K photos
270 videos
23 files
4.75K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL), a two-phase RL framework that enables Vision-Language Models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on benchmarks and real-world robot tasks.
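
🔹 Illustrative Sketch: The snippet below is a generic tool-calling loop of the kind such a framework trains, not the authors' code; the `propose_action` policy and both tools are invented stand-ins for a VLM and real spatial tools.

```python
# Hypothetical sketch of a tool-augmented reasoning loop (not the SpaceTools code).
# The policy, the two tools, and the stopping rule are illustrative stand-ins.
from typing import Callable, Dict, List

def measure_distance(args: Dict) -> str:
    # Stand-in spatial tool: compute the metric distance between two 3D points.
    (x1, y1, z1), (x2, y2, z2) = args["p1"], args["p2"]
    d = ((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2) ** 0.5
    return f"distance={d:.2f}m"

def detect_objects(args: Dict) -> str:
    # Stand-in perception tool: a real system would run a detector here.
    return "objects=[chair@(1,0,0), table@(2,1,0)]"

TOOLS: Dict[str, Callable[[Dict], str]] = {
    "measure_distance": measure_distance,
    "detect_objects": detect_objects,
}

def propose_action(question: str, history: List[str]) -> Dict:
    # Placeholder for the VLM policy; a trained model would pick the next tool call.
    if not history:
        return {"tool": "detect_objects", "args": {}}
    if len(history) == 1:
        return {"tool": "measure_distance",
                "args": {"p1": (1, 0, 0), "p2": (2, 1, 0)}}
    return {"tool": None, "answer": "The table is about 1.4 m from the chair."}

def run_episode(question: str, max_steps: int = 5) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        action = propose_action(question, history)
        if action["tool"] is None:            # the model decides it can answer
            return action["answer"]
        history.append(TOOLS[action["tool"]](action["args"]))  # feed observation back
    return "no answer"

print(run_episode("How far is the table from the chair?"))
```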

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/
• Github: https://spacetools.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.
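
🔹 Illustrative Sketch: One generic way to apply an RL-style physical-alignment signal is to score candidate rollouts and keep the most plausible one; the reward term below is an invented stand-in, not MIND-V's actual criterion.

```python
# Generic illustration of reward-based selection of generated rollouts; the
# plausibility reward below is an invented stand-in, not MIND-V's criterion.
import numpy as np

def physical_plausibility_reward(traj: np.ndarray, dt: float = 0.1) -> float:
    """Penalize implausibly large accelerations in a (T, 3) end-effector path."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    return -float(np.mean(np.linalg.norm(acc, axis=1)))

rng = np.random.default_rng(0)
# Pretend these trajectories were decoded from candidate generated video clips.
candidates = [np.cumsum(rng.normal(scale=0.02, size=(50, 3)), axis=0) for _ in range(8)]
scores = [physical_plausibility_reward(t) for t in candidates]
best = candidates[int(np.argmax(scores))]   # keep the most physically plausible rollout
print("best score:", round(max(scores), 3), "selected shape:", best.shape)
```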

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Project Page: https://github.com/Richard-Zhang-AI/MIND-V
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale

📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, producing over 3.6M robotized frames. This method outperforms existing solutions.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.
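
🔹 Illustrative Sketch: A minimal language-to-skill planning loop under assumed interfaces; the skill table and the `plan()` stub stand in for the LLM and robot primitives and are not LEO-RobotAgent's actual API.

```python
# Hypothetical sketch of language-driven task planning; the skill table and the
# plan() stub are invented stand-ins, not LEO-RobotAgent's actual interface.
from typing import List, Tuple

SKILLS = {
    "move_to": lambda target: print(f"[robot] moving to {target}"),
    "pick":    lambda obj:    print(f"[robot] picking {obj}"),
    "place":   lambda loc:    print(f"[robot] placing object on {loc}"),
}

def plan(instruction: str) -> List[Tuple[str, str]]:
    # Placeholder for an LLM call that maps the instruction to a skill sequence;
    # a real system would prompt the model and parse its structured output.
    return [("move_to", "table"), ("pick", "red cup"),
            ("move_to", "shelf"), ("place", "shelf")]

def execute(instruction: str) -> None:
    for skill, arg in plan(instruction):
        SKILLS[skill](arg)   # dispatch each planned step to a robot primitive

execute("Put the red cup on the shelf.")
```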

🔹 Publication Date: Published on Dec 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving, aiming for interpretable, human-aligned policies that address the limitations of prior systems. This paper characterizes VLA paradigms, datasets, and future challenges.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and in zero-shot task planning in household environments.
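
🔹 Illustrative Sketch: A toy state-aware scene-graph structure showing how object states and relations can feed a planner; the schema is an invented stand-in, not MomaGraph's representation.

```python
# Toy state-aware scene graph; the schema is an invented stand-in used only to
# illustrate how object states and relations can support task planning.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Node:
    name: str
    states: Dict[str, str] = field(default_factory=dict)   # e.g. {"open": "false"}

@dataclass
class SceneGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (subj, rel, obj)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def relate(self, subj: str, rel: str, obj: str) -> None:
        self.edges.append((subj, rel, obj))

    def objects_in(self, container: str) -> List[str]:
        return [s for s, r, o in self.edges if r == "inside" and o == container]

g = SceneGraph()
g.add(Node("fridge", {"open": "false"}))
g.add(Node("milk"))
g.relate("milk", "inside", "fridge")
# A planner can read states and relations, e.g. open the fridge before grasping the milk.
print(g.objects_in("fridge"), g.nodes["fridge"].states)
```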

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Github: https://hybridrobotics.github.io/MomaGraph/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

📝 Summary:
This survey offers a structured guide to Vision-Language-Action (VLA) models in robotics. It breaks the field down into five key challenges: representation, execution, generalization, safety, and datasets, serving as a roadmap for researchers.

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11362
• PDF: https://arxiv.org/pdf/2512.11362
• Project Page: https://suyuz1.github.io/Survery/
• Github: https://suyuz1.github.io/VLA-Survey-Anatomy/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VLAModels #Robotics #ArtificialIntelligence #VisionLanguage #AIResearch
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.
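
🔹 Illustrative Sketch: A toy denoising loop showing the generic mechanism by which a diffusion backbone emits a whole action chunk; the hand-written `denoiser` stands in for the learned model and is not Dream-VLA's architecture.

```python
# Toy denoising loop for an action chunk; the hand-written "denoiser" stands in
# for the learned diffusion backbone and is not Dream-VLA's actual model.
import numpy as np

def denoiser(noisy_actions: np.ndarray, t: float) -> np.ndarray:
    # Stand-in for a network that predicts the clean action chunk from noise.
    target = np.linspace(0.0, 1.0, noisy_actions.shape[0])[:, None]  # toy target trajectory
    return np.repeat(target, noisy_actions.shape[1], axis=1)

def sample_actions(horizon: int = 16, dim: int = 7, steps: int = 10) -> np.ndarray:
    rng = np.random.default_rng(0)
    x = rng.normal(size=(horizon, dim))        # start from pure noise
    for k in range(steps, 0, -1):
        t = k / steps
        x0_hat = denoiser(x, t)                # predict the clean action chunk
        x = x0_hat + t * rng.normal(scale=0.1, size=x.shape)  # crude re-noising schedule
    return x0_hat

actions = sample_actions()
print(actions.shape)   # (16, 7): a whole action chunk emitted by iterative denoising
```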

🔹 Publication Date: Published on Dec 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute them precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robot performance.
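
🔹 Illustrative Sketch: A rough two-level plan-then-track loop under assumed linear dynamics, meant only to illustrate goal-conditioned subgoal planning with a world model; it is not Act2Goal's architecture.

```python
# Rough plan-then-track loop with a goal-conditioned world model; the linear
# dynamics and greedy controller are invented stand-ins, not Act2Goal's design.
import numpy as np

def world_model(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    return state + 0.1 * action   # stand-in latent dynamics

def plan_subgoals(state: np.ndarray, goal: np.ndarray, n: int = 4) -> list:
    # Coarse temporal scale: evenly spaced intermediate states toward the goal.
    return [state + (goal - state) * (i + 1) / n for i in range(n)]

def control_toward(state: np.ndarray, subgoal: np.ndarray, steps: int = 5) -> np.ndarray:
    # Fine temporal scale: greedy actions that shrink the gap to the next subgoal.
    for _ in range(steps):
        action = np.clip(subgoal - state, -1.0, 1.0)
        state = world_model(state, action)
    return state

state, goal = np.zeros(3), np.array([1.0, 0.5, -0.2])
for subgoal in plan_subgoals(state, goal):
    state = control_toward(state, subgoal)
print("final distance to goal:", round(float(np.linalg.norm(goal - state)), 3))
```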

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
• Github: https://act2goal.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
GR-Dexter Technical Report

📝 Summary:
GR-Dexter introduces a hardware-model-data framework for bimanual dexterous-hand robot manipulation using VLA models. It combines a new 21-DoF hand, teleoperation for data, and diverse datasets. This framework achieves strong performance and robust generalization in real-world manipulation tasks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24210
• PDF: https://arxiv.org/pdf/2512.24210
• Project Page: https://byte-dexter.github.io/gr-dexter/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #DexterousManipulation #VLA #RobotHardware #MachineLearning
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow

📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.
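
🔹 Illustrative Sketch: A minimal example of tracking a 3D object-flow trajectory with proportional control; the synthetic flow and controller are stand-ins, since the paper reconstructs the object flow from generated video.

```python
# Minimal example of tracking a 3D object-flow trajectory with proportional
# control; the synthetic flow and controller are stand-ins, since the paper
# reconstructs the object flow from generated video.
import numpy as np

def track_flow(flow: np.ndarray, start: np.ndarray, gain: float = 0.5) -> np.ndarray:
    """Follow a (T, 3) sequence of object waypoints with simple P-control."""
    pos, visited = start.copy(), []
    for waypoint in flow:
        for _ in range(20):                    # converge to each waypoint in turn
            pos = pos + gain * (waypoint - pos)
        visited.append(pos.copy())
    return np.asarray(visited)

t = np.linspace(0.0, 1.0, 10)
flow = np.stack([t, np.sin(t * np.pi), np.zeros_like(t)], axis=1)  # synthetic 3D object flow
executed = track_flow(flow, start=np.zeros(3))
print("final waypoint error:", float(np.linalg.norm(executed[-1] - flow[-1])))
```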

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning
RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization

📝 Summary:
RGS-SLAM is a robust Gaussian-splatting SLAM framework that uses a one-shot, correspondence-to-Gaussian initialization with DINOv3 descriptors. This method improves stability, accelerates convergence, and yields higher rendering fidelity and accuracy compared to existing systems.
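
🔹 Illustrative Sketch: A toy correspondence-to-3D seeding step with random features standing in for DINOv3 descriptors and made-up pinhole intrinsics; it only illustrates the general idea, not RGS-SLAM's initialization.

```python
# Toy correspondence-to-3D seeding step; random features stand in for DINOv3
# descriptors and the pinhole intrinsics are made up, so this only illustrates
# the general idea, not RGS-SLAM's actual initialization.
import numpy as np

rng = np.random.default_rng(0)
desc_a = rng.normal(size=(200, 64))          # stand-in descriptors, frame A
desc_b = rng.normal(size=(200, 64))          # stand-in descriptors, frame B
pix_b = rng.uniform(0, 480, size=(200, 2))   # pixel locations in frame B
depth_b = rng.uniform(0.5, 3.0, size=200)    # depth per pixel in frame B

# Nearest-neighbour matching by dot-product similarity in descriptor space.
matches = (desc_a @ desc_b.T).argmax(axis=1)

# Back-project matched pixels to 3D to seed Gaussian means (fx=fy=400, cx=cy=240).
fx = fy = 400.0
cx = cy = 240.0
u, v = pix_b[matches, 0], pix_b[matches, 1]
z = depth_b[matches]
gaussian_means = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
print("initialized", gaussian_means.shape[0], "Gaussian centers")
```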

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00705
• PDF: https://arxiv.org/pdf/2601.00705

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SLAM #GaussianSplatting #ComputerVision #Robotics #DeepLearning
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation

📝 Summary:
Collecting diverse robot manipulation data is challenging. This paper introduces visual identity prompting, which uses exemplar images to guide diffusion models in generating multi-view, temporally coherent data. The augmented data improves robot policy performance in both simulation and real-world settings.
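
🔹 Illustrative Sketch: A schematic augmentation loop where `identity_embed` and `generate_views` are invented placeholders for an image encoder and a multi-view video diffusion model; this is not RoboVIP's API.

```python
# Schematic augmentation loop; identity_embed and generate_views are invented
# placeholders for an image encoder and a multi-view video diffusion model,
# not RoboVIP's actual API.
import numpy as np

rng = np.random.default_rng(0)

def identity_embed(exemplar_image: np.ndarray) -> np.ndarray:
    return exemplar_image.mean(axis=(0, 1))   # stand-in identity embedding

def generate_views(identity: np.ndarray, trajectory: np.ndarray,
                   n_views: int = 3, frames: int = 8) -> np.ndarray:
    # Placeholder generator: returns (n_views, frames, H, W, 3) noise "videos".
    return rng.uniform(size=(n_views, frames, 64, 64, 3)) * identity.mean()

dataset = []
for episode in range(4):
    exemplar = rng.uniform(size=(64, 64, 3))   # reference image of the target object
    traj = rng.uniform(size=(8, 7))            # recorded robot trajectory
    clips = generate_views(identity_embed(exemplar), traj)
    dataset.append({"video": clips, "actions": traj})   # augmented multi-view sample
print("augmented episodes:", len(dataset), "| views per episode:", dataset[0]["video"].shape[0])
```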

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05241
• PDF: https://arxiv.org/pdf/2601.05241
• Project Page: https://robovip.github.io/RoboVIP/
• Github: https://robovip.github.io/RoboVIP/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #AI #GenerativeAI #ComputerVision #MachineLearning