✨SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
📝 Summary:
SpaceTools introduces Double Interactive Reinforcement Learning (DIRL), a two-phase RL framework that enables Vision-Language Models to coordinate multiple tools for precise spatial reasoning, achieving state-of-the-art performance on spatial reasoning benchmarks and real-world robot tasks.
🔹 Publication Date: Published on Dec 3, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04069
• PDF: https://arxiv.org/pdf/2512.04069
• Project Page: https://spacetools.github.io/
• Github: https://spacetools.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #VisionLanguageModels #Robotics #SpatialReasoning #AI
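A minimal, hypothetical sketch of the tool-calling pattern the summary describes (a VLM deciding when to invoke external tools for spatial questions). None of the names below (call_vlm, depth_tool, detect_tool) come from the SpaceTools codebase; they are placeholders for illustration.
```python
# Illustrative sketch only: a generic tool-calling loop in the spirit of
# tool-augmented spatial reasoning. All names are hypothetical placeholders.

def depth_tool(image, point):
    """Placeholder: return an estimated depth (meters) at a pixel."""
    return 1.0

def detect_tool(image, query):
    """Placeholder: return a bounding box [x1, y1, x2, y2] for the query."""
    return [0, 0, 10, 10]

TOOLS = {"depth": depth_tool, "detect": detect_tool}

def call_vlm(history):
    """Placeholder for the VLM policy: returns either a tool request or a
    final answer. A trained policy would decide this from the dialogue."""
    if not any(step["type"] == "tool_result" for step in history):
        return {"type": "tool_call", "tool": "detect", "args": {"query": "mug"}}
    return {"type": "answer", "text": "The mug is about 1.0 m away."}

def spatial_reasoning_loop(image, question, max_steps=4):
    history = [{"type": "question", "text": question}]
    for _ in range(max_steps):
        action = call_vlm(history)
        if action["type"] == "answer":
            return action["text"]
        result = TOOLS[action["tool"]](image, **action["args"])
        history.append({"type": "tool_result", "tool": action["tool"], "result": result})
    return "No answer within the step budget."

print(spatial_reasoning_loop(image=None, question="How far is the mug?"))
```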
✨MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.
🔹 Publication Date: Published on Dec 7, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Project Page: https://github.com/Richard-Zhang-AI/MIND-V
• Github: https://github.com/Richard-Zhang-AI/MIND-V
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
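An illustrative sketch of one way an RL-style physical-plausibility reward could be used to pick among generated rollouts. The reward below (penalizing implausibly large frame-to-frame motion) is a toy stand-in, not MIND-V's actual alignment objective.
```python
# Illustrative sketch only: reward-based selection of generated rollouts for
# physical plausibility. The reward is a placeholder, not MIND-V's.
import numpy as np

def physics_reward(trajectory, max_speed=0.05):
    """trajectory: (T, 3) array of an object's position per frame."""
    speeds = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
    return -np.clip(speeds - max_speed, 0.0, None).sum()

rng = np.random.default_rng(0)
candidates = [np.cumsum(rng.normal(scale=s, size=(16, 3)), axis=0)
              for s in (0.01, 0.05, 0.2)]  # three candidate "rollouts"
best = max(candidates, key=physics_reward)
print("picked candidate with reward", physics_reward(best))
```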
✨X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale
📝 Summary:
X-Humanoid generates large-scale humanoid video datasets from human videos to boost embodied AI. It uses generative video editing, finetuned on synthetic data, to translate human actions into full-body humanoid motions, producing over 3.6M robotized frames. This method outperforms existing solutions.
🔹 Publication Date: Published on Dec 4, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04537
• PDF: https://arxiv.org/pdf/2512.04537
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#XHumanoid #EmbodiedAI #Robotics #GenerativeAI #ComputerVision
✨LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
📝 Summary:
LEO-RobotAgent is a general-purpose language-driven framework that uses large language models to enable various robot types to complete complex tasks. It enhances human-robot interaction and task planning, demonstrating strong generalization, robustness, and efficiency across different scenarios.
🔹 Publication Date: Published on Dec 11, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10605
• PDF: https://arxiv.org/pdf/2512.10605
• Github: https://github.com/LegendLeoChen/LEO-RobotAgent
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #LLM #HumanRobotInteraction #EmbodiedAI #AI
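A toy sketch of the general LLM-as-planner pattern the summary refers to: an instruction is decomposed into robot primitives and dispatched to a controller. plan_with_llm and the primitive set are hypothetical placeholders, not LEO-RobotAgent's API.
```python
# Illustrative sketch only: an LLM-style planner mapping a language
# instruction to a sequence of robot primitives. Names are placeholders.

PRIMITIVES = {"move_to", "grasp", "place"}

def plan_with_llm(instruction):
    """Stand-in for an LLM call that returns a primitive action plan."""
    return [("move_to", "cup"), ("grasp", "cup"), ("move_to", "shelf"), ("place", "shelf")]

def execute(plan, robot_api):
    for name, target in plan:
        assert name in PRIMITIVES, f"unknown primitive: {name}"
        robot_api(name, target)   # dispatch to the robot controller

execute(plan_with_llm("put the cup on the shelf"),
        robot_api=lambda name, target: print(f"{name}({target})"))
```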
✨Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
📝 Summary:
Vision-Language-Action (VLA) models integrate visual, linguistic, and action capabilities for autonomous driving, aiming for interpretable, human-aligned policies that address the limitations of prior systems. This paper characterizes VLA paradigms, datasets, and future challenges.
🔹 Publication Date: Published on Dec 18, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16760
• PDF: https://arxiv.org/pdf/2512.16760
• Project Page: https://worldbench.github.io/vla4ad
• Github: https://github.com/worldbench/awesome-vla-for-ad
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLAModels #AutonomousDriving #AI #DeepLearning #Robotics
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and in zero-shot task planning in household environments.
🔹 Publication Date: Published on Dec 18, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Github: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
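A minimal sketch of a state-aware scene graph of the kind such a model might predict: objects carry mutable states, and relations can order task steps. The schema is hypothetical, not MomaGraph's actual output format.
```python
# Illustrative sketch only: a minimal state-aware scene graph with objects,
# per-object states, and relations. The schema is a placeholder.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)   # e.g. {"open": False}

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (subject, relation, object)

    def add(self, node): self.nodes[node.name] = node
    def relate(self, s, r, o): self.edges.append((s, r, o))

g = SceneGraph()
g.add(Node("fridge", {"open": False}))
g.add(Node("milk", {"in_hand": False}))
g.relate("milk", "inside", "fridge")

# A planner can read object states to order steps: open the fridge first.
steps = (["open fridge"] if not g.nodes["fridge"].state["open"] else []) + ["take milk"]
print(steps)
```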
✨An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
📝 Summary:
This survey offers a structured guide to Vision-Language-Action (VLA) models in robotics. It breaks down five key challenge areas: representation, execution, generalization, safety, and datasets, serving as a roadmap for researchers.
🔹 Publication Date: Published on Dec 12, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11362
• PDF: https://arxiv.org/pdf/2512.11362
• Project Page: https://suyuz1.github.io/Survery/
• Github: https://suyuz1.github.io/VLA-Survey-Anatomy/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLAModels #Robotics #ArtificialIntelligence #VisionLanguage #AIResearch
✨Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.
🔹 Publication Date: Published on Dec 27, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision
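A toy sketch of the core mechanism behind diffusion-based action generation: an action chunk is iteratively denoised over reverse steps. The "denoiser" here is a trivial stand-in, not Dream-VLA's trained network.
```python
# Illustrative sketch only: iterative denoising of an action chunk, the basic
# idea behind diffusion-based action generation. The denoiser is a toy.
import numpy as np

def toy_denoiser(actions, t, target):
    """Predict a less-noisy action chunk at step t (placeholder network)."""
    return actions + 0.5 * (target - actions)

rng = np.random.default_rng(0)
target = np.zeros((8, 7))                 # 8-step chunk of 7-DoF actions
actions = rng.normal(size=target.shape)   # start from pure noise
for t in reversed(range(10)):             # reverse diffusion steps
    actions = toy_denoiser(actions, t, target)
print("residual noise:", float(np.abs(actions - target).max()))
```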
✨Act2Goal: From World Model To General Goal-conditioned Policy
📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute them precisely. This enables strong generalization and rapid online adaptation, significantly boosting real-robot performance.
🔹 Publication Date: Published on Dec 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/
• Github: https://act2goal.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
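An illustrative sketch of the "plan intermediate states, then execute" idea: subgoals are interpolated toward the goal and each is tracked by a low-level controller. All functions are toy placeholders, not Act2Goal's world model.
```python
# Illustrative sketch only: plan intermediate subgoals toward a goal state,
# then track each with a simple controller. Functions are placeholders.
import numpy as np

def plan_subgoals(current, goal, n=4):
    """Toy 'world model': linearly interpolate latent states toward the goal."""
    return [current + (goal - current) * (i + 1) / n for i in range(n)]

def low_level_controller(state, subgoal, step=0.5):
    return state + step * (subgoal - state)

state, goal = np.zeros(3), np.array([1.0, 2.0, 0.5])
for sg in plan_subgoals(state, goal):
    for _ in range(5):                       # short horizon per subgoal
        state = low_level_controller(state, sg)
print("final distance to goal:", float(np.linalg.norm(goal - state)))
```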
✨GR-Dexter Technical Report
📝 Summary:
GR-Dexter introduces a hardware-model-data framework for bimanual dexterous-hand robot manipulation using VLA models. It combines a new 21-DoF hand, teleoperation-based data collection, and diverse datasets. This framework achieves strong performance and robust generalization in real-world manipulation tasks.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24210
• PDF: https://arxiv.org/pdf/2512.24210
• Project Page: https://byte-dexter.github.io/gr-dexter/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #DexterousManipulation #VLA #RobotHardware #MachineLearning
✨Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
📝 Summary:
Dream2Flow bridges video generation and robotic control using 3D object flow. It reconstructs 3D object motions from generated videos, enabling zero-shot manipulation of diverse objects through trajectory tracking without task-specific demonstrations.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24766
• PDF: https://arxiv.org/pdf/2512.24766
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #Robotics #3DVision #AI #ZeroShotLearning
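A toy sketch of trajectory tracking over a reconstructed 3D object flow: the end-effector follows waypoints with a simple proportional tracker. The trajectory here is synthetic; Dream2Flow would recover it from a generated video.
```python
# Illustrative sketch only: following a 3D object trajectory ("object flow")
# with a proportional tracker. The trajectory is synthetic, not recovered.
import numpy as np

def track_trajectory(start, trajectory, gain=0.6, steps_per_waypoint=10):
    pos, log = np.asarray(start, dtype=float), []
    for waypoint in trajectory:
        for _ in range(steps_per_waypoint):
            pos = pos + gain * (waypoint - pos)   # move end-effector toward waypoint
            log.append(pos.copy())
    return np.array(log)

t = np.linspace(0, 1, 20)
trajectory = np.stack([t, np.sin(2 * np.pi * t) * 0.1, 0.2 * t], axis=1)
executed = track_trajectory([0, 0, 0], trajectory)
print("final tracking error:", float(np.linalg.norm(executed[-1] - trajectory[-1])))
```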
✨RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization
📝 Summary:
RGS-SLAM is a robust Gaussian-splatting SLAM framework that uses a one-shot, correspondence-to-Gaussian initialization with DINOv3 descriptors. This method improves stability, accelerates convergence, and yields higher rendering fidelity and accuracy compared to existing systems.
🔹 Publication Date: Published on Dec 28, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00705
• PDF: https://arxiv.org/pdf/2601.00705
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SLAM #GaussianSplatting #ComputerVision #Robotics #DeepLearning
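An illustrative sketch of seeding 3D Gaussians from dense correspondences: matched pixels with depth are back-projected and each becomes one Gaussian. Intrinsics, descriptors (e.g. DINOv3 features), and covariances are simplified placeholders, not the paper's pipeline.
```python
# Illustrative sketch only: back-project pixels with depth into 3D and seed
# one isotropic Gaussian per point. All parameters are placeholders.
import numpy as np

def backproject(pixels, depths, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Back-project pixel coordinates (N, 2) with depths (N,) into camera space."""
    x = (pixels[:, 0] - cx) / fx * depths
    y = (pixels[:, 1] - cy) / fy * depths
    return np.stack([x, y, depths], axis=1)

def init_gaussians(points, colors, base_scale=0.01):
    """One isotropic Gaussian per point: (mean, scale, color, opacity)."""
    return [{"mean": p, "scale": base_scale, "color": c, "opacity": 0.8}
            for p, c in zip(points, colors)]

rng = np.random.default_rng(0)
pixels = rng.uniform([0, 0], [640, 480], size=(100, 2))
depths = rng.uniform(0.5, 3.0, size=100)
gaussians = init_gaussians(backproject(pixels, depths), rng.uniform(size=(100, 3)))
print(len(gaussians), "Gaussians initialized")
```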
✨RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
📝 Summary:
Collecting diverse robot manipulation data is challenging. This paper introduces visual identity prompting, which uses exemplar images to guide diffusion models in generating multi-view, temporally coherent data. The augmented data improves robot policy performance in both simulation and real-world settings.
🔹 Publication Date: Published on Jan 8, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05241
• PDF: https://arxiv.org/pdf/2601.05241
• Project Page: https://robovip.github.io/RoboVIP/
• Github: https://robovip.github.io/RoboVIP/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #AI #GenerativeAI #ComputerVision #MachineLearning
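A toy sketch of the visual-identity-prompting idea: an exemplar image and a text prompt condition a generator, and the generated clips are mixed into the policy-training set. generate_clip is a placeholder, not RoboVIP's model.
```python
# Illustrative sketch only: pair an exemplar "identity" image with prompts to
# condition a video generator, then mix generated clips into training data.
import numpy as np

def generate_clip(exemplar_image, prompt, views=3, frames=8):
    """Stand-in for an identity-prompted multi-view video generator."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.uniform(size=(views, frames, 64, 64, 3))

real_clips = [np.zeros((3, 8, 64, 64, 3))]            # placeholder real data
exemplar = np.ones((64, 64, 3))                       # the object's "visual identity"
augmented = [generate_clip(exemplar, p) for p in ("pick up the red mug", "push the box")]
train_set = real_clips + augmented
print("training clips:", len(train_set))
```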