Adobe unveils HUMOTO, a high-quality #dataset of human-object interactions designed for #motiongeneration, #computervision, and #robotics. It features over 700 sequences (7,875 seconds at 30 FPS) capturing interactions with 63 precisely modeled objects and 72 articulated parts, making it a rich resource for researchers and developers in the field.
#HUMOTO #4DMocap #HumanObjectInteraction #AdobeResearch #AI #MachineLearning #PoseEstimation
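For scale: 7,875 seconds at 30 FPS comes to roughly 236k frames across the 700+ sequences. The quick check below also sketches a hypothetical loader; the file layout and field names are assumptions for illustration, not HUMOTO's published schema.

```python
from pathlib import Path
import json

# Back-of-the-envelope scale check: 7,875 s of mocap at 30 FPS.
total_seconds, fps = 7_875, 30
print(total_seconds * fps)  # 236,250 frames over 700+ sequences

# Hypothetical per-sequence iteration; HUMOTO's real layout and
# field names may differ.
for seq_file in sorted(Path("humoto").glob("*.json")):  # assumed layout
    seq = json.loads(seq_file.read_text())
    human = seq.get("human_poses", [])        # assumed field
    objects = seq.get("object_poses", {})     # assumed field
    print(seq_file.stem, len(human), "frames,", len(objects), "objects")
```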
✨ RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
📝 Summary:
RLinf-VLA is a unified framework for scalable reinforcement-learning training of vision-language-action (VLA) models, overcoming the limitations of supervised fine-tuning. It delivers a 1.6x-1.8x training speedup, supports diverse architectures and algorithms, and shows strong generalization both in simulation and on a real robot.
🔹 Publication Date: Oct 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.06710
• PDF: https://arxiv.org/pdf/2510.06710
• Project Page: https://rlinf.readthedocs.io/en/latest/
• GitHub: https://github.com/RLinf/RLinf
🔹 Models citing this paper:
• https://huggingface.co/RLinf/RLinf-math-7B
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#ReinforcementLearning #VLA #Robotics #AIResearch #MachineLearning
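For intuition, RL fine-tuning of a VLA policy at its simplest is a rollout-then-policy-gradient loop. This is a generic REINFORCE-style sketch with assumed policy/environment interfaces, not RLinf's actual API; the framework's contribution is making this loop scalable and fast across architectures and algorithms.

```python
import torch

def rl_finetune_vla(policy, env, optimizer, episodes=1000, gamma=0.99):
    """Generic policy-gradient fine-tuning of a VLA policy (sketch).

    Assumptions: policy(image, instruction) returns a torch
    distribution over a single discretized action token; env is a
    gymnasium-style manipulation environment with sparse task reward.
    """
    for _ in range(episodes):
        obs, _ = env.reset()
        log_probs, rewards, done = [], [], False
        while not done:
            dist = policy(obs["image"], obs["instruction"])
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, terminated, truncated, _ = env.step(action)
            rewards.append(reward)
            done = terminated or truncated
        # discounted returns, then a vanilla REINFORCE update
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```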
✨ RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
📝 Summary:
RoboChallenge is an online evaluation system for robotic control algorithms, especially VLA models. It enables large-scale, reproducible testing on real robots and uses this capability to survey state-of-the-art models.
🔹 Publication Date: Oct 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.17950
• PDF: https://arxiv.org/pdf/2510.17950
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#Robotics #AI #MachineLearning #EmbodiedAI #RoboticsEvaluation
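Under the hood, reproducible real-robot benchmarking reduces to a fixed task suite, a fixed trial budget per policy, and aggregated success rates. A minimal sketch of that bookkeeping, with hypothetical task/policy interfaces rather than RoboChallenge's actual submission protocol:

```python
from statistics import mean

def evaluate_policy(policy, tasks, trials_per_task=10):
    """Success-rate benchmarking over a fixed task suite (sketch).

    task.run(policy) is a hypothetical call that executes one
    real-robot episode and returns True on success.
    """
    per_task = {task.name: mean(task.run(policy)
                                for _ in range(trials_per_task))
                for task in tasks}
    per_task["overall"] = mean(per_task.values())
    return per_task
```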
✨ Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
📝 Summary:
Kinematify is an automated framework that synthesizes high-DoF articulated objects from images or text. It infers kinematic topologies and estimates joint parameters, combining MCTS search with geometry-driven optimization to produce physically consistent models.
🔹 Publication Date: Nov 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.01294
• PDF: https://arxiv.org/pdf/2511.01294
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#3DModeling #ComputerVision #Robotics #AIResearch #Kinematics
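The geometry-driven side can be illustrated with a classic subproblem: recovering a revolute joint's axis and angle from two observed orientations of the same part. A self-contained numpy sketch (a standalone illustration, not Kinematify's actual optimizer):

```python
import numpy as np

def revolute_axis_from_poses(R1, R2):
    """Axis and angle of the revolute joint relating orientations
    R1 -> R2 (3x3 rotation matrices of the same rigid part).

    Uses R - R^T = 2*sin(angle)*[axis]_x; assumes the angle is not
    near 0 or 180 degrees.
    """
    R = R2 @ R1.T
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    v = np.array([R[2, 1] - R[1, 2],
                  R[0, 2] - R[2, 0],
                  R[1, 0] - R[0, 1]])
    return v / (2 * np.sin(angle)), angle

# Example: a door panel rotated 30 degrees about the z-axis.
theta = np.deg2rad(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
axis, angle = revolute_axis_from_poses(np.eye(3), Rz)
print(axis, np.rad2deg(angle))  # ~[0, 0, 1], ~30.0
```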
✨ Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
📝 Summary:
A unified reinforcement-learning controller directly integrates visual perception and motion control for humanoid soccer robots. It uses extended Adversarial Motion Priors and an encoder-decoder architecture to achieve reactive, coherent, and robust soccer skills in dynamic real-world environments.
🔹 Publication Date: Nov 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03996
• PDF: https://arxiv.org/pdf/2511.03996
• Project Page: https://humanoid-kick.github.io/
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#HumanoidRobots #ReinforcementLearning #Robotics #ComputerVision #AI
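Adversarial Motion Priors shape the RL reward with a discriminator trained to tell reference-motion transitions from policy transitions; the policy earns a style bonus for fooling it. A minimal sketch of that reward term (layer sizes and the exact reward transform are assumptions; the paper's extended AMP builds on this idea):

```python
import torch
import torch.nn as nn

class AMPDiscriminator(nn.Module):
    """Scores state transitions (s, s'); higher = more mocap-like."""

    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

def style_reward(disc, s, s_next):
    """Common AMP reward: r = -log(1 - sigmoid(D(s, s'))), so the
    policy is rewarded when its motion looks like the reference data.
    """
    with torch.no_grad():
        d = torch.sigmoid(disc(s, s_next))
    return -torch.log((1.0 - d).clamp(min=1e-4))
```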
✨ OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
📝 Summary:
OmniVinci is an open-source omni-modal LLM that improves cross-modal understanding across audio, vision, and robotics. It pairs architectural innovations for better embedding alignment and temporal capture with an efficient data-curation pipeline, and it outperforms competitors while using significantly less training data.
🔹 Publication Date: Oct 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15870
• Explainer: https://arxivexplained.com/papers/omnivinci-enhancing-architecture-and-data-for-omni-modal-understanding-llm
• PDF: https://arxiv.org/pdf/2510.15870
• Project Page: https://nvlabs.github.io/OmniVinci/
• GitHub: https://github.com/NVlabs/OmniVinci
🔹 Models citing this paper:
• https://huggingface.co/nvidia/omnivinci
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#LLM #MultimodalAI #Robotics #DeepLearning #OpenSource
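Embedding alignment across modalities is usually trained with a symmetric contrastive objective that pulls paired embeddings together in a shared space. A generic CLIP-style sketch of that idea, not OmniVinci's actual alignment module:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(audio_emb, vision_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    audio_emb, vision_emb: (B, D) outputs of per-modality encoders,
    where row i of each tensor comes from the same clip.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(vision_emb, dim=-1)
    logits = a @ v.t() / temperature              # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```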
✨ Robot Learning from a Physical World Model
📝 Summary:
PhysWorld enables robots to learn accurate manipulation from AI-generated videos by integrating video generation with physical world modeling. The approach grounds visual guidance into physically executable actions, eliminating the need for real-robot data.
🔹 Publication Date: Nov 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07416
• PDF: https://arxiv.org/pdf/2511.07416
• Project Page: https://pointscoder.github.io/PhysWorld_Web/
• GitHub: https://github.com/PointsCoder/OpenReal2Sim
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#RobotLearning #Robotics #AI #PhysicalModeling #MachineLearning
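One concrete reading of "grounding video into actions": track the manipulated object through the generated video, then retarget its trajectory to end-effector waypoints. The numpy sketch below is a deliberate simplification with assumed inputs; PhysWorld itself grounds the motion through a full physical world model:

```python
import numpy as np

def retarget_object_trajectory(obj_positions, grasp_offset):
    """Turn a tracked object trajectory into end-effector waypoints.

    obj_positions: (T, 3) object centers recovered from the generated
    video (e.g., by pose tracking). grasp_offset: (3,) offset from
    object center to gripper, assumed constant during the motion.
    """
    waypoints = obj_positions + grasp_offset
    # light smoothing so the downstream controller sees a feasible path
    kernel = np.ones(5) / 5
    return np.column_stack([
        np.convolve(waypoints[:, i], kernel, mode="same")
        for i in range(3)
    ])
```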
✨ WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
📝 Summary:
WMPO is a pixel-based world-model framework for on-policy VLA reinforcement learning that avoids real-world interaction. It aligns pixel-space predictions with VLA features to boost sample efficiency, performance, self-correction, and generalization in robotic manipulation.
🔹 Publication Date: Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09515
• PDF: https://arxiv.org/pdf/2511.09515
• Project Page: https://wm-po.github.io/
• GitHub: https://github.com/WM-PO/WMPO
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#ReinforcementLearning #VLAModels #WorldModels #Robotics #AI
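"On-policy RL without real-world interaction" means rollouts happen inside the learned world model. A schematic of that loop with hypothetical interfaces (WMPO additionally aligns the model's pixel predictions with the VLA's own features):

```python
def imagined_rollout(world_model, policy, obs, horizon=16):
    """Roll a policy out inside a learned world model (sketch).

    world_model.step(obs, action) is a hypothetical call returning a
    predicted next observation (pixels) and reward; no real robot is
    touched anywhere in this loop.
    """
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs, reward = world_model.step(obs, action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory  # consumed by an on-policy update such as PPO
```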
✨ AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models
📝 Summary:
AffordBot uses MLLMs with chain-of-thought reasoning for fine-grained 3D embodied reasoning: given an instruction, it predicts each affordance element's location, motion type, and motion axis in a 3D scene. It achieves state-of-the-art results by projecting 3D elements into 2D views that image-based MLLMs can consume.
🔹 Publication Date: Nov 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10017
• PDF: https://arxiv.org/pdf/2511.10017
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#AffordBot #MLLM #EmbodiedAI #3DReasoning #Robotics
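The projection step that lets an image-based MLLM reason about 3D elements is standard pinhole-camera math. A minimal sketch of generic camera projection (not AffordBot's specific rendering setup):

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points to pixel coordinates.

    K: 3x3 intrinsics; R, t: world-to-camera rotation and translation.
    Points at or behind the camera plane come back as NaN.
    """
    cam = points_3d @ R.T + t                    # world -> camera frame
    z = cam[:, 2:3]
    uv = (cam @ K.T)[:, :2] / np.where(z > 0, z, np.nan)
    return uv
```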
✨ PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
📝 Summary:
PAN is a general, interactable world model that predicts future states through high-quality, action-conditioned video simulation. It uses a Generative Latent Prediction (GLP) architecture combining LLM-based latent dynamics with a video diffusion decoder, yielding detailed, long-horizon-coherent simulations that support both reasoning and acting.
🔹 Publication Date: Nov 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09057
• PDF: https://arxiv.org/pdf/2511.09057
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#WorldModels #AI #Simulation #GenerativeAI #Robotics
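The GLP split, an LLM backbone for latent dynamics plus a diffusion decoder for pixels, can be sketched as an autoregressive latent rollout followed by a decode. Both interfaces below are hypothetical placeholders for the two components:

```python
def simulate(latent_dynamics, video_decoder, z0, actions):
    """Action-conditioned world simulation in latent space (sketch).

    latent_dynamics(z, a): hypothetical LLM-based transition to the
    next latent state. video_decoder(latents): hypothetical diffusion
    decoder mapping latent states to video frames.
    """
    z, latents = z0, []
    for a in actions:
        z = latent_dynamics(z, a)   # cheap long-horizon reasoning
        latents.append(z)
    return video_decoder(latents)   # high-fidelity decode when needed
```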
✨ PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
📝 Summary:
PhysX-Anything generates simulation-ready physical 3D assets from a single image, a crucial capability for embodied AI. It uses a novel VLM-based generative model and an efficient 3D representation, enabling direct use of the assets in robotic policy learning.
🔹 Publication Date: Nov 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13648
• PDF: https://arxiv.org/pdf/2511.13648
• Project Page: https://physx-anything.github.io/
• GitHub: https://github.com/ziangcao0312/PhysX-Anything
✨ Datasets citing this paper:
• https://huggingface.co/datasets/Caoza/PhysX-Mobility
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#EmbodiedAI #3DReconstruction #Robotics #ComputerVision #AIResearch
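The companion PhysX-Mobility dataset is hosted on the Hugging Face Hub. Since simulation-ready assets (meshes, articulation files) rarely fit the tabular datasets API, a plain snapshot download is the safer way to fetch them; nothing about the repo's internal layout is assumed here:

```python
from huggingface_hub import snapshot_download

# Download the PhysX-Mobility asset files for local inspection.
local_dir = snapshot_download(
    repo_id="Caoza/PhysX-Mobility",
    repo_type="dataset",
)
print("Assets downloaded to:", local_dir)
```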
✨ MiMo-Embodied: X-Embodied Foundation Model Technical Report
📝 Summary:
MiMo-Embodied is the first cross-embodied foundation model, achieving state-of-the-art performance in both autonomous driving and embodied AI and demonstrating positive transfer between the two domains through multi-stage learning and fine-tuning.
🔹 Publication Date: Nov 20
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.16518
• PDF: https://arxiv.org/pdf/2511.16518
• GitHub: https://github.com/XiaomiMiMo/MiMo-Embodied
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#FoundationModels #EmbodiedAI #AutonomousDriving #AI #Robotics
✨ MobiAgent: A Systematic Framework for Customizable Mobile Agents
📝 Summary:
MobiAgent is a comprehensive mobile-agent system designed to improve the accuracy and efficiency of real-world task execution. It combines the MobiMind model family, the AgentRR framework, and the MobiFlow benchmark, plus an AI-assisted data-collection pipeline, and achieves state-of-the-art performance in real-world mobile tasks.
🔹 Publication Date: Aug 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.00531
• PDF: https://arxiv.org/pdf/2509.00531
• GitHub (v1.0 APK release): https://github.com/IPADS-SAI/MobiAgent/releases/download/v1.0/Mobiagent.apk
🔹 Models citing this paper:
• https://huggingface.co/IPADS-SAI/MobiMind-Grounder-3B
• https://huggingface.co/IPADS-SAI/MobiMind-Decider-7B
• https://huggingface.co/IPADS-SAI/MobiMind-Mixed-7B
==================================
For more data science resources:
➡️ https://t.iss.one/DataScienceT
#MobileAgents #AI #DeepLearning #Robotics #Automation
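The released model names hint at a two-stage loop: a Decider picks the next high-level action, and a Grounder resolves it to concrete screen coordinates. A hypothetical sketch of that control flow, not MobiAgent's actual code:

```python
def run_mobile_agent(decider, grounder, device, task, max_steps=30):
    """Two-stage mobile-agent loop (hypothetical sketch).

    decider(screenshot, task, history) -> high-level step, e.g.
        {"action": "tap", "target": "search bar"} or {"action": "done"}
    grounder(screenshot, target_description) -> (x, y) screen coords
    device: wrapper over the phone (screenshot / tap / type_text).
    """
    history = []
    for _ in range(max_steps):
        shot = device.screenshot()
        step = decider(shot, task, history)
        if step["action"] == "done":
            return True
        if step["action"] == "tap":
            x, y = grounder(shot, step["target"])
            device.tap(x, y)
        elif step["action"] == "type":
            device.type_text(step["text"])
        history.append(step)
    return False  # step budget exhausted
```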