Data Science | Machine Learning with Python for Researchers

✨Emu3.5: Native Multimodal Models are World Learners

📝 Summary:
Emu3.5 is a large-scale multimodal world model that natively predicts the next state across vision and language. It is post-trained with reinforcement learning and uses Discrete Diffusion Adaptation (DiDA) to make inference more efficient, delivering strong performance on multimodal tasks and world exploration.

🔹 Publication Date: Oct 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26583
• PDF: https://arxiv.org/pdf/2510.26583
• Project Page: https://emu.world/
• GitHub: https://github.com/baaivision/Emu3.5

🔹 Models citing this paper:
• https://huggingface.co/BAAI/Emu3.5
• https://huggingface.co/BAAI/Emu3.5-Image
• https://huggingface.co/BAAI/Emu3.5-VisionTokenizer

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#MultimodalAI #WorldModels #ReinforcementLearning #ComputerVision #NLP

✨WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

📝 Summary:
WMPO is a pixel-based world-model framework for on-policy reinforcement learning of vision-language-action (VLA) models that avoids real-world interaction. Its pixel predictions are aligned with VLA features, which improves sample efficiency, performance, self-correction, and generalization in robotic manipulation.

🔹 Publication Date: Nov 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09515
• PDF: https://arxiv.org/pdf/2511.09515
• Project Page: https://wm-po.github.io/
• GitHub: https://github.com/WM-PO/WMPO

==================================

#ReinforcementLearning #VLAModels #WorldModels #Robotics #AI

✨PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

📝 Summary:
PAN is a general, interactable world model that predicts future states through high-quality, action-conditioned video simulation. It uses a Generative Latent Prediction (GLP) architecture that combines LLM-based latent dynamics with a video diffusion decoder, producing detailed, long-term coherent simulations that support reasoning and acting.

🔹 Publication Date: Nov 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09057
• PDF: https://arxiv.org/pdf/2511.09057

==================================

#WorldModels #AI #Simulation #GenerativeAI #Robotics
