Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

Emu3.5: Native Multimodal Models are World Learners

📝 Summary:
Emu3.5 is a large-scale multimodal world model that predicts the next state across vision and language. It is post-trained with reinforcement learning and uses Discrete Diffusion Adaptation to make inference more efficient, delivering strong performance in multimodal tasks and world exploration.

🔹 Publication Date: Oct 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26583
• PDF: https://arxiv.org/pdf/2510.26583
• Project Page: https://emu.world/
• GitHub: https://github.com/baaivision/Emu3.5

🔹 Models citing this paper:
https://huggingface.co/BAAI/Emu3.5
https://huggingface.co/BAAI/Emu3.5-Image
https://huggingface.co/BAAI/Emu3.5-VisionTokenizer

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #WorldModels #ReinforcementLearning #ComputerVision #NLP
WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

📝 Summary:
WMPO is a pixel-based world-model framework for on-policy VLA reinforcement learning that avoids real-world interaction. It uses pixel predictions aligned with VLA features to boost sample efficiency, performance, self-correction, and generalization in robotic manipulation.

🔹 Publication Date: Nov 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09515
• PDF: https://arxiv.org/pdf/2511.09515
• Project Page: https://wm-po.github.io/
• GitHub: https://github.com/WM-PO/WMPO

#ReinforcementLearning #VLAModels #WorldModels #Robotics #AI
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

📝 Summary:
PAN is a general, interactable world model that predicts future states through high-quality, action-conditioned video simulation. Its Generative Latent Prediction (GLP) architecture combines LLM-based latent dynamics with a video diffusion decoder, producing detailed simulations that stay coherent over long horizons and support reasoning and acting.

🔹 Publication Date: Nov 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09057
• PDF: https://arxiv.org/pdf/2511.09057

#WorldModels #AI #Simulation #GenerativeAI #Robotics