✨MOA: Multi-Objective Alignment for Role-Playing Agents
📝 Summary:
MOA is a reinforcement-learning framework for role-playing agents that uses multi-objective optimization and thought-augmented rollout. It simultaneously improves multiple skills like domain knowledge and linguistic style, addressing limitations of prior methods. MOA outperforms strong baselines,...
🔹 Publication Date: Published on Dec 10
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09756
• PDF: https://arxiv.org/pdf/2512.09756
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #ReinforcementLearning #MultiObjectiveOptimization #RolePlayingAgents #MachineLearning
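As a mental model for the multi-objective part: several per-skill rewards are combined into a single training signal for the policy. A minimal, hypothetical Python sketch; the skill names, weights, and scoring functions are illustrative assumptions, not the paper's actual reward design:

```python
# Hypothetical multi-objective reward aggregation for a role-playing
# agent; skill names, weights, and scorers are illustrative only.
from typing import Callable, Dict

# Each objective scores a (prompt, response) pair in [0, 1].
Objective = Callable[[str, str], float]

def combined_reward(prompt: str, response: str,
                    objectives: Dict[str, Objective],
                    weights: Dict[str, float]) -> float:
    """Weighted scalarization of several per-skill rewards."""
    return sum(weights[name] * fn(prompt, response)
               for name, fn in objectives.items())

# Example: balance domain knowledge against linguistic style.
reward = combined_reward(
    "Stay in character as a Victorian detective...",
    "Elementary: the mud on his boots betrays Whitechapel.",
    objectives={"knowledge": lambda p, r: 0.8, "style": lambda p, r: 0.9},
    weights={"knowledge": 0.5, "style": 0.5},
)
print(reward)  # 0.85
```

In practice each scorer would be a learned judge or a rule-based checker; a weighted sum is simply the most basic way to scalarize competing objectives.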
✨MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
📝 Summary:
MiMo-7B is a 7B LLM optimized for reasoning through pre-training with data mixing and Multi-Token Prediction. Post-training uses reinforcement learning on math and programming problems. This approach enables MiMo-7B to achieve superior reasoning performance, outperforming larger models and OpenAI...
🔹 Publication Date: Published on May 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.07608
• PDF: https://arxiv.org/pdf/2505.07608
• Github: https://github.com/XiaomiMiMo/MiMo
🔹 Models citing this paper:
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL
• https://huggingface.co/XiaomiMiMo/MiMo-7B-Base
• https://huggingface.co/XiaomiMiMo/MiMo-7B-RL-0530
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator
• https://huggingface.co/spaces/ISEEKYAN/megatron_memory_estimator_old
• https://huggingface.co/spaces/sizzlebop/ZeroGPU-LLM-Inference
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AI #ReinforcementLearning #MachineLearning #Reasoning
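For context on Multi-Token Prediction: auxiliary heads are trained to predict tokens several steps ahead, not just the next token. A minimal, hypothetical PyTorch sketch; the head layout and all names are assumptions, not MiMo's actual architecture:

```python
# Hypothetical multi-token prediction loss: head k predicts the token
# k steps ahead. Illustrative only; not MiMo's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mtp_loss(hidden, heads, tokens):
    """hidden: [B, T, D] states; heads[k-1] predicts the token k steps ahead."""
    loss = 0.0
    for k, head in enumerate(heads, start=1):
        logits = head(hidden[:, :-k])   # predict token t+k from position t
        target = tokens[:, k:]          # labels shifted by k
        loss = loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    return loss / len(heads)

# Tiny usage: batch 2, 8 positions, width 16, vocab 100, 2 lookahead heads.
hidden = torch.randn(2, 8, 16)
tokens = torch.randint(0, 100, (2, 8))
print(mtp_loss(hidden, [nn.Linear(16, 100) for _ in range(2)], tokens))
```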
✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
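The spirit of the recipe, reduced to a runnable toy: one loop, one fixed learning rate, no stages or schedules. This REINFORCE-on-a-bandit illustration is a stand-in under stated assumptions, not the paper's LLM setup:

```python
# Toy single-stage RL with fixed hyperparameters (softmax-policy bandit);
# illustrative only, not the paper's LLM training setup.
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])  # hidden arm qualities
logits = np.zeros(3)                      # softmax policy parameters
LR, STEPS = 0.1, 2000                     # fixed for the whole run

for _ in range(STEPS):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    arm = rng.choice(3, p=probs)
    reward = rng.normal(true_rewards[arm], 0.1)
    grad = -probs.copy()
    grad[arm] += 1.0              # gradient of log pi(arm) w.r.t. logits
    logits += LR * reward * grad  # plain REINFORCE, no schedules or stages

print(probs.round(3))  # mass should concentrate on the best arm
```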
✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
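A state-aware scene graph pairs objects carrying mutable states (open/closed, on/off) with relations between them. A hypothetical minimal representation; the field names are illustrative, not the paper's schema:

```python
# Hypothetical state-aware scene graph; field names are illustrative,
# not MomaGraph's actual schema.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)  # e.g. {"door": "closed"}

@dataclass
class Edge:
    relation: str  # e.g. "inside", "on_top_of"
    src: str
    dst: str

# A kitchen snippet: the mug sits inside a closed cabinet.
nodes = {"cabinet": Node("cabinet", {"door": "closed"}), "mug": Node("mug")}
edges = [Edge("inside", "mug", "cabinet")]

# A planner can read object states to order its actions.
plan = (["open(cabinet)", "pick(mug)"]
        if nodes["cabinet"].state["door"] == "closed" else ["pick(mug)"])
print(plan)
```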
✨Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources. This demonstrates the potential of l...
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math
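For context, formal theorem proving means emitting proofs a proof assistant can check mechanically. A toy Lean 4 example of the target format, not drawn from the paper's benchmarks:

```lean
-- Toy Lean 4 example of the target format: a statement plus a
-- machine-checkable proof term (not from the paper's benchmarks).
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```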
✨Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by estimating advantages over a turn-level MDP. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning, and demonstrates its effectiveness on the WebShop and Sokoban environments.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI
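Treating each dialogue turn, rather than each token, as one MDP step means advantages are estimated per turn. A hypothetical sketch of turn-level GAE; parameter values and names are illustrative, not the paper's exact formulation:

```python
# Hypothetical turn-level advantage estimation: GAE computed over turns,
# not tokens. Values and names are illustrative.
def turn_level_gae(rewards, values, gamma=1.0, lam=0.95):
    """rewards[t], values[t] are per-turn; returns per-turn advantages."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Three-turn episode with a sparse reward at the final turn.
print(turn_level_gae(rewards=[0.0, 0.0, 1.0], values=[0.2, 0.4, 0.7]))
```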
✨Meta-RL Induces Exploration in Language Agents
📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents' exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, demonstrating Meta-RL's effectiveness for robust adaptation in language agents.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI
✨Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents
📝 Summary:
Memory-T1 is an RL framework that improves temporal reasoning in long dialogues by selecting the relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to long histories.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI
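The reward mixes three signals: answer accuracy, evidence (retrieving the right sessions), and temporal consistency. A hypothetical composite-reward sketch; the weights and names are illustrative, not the paper's:

```python
# Hypothetical composite reward mixing accuracy, evidence, and
# temporal-consistency terms; weights and names are illustrative.
def composite_reward(answer_correct: bool,
                     retrieved: set, gold_sessions: set,
                     dates_consistent: bool,
                     w=(0.5, 0.3, 0.2)) -> float:
    accuracy = 1.0 if answer_correct else 0.0
    evidence = (len(retrieved & gold_sessions) / len(gold_sessions)
                if gold_sessions else 0.0)  # recall of gold sessions
    temporal = 1.0 if dates_consistent else 0.0
    return w[0] * accuracy + w[1] * evidence + w[2] * temporal

# Correct answer, 2 of 3 gold sessions retrieved, dates consistent.
print(composite_reward(True, {"s3", "s7"}, {"s3", "s7", "s9"}, True))  # 0.9
```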
✨Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
📝 Summary:
Autoregressive (AR) models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal-abstraction controllers, enabling efficient learning from sparse rewards where standard RL fails.
🔹 Publication Date: Published on Dec 23
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence
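The hierarchical picture: a higher-order controller picks a temporally extended behavior, which emits several low-level actions before control returns. A toy, hypothetical sketch of that control flow, not the paper's internal-RL mechanism:

```python
# Toy hierarchical control: a high-level policy selects an option that
# runs for several low-level steps. Illustrative only; not the paper's
# internal-RL mechanism.
OPTIONS = {
    "go_left":  ["L", "L", "L"],  # each option is a fixed macro-action
    "go_right": ["R", "R", "R"],
}

def high_level_policy(state: int) -> str:
    # Stub for a learned higher-order controller.
    return "go_right" if state < 5 else "go_left"

state, trajectory = 0, []
for _ in range(3):                    # three high-level decisions
    option = high_level_policy(state)
    for action in OPTIONS[option]:    # temporally extended execution
        state += 1 if action == "R" else -1
        trajectory.append(action)

print(state, trajectory)  # credit is assigned per option, not per action
```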
✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly improving both performance and privacy.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI
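Device-cloud collaboration typically means routing easy or privacy-sensitive steps to an on-device model and escalating hard ones to the cloud. A hypothetical routing sketch; the threshold and names are illustrative, not MAI-UI's actual policy:

```python
# Hypothetical device-cloud routing for a GUI agent. Illustrative only;
# not MAI-UI's actual policy.
def route(step_confidence: float, contains_private_data: bool,
          threshold: float = 0.8) -> str:
    if contains_private_data:
        return "on_device"  # privacy-sensitive steps never leave the device
    return "on_device" if step_confidence >= threshold else "cloud"

print(route(0.95, False))  # on_device
print(route(0.40, False))  # cloud
print(route(0.40, True))   # on_device
```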