ML Research Hub

✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

📝 Summary:
JustRL uses a minimal single-stage RL recipe with fixed hyperparameters to reach state-of-the-art performance among 1.5B-parameter reasoning models. It uses less compute and trains stably, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling


✨MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/


#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics


✨Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources. This demonstrates the potential of l...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover


#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math


✨Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by using a turn-level MDP for advantage estimation. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning. It demonstrates effectiveness in the WebShop and Sokoban environments.
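To make the idea of turn-level advantage estimation concrete, here is a minimal sketch. It assumes generalized advantage estimation (GAE) applied with one step per turn instead of one step per token; the function name, the per-turn rewards and values, and the default `gamma` and `lam` are illustrative assumptions, and the paper's exact estimator may differ.

```python
def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE where each index is one whole turn, not one token.

    rewards[t]: reward received after turn t (assumed input).
    values[t]:  critic's value estimate at the start of turn t (assumed input).
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # value after the final turn: the episode has ended
    running = 0.0     # discounted sum of future TD errors
    for t in reversed(range(len(rewards))):
        # One-step TD error at turn granularity.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Three turns, reward only at the end, an untrained critic that predicts 0.
print(turn_level_gae([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

With a sparse final reward, every earlier turn still receives a discounted share of the credit, which is the property that matters for long-horizon tasks.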

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008


#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI

✨Meta-RL Induces Exploration in Language Agents

📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents' exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, demonstrating Meta-RL's effectiveness for robust adaptation in language agents.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848


#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI


✨Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

📝 Summary:
Memory-T1 is an RL framework improving temporal reasoning in long dialogues by selecting relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to extensive histories.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/


#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI


✨Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

📝 Summary:
Autoregressive (AR) models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal abstraction controllers. This enables efficient learning from sparse rewards where standard RL fails.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605


#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence


✨MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly boosting performance and privacy.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047


#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI


✨SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919


#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels


✨Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/


#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning


✨Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

📝 Summary:
Youtu-Agent scales LLM agent productivity, automating generation and enabling continuous evolution. Its hybrid optimization, using in-context learning and scalable reinforcement learning, yields top performance and boosted capabilities.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24615
• PDF: https://arxiv.org/pdf/2512.24615
• Project Page: https://tencentcloudadp.github.io/youtu-agent/
• Github: https://github.com/TencentCloudADP/youtu-tip


#LLM #AIAgents #ReinforcementLearning #MachineLearning #AI


✨SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

📝 Summary:
SenseNova-MARS empowers Vision-Language Models with interleaved visual reasoning and dynamic tool use like search and cropping via reinforcement learning. It achieves state-of-the-art performance on complex visual tasks, outperforming proprietary models on new and existing benchmarks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24330
• PDF: https://arxiv.org/pdf/2512.24330
• Github: https://github.com/OpenSenseNova/SenseNova-MARS

✨ Datasets citing this paper:
• https://huggingface.co/datasets/sensenova/SenseNova-MARS-Data
• https://huggingface.co/datasets/sensenova/HR-MMSearch


#MultimodalAI #ReinforcementLearning #VisionLanguageModels #AgenticAI #ComputerVision


✨Diversity or Precision? A Deep Dive into Next Token Prediction

📝 Summary:
This paper proposes a pre-training objective that reshapes the token-output distribution for better RL exploration. It uses reward shaping to balance diversity and precision in next-token prediction. Contrary to intuition, a precision-oriented prior yields a superior exploration spac...

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22955
• PDF: https://arxiv.org/pdf/2512.22955


#NextTokenPrediction #ReinforcementLearning #LLM #NLP #AIResearch


✨Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146


#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch


✨Unified Thinker: A General Reasoning Modular Core for Image Generation

📝 Summary:
Unified Thinker introduces a modular reasoning core for image generation, decoupling a Thinker from the generator. It uses reinforcement learning to optimize visual correctness, substantially improving image reasoning and generation quality.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03127
• PDF: https://arxiv.org/pdf/2601.03127


#ImageGeneration #AIResearch #ReinforcementLearning #DeepLearning #GenerativeAI


✨RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework combining statistical methods with deep reinforcement learning for improved nighttime auto white balance. It is the first RL approach for color constancy, mimicking expert tuning. This method shows superior generalization across various lighting conditions, and a new m...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/


#ReinforcementLearning #ComputerVision #ImageProcessing #AutoWhiteBalance #LowLightImaging


✨One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. Training on a single, carefully designed sample, an approach called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111


#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning


✨GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

📝 Summary:
GRPO in multi-reward RL suffers from reward normalization collapse, hindering training. GDPO resolves this by decoupling individual reward normalization, improving stability and accuracy. GDPO consistently outperforms GRPO across various reasoning tasks.
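The difference between the two normalization orders can be illustrated with a small sketch. It assumes a GRPO-style baseline that sums the rewards of each rollout and then z-scores the totals within a group, and a decoupled variant that z-scores each reward separately before summing. The function names and the toy rewards are hypothetical, and the paper's full method may include further steps not shown here.

```python
from statistics import mean, pstdev


def zscore(xs, eps=1e-8):
    """Standardize a list of numbers within one rollout group."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / (s + eps) for x in xs]


def summed_then_normalized(reward_rows):
    """GRPO-style baseline (assumed): add the rewards, then normalize the totals."""
    totals = [sum(row) for row in reward_rows]
    return zscore(totals)


def normalized_then_summed(reward_rows):
    """Decoupled variant (assumed): normalize each reward on its own, then add."""
    columns = list(zip(*reward_rows))            # one column per reward type
    per_reward = [zscore(list(col)) for col in columns]
    return [sum(vals) for vals in zip(*per_reward)]


# Four rollouts, two binary rewards each (toy values).
group = [(1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(summed_then_normalized(group))   # rollouts 0, 1, 2 get identical advantages
print(normalized_then_summed(group))   # rollout 0 is now distinguished from 1 and 2
```

In the toy group, the first three rollouts all have a total reward of 1, so summing first erases the fact that rollout 0 earned the rarer reward. Normalizing per reward keeps that distinction.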

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05242
• PDF: https://arxiv.org/pdf/2601.05242
• Project Page: https://nvlabs.github.io/GDPO/
• Github: https://github.com/NVlabs/GDPO


#ReinforcementLearning #MultiRewardRL #PolicyOptimization #MachineLearning #AI


✨RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework for nighttime auto white balance. It combines statistical methods with deep reinforcement learning, mimicking expert tuning to improve color constancy in low-light scenes. The method shows superior generalization across various lighting conditions and includes a new mu...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/
• Github: https://github.com/BrianChen1120/RL-AWB


#ReinforcementLearning #DeepLearning #ComputerVision #ImageProcessing #AWB


✨AT^2PO: Agentic Turn-based Policy Optimization via Tree Search

📝 Summary:
AT^2PO is a framework for multi-turn agentic reinforcement learning. It uses a turn-level tree search with entropy-guided expansion and turn-wise credit assignment. This improves exploration, reward propagation, and policy optimization, achieving state-of-the-art results.
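One way to picture entropy-guided expansion is the sketch below. It assumes each turn-level tree node carries the policy's probability distribution over candidate next actions, and that the nodes with the highest Shannon entropy are expanded first. The node names, the dictionary layout, and the top-k rule are all hypothetical; the paper's actual selection criterion may differ.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_nodes_to_expand(node_probs, k=1):
    """Return the k node names whose action distributions are most uncertain.

    node_probs maps a (hypothetical) node name to the policy's probabilities
    over candidate next actions at that turn.
    """
    ranked = sorted(node_probs, key=lambda name: entropy(node_probs[name]), reverse=True)
    return ranked[:k]


tree = {
    "turn_a": [1.0, 0.0],  # policy is certain: little to gain from branching
    "turn_b": [0.5, 0.5],  # policy is maximally unsure: best place to branch
    "turn_c": [0.9, 0.1],
}
print(pick_nodes_to_expand(tree, k=2))
```

Spending the search budget where the policy is least certain is a common heuristic for improving exploration without expanding every turn.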

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04767
• PDF: https://arxiv.org/pdf/2601.04767
• Github: https://github.com/zzfoutofspace/ATPO


#ReinforcementLearning #AgenticAI #TreeSearch #PolicyOptimization #ArtificialIntelligence
