ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning

📝 Summary:
MomaGraph-R1, a vision-language model trained with reinforcement learning, achieves state-of-the-art performance in predicting task-oriented scene graphs and zero-shot task planning in household envir...

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16909
• PDF: https://arxiv.org/pdf/2512.16909
• Project Page: https://hybridrobotics.github.io/MomaGraph/

#VisionLanguageModel #EmbodiedAI #ReinforcementLearning #SceneGraphs #Robotics
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

📝 Summary:
Seed-Prover 1.5 is a formal theorem-proving model that uses agentic reinforcement learning and an efficient scaling workflow. It achieves superior performance in solving undergraduate, graduate, and PhD-level math problems with reduced computational resources. This demonstrates the potential of l...

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17260
• PDF: https://arxiv.org/pdf/2512.17260
• Github: https://github.com/ByteDance-Seed/Seed-Prover

#TheoremProving #ReinforcementLearning #AI #Mathematics #AI4Math
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

📝 Summary:
Turn-PPO improves multi-turn reinforcement learning for LLM agents by using a turn-level MDP for advantage estimation. This PPO variant outperforms GRPO and standard PPO, addressing limitations in long-horizon reasoning. It demonstrates effectiveness on WebShop and Sokoban datasets.
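The turn-level idea in the summary above can be illustrated with generalized advantage estimation (GAE) computed over turns instead of tokens. This is a minimal sketch, not the paper's implementation: it assumes each agent turn is one MDP step with a scalar reward and a critic value estimate, and the function name, `gamma`, and `lam` are hypothetical choices for illustration.

```python
def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE where index t is a whole agent turn, not a single token.

    rewards[t]: scalar reward received after turn t (assumed input).
    values[t]:  critic estimate of the state before turn t (assumed input).
    The episode is assumed to end after the last turn, so the value
    after the final turn is taken to be 0.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have one entry per turn")

    advantages = [0.0] * len(rewards)
    next_value = 0.0   # value of the (terminal) state after the last turn
    running = 0.0      # discounted sum of TD residuals from later turns
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual for turn t.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Three turns with a sparse reward on the final turn only.
example = turn_level_gae([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
```

With `gamma=1.0` and `lam=1.0` each turn's advantage reduces to the return-to-go minus that turn's value estimate, which makes the sparse final reward visible to every earlier turn.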

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17008
• PDF: https://arxiv.org/pdf/2512.17008

#LLM #ReinforcementLearning #AI #MachineLearning #AgenticAI
Meta-RL Induces Exploration in Language Agents

📝 Summary:
LaMer, a Meta-RL framework, enhances LLM agents' exploration and adaptation in RL tasks. It significantly improves their performance and generalization across diverse environments, demonstrating Meta-RL's effectiveness for robust adaptation in language agents.

🔹 Publication Date: Published on Dec 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16848
• PDF: https://arxiv.org/pdf/2512.16848

#MetaRL #LLMAgents #ReinforcementLearning #NLP #AI
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents

📝 Summary:
Memory-T1 is an RL framework improving temporal reasoning in long dialogues by selecting relevant sessions. It uses rewards for accuracy, evidence, and temporal consistency to achieve state-of-the-art performance on Time-Dialog and robustness to extensive histories.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20092
• PDF: https://arxiv.org/pdf/2512.20092
• Github: https://github.com/Elvin-Yiming-Du/Memory-T1/

#ReinforcementLearning #TemporalReasoning #NLP #DialogueSystems #AI
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

📝 Summary:
AR models face inefficient exploration and sparse rewards in RL. Internal RL uses a higher-order model to learn temporal abstraction controllers. This enables efficient learning from sparse rewards where standard RL fails.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.20605
• PDF: https://arxiv.org/pdf/2512.20605

#ReinforcementLearning #HierarchicalRL #AutoregressiveModels #MachineLearning #ArtificialIntelligence
MAI-UI Technical Report: Real-World Centric Foundation GUI Agents

📝 Summary:
MAI-UI introduces a family of foundation GUI agents tackling real-world deployment challenges. It uses a self-evolving data pipeline, device-cloud collaboration, and online RL to set a new state of the art in GUI grounding and mobile navigation, significantly boosting performance and privacy.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22047
• PDF: https://arxiv.org/pdf/2512.22047

#GUIAgents #AI #ReinforcementLearning #MobileTech #HCI
SWE-RM: Execution-free Feedback For Software Engineering Agents

📝 Summary:
This paper introduces SWE-RM, a robust, execution-free reward model for software engineering agents. It overcomes limitations of execution-based feedback, improving coding agent performance in both test-time scaling and reinforcement learning. SWE-RM achieves new state-of-the-art results for open...

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21919
• PDF: https://arxiv.org/pdf/2512.21919

#SoftwareEngineering #AI #ReinforcementLearning #CodingAgents #RewardModels
Act2Goal: From World Model To General Goal-conditioned Policy

📝 Summary:
Act2Goal is a new policy for robust long-horizon robotic manipulation. It uses a goal-conditioned visual world model with multi-scale temporal control to plan intermediate states and execute precisely. This allows strong generalization and rapid online adaptation, significantly boosting real-robo...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23541
• PDF: https://arxiv.org/pdf/2512.23541
• Project Page: https://act2goal.github.io/

#Robotics #AI #MachineLearning #WorldModels #ReinforcementLearning
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

📝 Summary:
Youtu-Agent scales LLM agent productivity, automating generation and enabling continuous evolution. Its hybrid optimization, using in-context learning and scalable reinforcement learning, yields top performance and boosted capabilities.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24615
• PDF: https://arxiv.org/pdf/2512.24615
• Project Page: https://tencentcloudadp.github.io/youtu-agent/
• Github: https://github.com/TencentCloudADP/youtu-tip

#LLM #AIAgents #ReinforcementLearning #MachineLearning #AI
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

📝 Summary:
SenseNova-MARS empowers Vision-Language Models with interleaved visual reasoning and dynamic tool use like search and cropping via reinforcement learning. It achieves state-of-the-art performance on complex visual tasks, outperforming proprietary models on new and existing benchmarks.

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24330
• PDF: https://arxiv.org/pdf/2512.24330
• Github: https://github.com/OpenSenseNova/SenseNova-MARS

Datasets citing this paper:
https://huggingface.co/datasets/sensenova/SenseNova-MARS-Data
https://huggingface.co/datasets/sensenova/HR-MMSearch

#MultimodalAI #ReinforcementLearning #VisionLanguageModels #AgenticAI #ComputerVision
Diversity or Precision? A Deep Dive into Next Token Prediction

📝 Summary:
This paper proposes a pre-training objective that reshapes the token-output distribution for better RL exploration. It uses reward-shaping to balance diversity and precision in next-token prediction. Contrary to intuition, a precision-oriented prior surprisingly yields a superior exploration spac...

🔹 Publication Date: Published on Dec 28, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22955
• PDF: https://arxiv.org/pdf/2512.22955

#NextTokenPrediction #ReinforcementLearning #LLM #NLP #AIResearch
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning

📝 Summary:
This paper addresses Preference Mode Collapse (PMC) in text-to-image diffusion models, where models lose diversity despite high reward scores. It introduces D^2-Align, a framework that mitigates PMC by directionally correcting the reward signal during optimization. This novel approach maintains gen...

🔹 Publication Date: Published on Dec 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24146
• PDF: https://arxiv.org/pdf/2512.24146

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #AIResearch
Unified Thinker: A General Reasoning Modular Core for Image Generation

📝 Summary:
Unified Thinker introduces a modular reasoning core for image generation, decoupling a Thinker from the generator. It uses reinforcement learning to optimize visual correctness, substantially improving image reasoning and generation quality.

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03127
• PDF: https://arxiv.org/pdf/2601.03127

#ImageGeneration #AIResearch #ReinforcementLearning #DeepLearning #GenerativeAI
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework combining statistical methods with deep reinforcement learning for improved nighttime auto white balance. It is the first RL approach for color constancy, mimicking expert tuning. This method shows superior generalization across various lighting conditions, and a new m...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/

#ReinforcementLearning #ComputerVision #ImageProcessing #AutoWhiteBalance #LowLightImaging
One Sample to Rule Them All: Extreme Data Efficiency in RL Scaling

📝 Summary:
This paper demonstrates extreme data efficiency in RL for LLMs. Training on a single, carefully designed sample, an approach called polymath learning, significantly enhances multidisciplinary reasoning, outperforming traditional methods that rely on large datasets. The findings suggest sample quality and design...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03111
• PDF: https://arxiv.org/pdf/2601.03111

#ReinforcementLearning #LLMs #DataEfficiency #AI #DeepLearning
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

📝 Summary:
GRPO in multi-reward RL suffers from reward normalization collapse, hindering training. GDPO resolves this by decoupling individual reward normalization, improving stability and accuracy. GDPO consistently outperforms GRPO across various reasoning tasks.
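To make the collapse described above concrete, here is a small sketch contrasting two ways of turning several rewards into group-relative advantages. It is an illustrative reading of the summary, not the paper's exact formulation: the function names, the two binary rewards, and the toy group are all hypothetical.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # guards against division by zero when a group has no variance


def normalize(values):
    """Standardize values within one rollout group (zero mean, unit std)."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_then_normalized(rewards):
    """GRPO-style baseline: add up all rewards per rollout, then normalize."""
    totals = [sum(r) for r in rewards]
    return normalize(totals)


def normalized_then_summed(rewards):
    """Decoupled variant: normalize each reward on its own, then add."""
    per_reward = [normalize(column) for column in zip(*rewards)]
    return [sum(vals) for vals in zip(*per_reward)]


# Hypothetical group of 4 rollouts, each scored on (correctness, format).
group = [(1, 0), (0, 1), (1, 0), (1, 1)]
coupled = summed_then_normalized(group)
decoupled = normalized_then_summed(group)
```

In this toy group the first two rollouts earn the same total, so summing first gives them identical advantages even though they satisfy different rewards. Normalizing each reward separately keeps them apart, because the two rewards have different spreads within the group.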

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05242
• PDF: https://arxiv.org/pdf/2601.05242
• Project Page: https://nvlabs.github.io/GDPO/
• Github: https://github.com/NVlabs/GDPO

#ReinforcementLearning #MultiRewardRL #PolicyOptimization #MachineLearning #AI
RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes

📝 Summary:
RL-AWB is a novel framework for nighttime auto white balance. It combines statistical methods with deep reinforcement learning, mimicking expert tuning to improve color constancy in low-light scenes. The method shows superior generalization across various lighting conditions and includes a new mu...

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05249
• PDF: https://arxiv.org/pdf/2601.05249
• Project Page: https://ntuneillee.github.io/research/rl-awb/
• Github: https://github.com/BrianChen1120/RL-AWB

#ReinforcementLearning #DeepLearning #ComputerVision #ImageProcessing #AWB
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search

📝 Summary:
AT^2PO is a framework for multi-turn agentic reinforcement learning. It uses a turn-level tree search with entropy-guided expansion and turn-wise credit assignment. This improves exploration, reward propagation, and policy optimization, achieving state-of-the-art results.
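One plausible reading of "entropy-guided expansion" is that the search branches at the turn where the policy is least certain. The sketch below illustrates only that idea under stated assumptions: the frontier representation, node names, and selection rule are hypothetical and are not taken from the paper or its repository.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_turn_to_expand(frontier):
    """Return the frontier node whose next-action distribution is most uncertain.

    frontier: dict mapping a (hypothetical) node id to the policy's
    probability distribution over candidate actions at that turn.
    """
    return max(frontier, key=lambda node: entropy(frontier[node]))


# Hypothetical frontier with two candidate turns.
frontier = {
    "turn_a": [0.9, 0.1],   # policy is fairly confident here
    "turn_b": [0.5, 0.5],   # policy is maximally uncertain here
}
chosen = pick_turn_to_expand(frontier)
```

Spending the search budget on high-entropy turns concentrates exploration where alternative actions are most likely to change the outcome.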

🔹 Publication Date: Published on Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.04767
• PDF: https://arxiv.org/pdf/2601.04767
• Github: https://github.com/zzfoutofspace/ATPO

#ReinforcementLearning #AgenticAI #TreeSearch #PolicyOptimization #ArtificialIntelligence