ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

📝 Summary:
In multi-reward RL, GRPO suffers from reward normalization collapse: normalizing a single combined reward within each group discards the structure of the individual reward signals, which hinders training. GDPO resolves this by decoupling normalization across the individual rewards, improving stability and accuracy, and it consistently outperforms GRPO across a range of reasoning tasks.
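
A minimal sketch of the difference, assuming the decoupling works by normalizing each reward within the group before aggregating; the function names and toy rewards below are illustrative, not taken from the NVlabs repo:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style baseline: sum reward components per rollout, then
    normalize the combined scalar within the group."""
    total = rewards.sum(axis=1)                      # (group_size,)
    return (total - total.mean()) / (total.std() + 1e-8)

def gdpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """GDPO-style sketch: normalize each reward component separately
    within the group, then aggregate the per-reward advantages."""
    mean = rewards.mean(axis=0, keepdims=True)       # per-reward group mean
    std = rewards.std(axis=0, keepdims=True) + 1e-8  # per-reward group std
    return ((rewards - mean) / std).sum(axis=1)      # combine after normalizing

# Toy group of 4 rollouts scored by 2 rewards on very different scales:
# reward 0 lies in [0, 1], reward 1 sits around 10 and dominates the sum.
rewards = np.array([
    [0.9, 10.0],
    [0.1, 12.0],
    [0.8,  9.0],
    [0.2, 11.0],
])
print("GRPO advantages:", grpo_advantages(rewards))
print("GDPO advantages:", gdpo_advantages(rewards))
```

On this toy group, the GRPO advantage is driven almost entirely by the larger-scale reward, while the decoupled version weights both signals comparably.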

🔹 Publication Date: Jan 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.05242
• PDF: https://arxiv.org/pdf/2601.05242
• Project Page: https://nvlabs.github.io/GDPO/
• GitHub: https://github.com/NVlabs/GDPO

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #MultiRewardRL #PolicyOptimization #MachineLearning #AI