ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning

📝 Summary:
Reinforcement learning training stalls on saturated problems because informative failures are hard to find. Failure-prefix conditioning addresses this by training on prefixes of rare incorrect reasoning paths, exposing models to failures. This boosts performance, maintains efficiency, and improves r...
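
The following is a minimal sketch of how failure-prefix prompts might be assembled. It assumes that rollouts are stored as token lists with a verifier flag. The `Rollout` type, the helper `failure_prefix_prompts`, and the cut points at 25%, 50% and 75% of each failed trajectory are hypothetical illustrations, not the authors' code. The linked repository holds the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list   # reasoning tokens/steps of one sampled solution
    correct: bool  # verifier outcome for that solution


def failure_prefix_prompts(problem, rollouts, fractions=(0.25, 0.5, 0.75), max_prompts=8):
    """Hypothetical helper: turn prefixes of incorrect rollouts into new RL prompts.

    On a saturated problem almost every rollout is correct, so the few failures
    are reused as starting points the policy must continue from.
    """
    failures = [r for r in rollouts if not r.correct]
    prompts = []
    for r in failures:
        for f in fractions:
            cut = max(1, int(len(r.tokens) * f))  # keep at least one token
            prefix = " ".join(r.tokens[:cut])
            prompts.append(f"{problem}\n{prefix}")
    return prompts[:max_prompts]
```

The policy would then be sampled from each conditioned prompt and rewarded as usual. Because the prompt starts inside a failing trajectory, correct completions are rarer, so the reward again separates good completions from bad ones.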

🔹 Publication Date: Published on Jan 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.20829
• PDF: https://arxiv.org/pdf/2601.20829
• Github: https://github.com/minwukim/training-on-saturated-problems

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #MachineLearning #ArtificialIntelligence #DeepLearning #AIResearch
Group Distributionally Robust Optimization-Driven Reinforcement Learning for LLM Reasoning

📝 Summary:
This paper introduces Multi-Adversary GDRO to improve LLM reasoning. It dynamically adapts training distributions by classifying prompt difficulty and reallocating resources. This boosts accuracy by over 10% compared to GRPO, focusing compute on hard problems.
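
The following is a minimal sketch of difficulty-aware reallocation, built on four assumptions:

- Prompts are binned by their empirical pass rate.
- A single group-DRO adversary reweights the bins.
- The adversary uses an exponentiated-gradient step on per-bin loss (1 minus pass rate).
- The rollout budget is split in proportion to the resulting weights.

The bin edges, the step size `eta`, and the budget split are illustrative choices. The paper's multi-adversary formulation is not reproduced here.

```python
import math


def difficulty_group(pass_rate, edges=(0.2, 0.5, 0.8)):
    """Map an empirical pass rate to a difficulty bin (0 = hardest)."""
    for g, edge in enumerate(edges):
        if pass_rate < edge:
            return g
    return len(edges)


def update_group_weights(weights, group_losses, eta=1.0):
    """Exponentiated-gradient step of a group-DRO adversary: higher loss -> more weight."""
    scaled = [w * math.exp(eta * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def allocate_rollouts(weights, budget):
    """Split an integer rollout budget across groups in proportion to weight."""
    raw = [w * budget for w in weights]
    alloc = [int(r) for r in raw]
    remainder = budget - sum(alloc)
    # Hand leftover rollouts to the groups with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:remainder]:
        alloc[i] += 1
    return alloc
```

With per-bin losses of 0.9, 0.6, 0.3 and 0.05 and uniform starting weights, the hardest bin receives the largest share of the rollout budget.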

🔹 Publication Date: Published on Jan 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.19280
• PDF: https://arxiv.org/pdf/2601.19280

==================================

#LLMReasoning #ReinforcementLearning #Optimization #MachineLearning #AI
FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

📝 Summary:
FP8-RL presents a practical FP8 rollout stack for LLM reinforcement learning, addressing computational and memory bottlenecks. It employs blockwise FP8, KV-cache recalibration, and importance sampling to mitigate train-inference mismatch. This achieves up to 44% throughput gains while preserving ...
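
Two of the named ingredients can be sketched in plain Python under stated assumptions:

- **Blockwise scaling.** There is one scale per block of 128 values, mapping each block's absolute maximum to the E4M3 limit of 448. E4M3 rounding is simulated and ignores subnormals.
- **Capped importance weight.** This is a truncated ratio between trainer and rollout log-probabilities.

The block size, the cap of 2.0, and the truncated-IS variant are illustrative assumptions. The paper's exact correction and its KV-cache recalibration are not shown.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x):
    """Simulate rounding to E4M3 (3 mantissa bits); subnormals and NaN are ignored."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)    # x = m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16  # implicit leading bit plus 3 mantissa bits
    return max(-E4M3_MAX, min(E4M3_MAX, math.ldexp(m, e)))


def blockwise_fp8_roundtrip(values, block=128):
    """Quantize with one scale per block, then dequantize back to floats."""
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        amax = max(abs(v) for v in chunk)
        scale = amax / E4M3_MAX if amax > 0 else 1.0  # block amax maps to 448
        out.extend(round_to_e4m3(v / scale) * scale for v in chunk)
    return out


def truncated_is_weights(logp_trainer, logp_rollout, cap=2.0):
    """Per-token weights pi_trainer / pi_rollout, capped to bound variance."""
    return [min(math.exp(a - b), cap) for a, b in zip(logp_trainer, logp_rollout)]
```

The weights would multiply per-token policy-gradient terms, so tokens that the low-precision rollout engine over- or under-sampled relative to the trainer are reweighted.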

🔹 Publication Date: Published on Jan 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.18150
• PDF: https://arxiv.org/pdf/2601.18150

==================================

#LLM #ReinforcementLearning #FP8 #MachineLearning #AIResearch
Language-based Trial and Error Falls Behind in the Era of Experience

📝 Summary:
LLMs struggle on nonlinguistic tasks because exploration is costly. SCOUT uses lightweight scouts for efficient exploration, then fine-tunes LLMs via SFT and RL. This boosts performance and saves GPU hours, outperforming proprietary models.
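
The following is a toy sketch of the scout-then-distill idea. It assumes a tiny chain environment: the agent starts at position 0 and is rewarded at the last position. A tabular Q-learning agent stands in for the lightweight scout. Its greedy trajectory is then serialized into text prompt/response pairs that could feed an SFT stage. The environment, `scout_explore`, `greedy_trajectory`, and the prompt template are all hypothetical. SCOUT's actual scouts and its RL stage are not shown.

```python
import random

ACTIONS = ("left", "right")


def step(s, a, goal):
    """Toy chain dynamics: action 1 moves right, action 0 moves left (clamped at 0)."""
    return min(s + 1, goal) if a == 1 else max(s - 1, 0)


def scout_explore(n_states=6, episodes=200, eps=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning scout; reward 1.0 on reaching the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(4 * n_states):
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)  # explore, or break ties randomly
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = step(s, a, goal)
            done = s2 == goal
            target = 1.0 if done else gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_trajectory(q, max_steps=50):
    """Follow the scout's greedy policy from state 0 to the goal."""
    goal = len(q) - 1
    s, steps = 0, []
    while s != goal and len(steps) < max_steps:
        a = 0 if q[s][0] > q[s][1] else 1  # ties go right (arbitrary choice)
        steps.append((s, ACTIONS[a]))
        s = step(s, a, goal)
    return steps


def to_sft_examples(steps):
    """Serialize (state, action) pairs as text prompt/response pairs for SFT."""
    return [{"prompt": f"You are at position {s} of a line. Move left or right?",
             "response": a} for s, a in steps]
```

The expensive exploration happens in the cheap tabular agent. The LLM only sees the distilled text demonstrations.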

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21754
• PDF: https://arxiv.org/pdf/2601.21754
• Project Page: https://scout-cs.github.io/
• Github: https://github.com/Harry-mic/SCOUT

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Llama-3.1-FoundationAI-SecurityLLM-Reasoning-8B Technical Report

📝 Summary:
A cybersecurity reasoning model, trained in two stages (supervised fine-tuning and reinforcement learning), achieves competitive performance on specialized tasks while maintaining general capabilities...

🔹 Publication Date: Published on Jan 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21051
• PDF: https://arxiv.org/pdf/2601.21051
• Project Page: https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Reasoning

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning

📝 Summary:
VTC-R1 enables efficient long-context reasoning by compressing textual traces into compact images and iteratively feeding them back into vision-language models as optical memory, achieving significant...
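
The efficiency argument rests on rendered text costing fewer visual tokens than the same text costs as text tokens. A back-of-the-envelope sketch makes this concrete under the following assumptions:

- A monospace layout of 100 characters per line, with 7×14 pixels per character.
- An encoder where each visual token covers a 28×28 pixel square (14-pixel patches merged 2×2).
- Roughly 4 characters per text token.

These numbers are illustrative, not VTC-R1's configuration.

```python
import math
import textwrap


def render_layout(text, chars_per_line=100, char_w_px=7, line_h_px=14):
    """Estimate image size for text rendered in a fixed-width font (assumed metrics)."""
    lines = textwrap.wrap(text, width=chars_per_line) or [""]
    return chars_per_line * char_w_px, len(lines) * line_h_px


def vision_tokens(width_px, height_px, patch=14, merge=2):
    """Visual tokens if each token covers a (patch*merge)^2 pixel square (assumed encoder)."""
    side = patch * merge
    return math.ceil(width_px / side) * math.ceil(height_px / side)


def compression_ratio(text, chars_per_text_token=4.0):
    """Approximate text tokens divided by visual tokens for the rendered page."""
    w, h = render_layout(text)
    return (len(text) / chars_per_text_token) / vision_tokens(w, h)
```

Under these assumptions, a 700×1400 pixel page costs 25 × 50 = 1250 visual tokens. The same text occupies about 2500 text tokens, so the ratio is about 2. Denser fonts or larger merge factors would raise it.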

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.22069
• PDF: https://arxiv.org/pdf/2601.22069
• Github: https://github.com/w-yibo/VTC-R1

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Exploring Reasoning Reward Model for Agents

📝 Summary:
Agent-RRM, a multi-faceted reward model, provides structured feedback for agentic trajectories through reasoning traces, critiques, and performance scores, with unified feedback integration showing su...
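
The following is a minimal sketch of consuming such structured feedback. It assumes the reward model emits its reasoning, critique, and score inside `<think>`, `<critique>`, and `<score>` tags. The tag format, `parse_rrm_output`, and the linear mix in `combined_reward` are hypothetical. The paper's actual output format and integration scheme may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class RRMFeedback:
    reasoning: str  # the reward model's reasoning trace
    critique: str   # textual critique of the agent trajectory
    score: float    # scalar performance score


_TAG = r"<{0}>(.*?)</{0}>"


def parse_rrm_output(text):
    """Extract the three feedback sections; raise if any is missing."""
    fields = {}
    for tag in ("think", "critique", "score"):
        m = re.search(_TAG.format(tag), text, flags=re.DOTALL)
        if m is None:
            raise ValueError(f"missing <{tag}> section")
        fields[tag] = m.group(1).strip()
    return RRMFeedback(fields["think"], fields["critique"], float(fields["score"]))


def combined_reward(outcome_reward, feedback, weight=0.5):
    """Hypothetical linear mix of an outcome reward with the RRM score."""
    return (1 - weight) * outcome_reward + weight * feedback.score
```

The critique text is kept separately. It could be fed back to the agent as context, while only the scalar enters the RL reward.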

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.22154
• PDF: https://arxiv.org/pdf/2601.22154
• Github: https://github.com/kxfan2002/Reagent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Beyond Imitation: Reinforcement Learning for Active Latent Planning

📝 Summary:
An active latent planning method improves reasoning accuracy and efficiency by modeling latent token supervision as a conditional VAE and using reinforcement learning with coherence rewards.
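
The conditional-VAE part can be illustrated by two pieces, both assuming diagonal Gaussians:

- **The KL term.** A posterior q(z | x, y) over latent plan vectors is pulled toward a conditional prior p(z | x).
- **Reparameterized sampling.** Samples are drawn as z = mu + sigma * eps.

This plain-Python sketch shows only those pieces. The encoder and decoder networks and the coherence reward used in the RL stage are not specified in the summary and are not reproduced.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over dims."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def cvae_loss(recon_nll, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Negative ELBO: reconstruction NLL plus beta-weighted KL to the conditional prior."""
    return recon_nll + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

For example, a unit-variance posterior shifted by 1 from a standard-normal prior in one dimension gives a KL of exactly 0.5.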

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21598
• PDF: https://arxiv.org/pdf/2601.21598

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

📝 Summary:
UniMRG enhances unified multimodal models by training them to generate multiple visual representations, improving both understanding and generation capabilities through complementary information captu...

🔹 Publication Date: Published on Jan 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.21406
• PDF: https://arxiv.org/pdf/2601.21406
• Github: https://github.com/Sugewud/UniMRG

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research