ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

📝 Summary:
Supervised fine-tuning of reasoning language models benefits from data repetition: for a fixed update budget, training for more epochs on a smaller dataset outperforms a single pass over a larger one. Token-level accuracy provides a signal for choosing the optimal training duration.
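The core comparison is easy to make concrete: hold the total number of optimizer updates fixed, then trade dataset size against epoch count. A minimal sketch (all sizes, batch size, and the budget here are illustrative assumptions, not values from the paper):

```python
# Sketch of the fixed-update-budget trade-off described above.
# Dataset sizes, batch size, and budget are assumed for illustration.

def num_updates(dataset_size: int, epochs: int, batch_size: int) -> int:
    """Total optimizer updates for `epochs` passes over `dataset_size` examples."""
    steps_per_epoch = dataset_size // batch_size
    return steps_per_epoch * epochs

BUDGET = 10_000  # fixed optimizer-update budget (assumed)
BATCH = 32

# Two ways to spend the same budget:
# (a) one pass over a large dataset
large_single_pass = num_updates(dataset_size=320_000, epochs=1, batch_size=BATCH)
# (b) repeated passes over a dataset 10x smaller
small_repeated = num_updates(dataset_size=32_000, epochs=10, batch_size=BATCH)

assert large_single_pass == small_repeated == BUDGET
```

The paper's claim is that configuration (b), repetition over a smaller set, yields better reasoning performance at the same cost as (a).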

🔹 Publication Date: Feb 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11149
• PDF: https://arxiv.org/pdf/2602.11149
• GitHub: https://github.com/dkopi/data-repetition

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #FineTuning #DataStrategy #MachineLearning #AIResearch