✨Your Group-Relative Advantage Is Biased
📝 Summary:
Group-based Reinforcement Learning from Verifier Rewards (RLVR) relies on a biased advantage estimator that underestimates advantages for hard prompts and overestimates them for easy ones. This paper proposes History-Aware Adaptive Difficulty Weighting (HA-DW) to correct this bias, improving performance on reasoning tasks.
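For readers unfamiliar with the estimator in question, here is a minimal sketch of a group-relative advantage as it is commonly computed in group-based methods: each sampled response's reward is centered on the group mean and scaled by the group standard deviation. The function name, the `eps` term, and the choice of population standard deviation are assumptions for illustration. This is not the paper's code and does not implement HA-DW.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Center each reward on the group mean and scale by the group std.

    `rewards` holds the verifier rewards of all responses sampled for ONE prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# A hard prompt: only one of four sampled responses passes the verifier.
hard_prompt_rewards = [1.0, 0.0, 0.0, 0.0]
print(group_relative_advantages(hard_prompt_rewards))
```

Because the baseline is estimated from the same small group of samples, the resulting advantage depends on how many responses happened to succeed, which is the setting in which the summary above says the bias arises.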
🔹 Publication Date: Published on Jan 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08521
• PDF: https://arxiv.org/pdf/2601.08521
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #MachineLearning #AIResearch #BiasCorrection #ReasoningTasks
✨RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation
📝 Summary:
This work presents an automated rubric generation framework and RubricHub dataset for open-ended AI generation. RubricHub enables significant performance gains, achieving state-of-the-art results on HealthBench and surpassing GPT-5.
🔹 Publication Date: Published on Jan 13
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.08430
• PDF: https://arxiv.org/pdf/2601.08430
• Project Page: https://huggingface.co/datasets/sojuL/RubricHub_v1
• Github: https://github.com/teqkilla/RubricHub
✨ Datasets citing this paper:
• https://huggingface.co/datasets/sojuL/RubricHub_v1
==================================
#AI #GenerativeAI #MachineLearning #NLP #Dataset
✨BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
📝 Summary:
BAPO is a reinforcement learning framework for agentic search that improves reliability by teaching agents to recognize the limits of their reasoning and respond appropriately when evidence is insufficient.
🔹 Publication Date: Published on Jan 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11037
• PDF: https://arxiv.org/pdf/2601.11037
• Github: https://github.com/Liushiyu-0709/BAPO-Reliable-Search
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
📝 Summary:
Supervised fine-tuning with multiple references addresses overfitting to non-core expressions by masking low-probability tokens based on their semantic importance.
🔹 Publication Date: Published on Jan 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.09195
• PDF: https://arxiv.org/pdf/2601.09195
• Github: https://github.com/Utaotao/ProFit
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Reasoning Models Generate Societies of Thought
📝 Summary:
Reasoning models demonstrate enhanced performance through multi-agent-like interactions that create diverse cognitive perspectives and improve problem-solving through structured social organization.
🔹 Publication Date: Published on Jan 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.10825
• PDF: https://arxiv.org/pdf/2601.10825
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems
📝 Summary:
Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks largely foc...
🔹 Publication Date: Published on Jan 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11354
• PDF: https://arxiv.org/pdf/2601.11354
• Github: https://github.com/Mtrya/astro-reason
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research
✨Monolith: Real Time Recommendation System With Collisionless Embedding Table
📝 Summary:
Monolith is a real-time recommendation system designed for online training. It features a collisionless embedding table with memory optimizations and a fault-tolerant architecture, enabling real-time learning by overcoming limitations of general DL frameworks.
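To illustrate what "collisionless" means here, the sketch below contrasts a fixed-size table indexed by `feature_id % num_slots`, where distinct IDs can share a row, with a table keyed directly by the feature ID, where each ID gets its own row. Both class names are hypothetical and a plain Python `dict` stands in for the hash table. This is an assumption-laden toy and not Monolith's implementation.

```python
class ModuloEmbeddingTable:
    """Fixed-size table: distinct IDs that share a remainder share one row."""

    def __init__(self, num_slots, dim):
        self.num_slots = num_slots
        self.rows = [[0.0] * dim for _ in range(num_slots)]

    def lookup(self, feature_id):
        return self.rows[feature_id % self.num_slots]


class CollisionlessEmbeddingTable:
    """Keyed table: every feature ID owns its own row, created on first use."""

    def __init__(self, dim):
        self.dim = dim
        self.rows = {}

    def lookup(self, feature_id):
        # setdefault inserts a fresh zero row only when the ID is new.
        return self.rows.setdefault(feature_id, [0.0] * self.dim)


modulo_table = ModuloEmbeddingTable(num_slots=4, dim=2)
keyed_table = CollisionlessEmbeddingTable(dim=2)

# IDs 1 and 5 collide in the modulo table but stay separate in the keyed table.
print(modulo_table.lookup(1) is modulo_table.lookup(5))
print(keyed_table.lookup(1) is keyed_table.lookup(5))
```

The keyed table grows with the number of distinct IDs, which is presumably why the summary mentions memory optimizations alongside it.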
🔹 Publication Date: Published on Sep 16, 2022
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2209.07663
• PDF: https://arxiv.org/pdf/2209.07663
• Github: https://github.com/bytedance/monolith
==================================
#RecommendationSystems #DeepLearning #MachineLearning #RealTimeAI #DataScience
✨Agent Lightning: Train ANY AI Agents with Reinforcement Learning
📝 Summary:
Agent Lightning is a flexible RL framework for training LLMs in any AI agent, uniquely decoupling execution from training. It uses a hierarchical RL algorithm to handle complex interactions, enabling seamless integration with existing agents and showing stable improvements.
🔹 Publication Date: Published on Aug 5, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.03680
• PDF: https://arxiv.org/pdf/2508.03680
• Project Page: https://www.microsoft.com/en-us/research/project/agent-lightning/
• Github: https://github.com/microsoft/agent-lightning
==================================
#AI #ReinforcementLearning #LLMs #AIAgents #MachineLearning
✨SkyReels-V2: Infinite-length Film Generative Model
📝 Summary:
SkyReels-V2 is an infinite-length film generative model that overcomes video generation limits in duration and motion. It synergizes MLLMs, multi-stage pretraining, reinforcement learning, and a diffusion forcing framework. This enables high-quality, long-form video synthesis with realistic motion.
🔹 Publication Date: Published on Apr 17, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2
🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P
✨ Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/svjack/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
==================================
#VideoGeneration #GenerativeAI #DiffusionModels #MachineLearning #AIResearch
✨ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
📝 Summary:
This paper introduces ACoT-VLA, a new paradigm for Vision-Language-Action models that enhances reasoning by formulating it as a structured sequence of coarse action intents. It uses explicit and implicit action reasoners to guide the final policy, significantly improving robot manipulation perfor...
🔹 Publication Date: Published on Jan 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11404
• PDF: https://arxiv.org/pdf/2601.11404
==================================
#AI #DataScience #MachineLearning #HuggingFace #Research