ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

📝 Summary:
To address the high costs and limits of chain-of-thought reasoning, InftyThink+ uses reinforcement learning to optimize iterative reasoning. It learns to strategically summarize and resume, boosting accuracy by 21% on AIME24, reducing latency, and improving generalization.

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06960
• PDF: https://arxiv.org/pdf/2602.06960

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ReinforcementLearning #AIReasoning #ChainOfThought #ArtificialIntelligence #MachineLearning
Revisiting the Shape Convention of Transformer Language Models

📝 Summary:
This paper challenges the traditional narrow-wide-narrow FFN in Transformers, proposing deeper hourglass-shaped FFNs. This new design improves model efficiency and performance by better utilizing parameters, especially when expanding other model components.

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06471
• PDF: https://arxiv.org/pdf/2602.06471

==================================

#Transformers #LLM #DeepLearning #NeuralNetworks #AIResearch
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments

📝 Summary:
MemGUI-Bench is a new, comprehensive benchmark designed to evaluate the memory capabilities of mobile GUI agents. It addresses current benchmarks' failure to assess memory by offering a taxonomy, 128 tasks, and an automated evaluation pipeline. Experiments with state-of-the-art agents reveal sign...

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06075
• PDF: https://arxiv.org/pdf/2602.06075
• Project Page: https://lgy0404.github.io/MemGUI-Bench/
• Github: https://github.com/lgy0404/MemGUI-Bench

==================================

#MobileAI #GUIagents #AIBenchmarking #MemoryAI #AIResearch
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

📝 Summary:
OmniMoE presents a system-algorithm co-designed framework that achieves fine-grained expert specialization in Mixture-of-Experts architectures through vector-level atomic experts and optimized routing...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05711
• PDF: https://arxiv.org/pdf/2602.05711
• Github: https://github.com/flash-algo/omni-moe

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

📝 Summary:
This paper theoretically analyzes entropy dynamics in reinforcement fine-tuning of large language models. It derives expressions for entropy change and proposes novel entropy control methods based on discriminant analysis, aiming to optimize the exploration-exploitation balance during LLM fine-tu...

🔹 Publication Date: Published on Feb 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03392
• PDF: https://arxiv.org/pdf/2602.03392
• Github: https://github.com/agentscope-ai/Trinity-RFT

==================================

#LLM #ReinforcementLearning #Entropy #AIResearch #MachineLearning
Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training

📝 Summary:
TRIT framework improves multilingual long reasoning by jointly training translation and reasoning. This self-improving method enhances non-English question understanding and response generation without extra data. It boosts accuracy and language consistency, also improving cross-lingual question ...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05940
• PDF: https://arxiv.org/pdf/2602.05940

==================================

#MultilingualAI #LongReasoning #LLM #NLP #AIResearch
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

📝 Summary:
PlanViz is a new benchmark evaluating unified multimodal models for image generation and editing in computer-use planning tasks. It features route planning, work diagramming, and web & UI display sub-tasks, using a task-adaptive PlanScore to assess correctness, visual quality, and efficiency.

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06663
• PDF: https://arxiv.org/pdf/2602.06663
• Project Page: https://github.com/lijunxian111/PlanViz
• Github: https://github.com/lijunxian111/PlanViz/releases/tag/v1

==================================

#MultimodalAI #ImageGeneration #ImageEditing #ComputerVision #Benchmarking
POINTS-GUI-G: GUI-Grounding Journey

📝 Summary:
GUI agents for automated digital tasks rely on vision-language models with enhanced grounding capabilities, achieved through refined data engineering, improved training strategies, and reinforcement l...

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06391
• PDF: https://arxiv.org/pdf/2602.06391
• Github: https://github.com/Tencent/POINTS-GUI

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EgoAVU: Egocentric Audio-Visual Understanding

📝 Summary:
MLLMs struggle with joint audio-visual understanding of egocentric video. EgoAVU, a new data engine, generates diverse audio-visual narrations to build the EgoAVU-Instruct dataset. Fine-tuning MLLMs on this dataset yields up to a 113% performance improvement in joint audio-visual comprehension.

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06139
• PDF: https://arxiv.org/pdf/2602.06139

==================================

#EgocentricAI #MultimodalAI #AudioVisualAI #DeepLearning #Datasets
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

📝 Summary:
DreamDojo is a foundation world model trained on 44k hours of egocentric human videos that enables efficient simulation of dexterous robotic tasks through continuous latent actions and real-time disti...

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06949
• PDF: https://arxiv.org/pdf/2602.06949
• Project Page: https://dreamdojo-world.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities

📝 Summary:
ProGRPO is a novel RL method for LLM reasoning that tackles entropy collapse. It dynamically re-weights rewards to equilibrate confidence across correct responses, enhancing generative diversity and exploration. ProGRPO significantly outperforms standard methods on reasoning benchmarks.

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05281
• PDF: https://arxiv.org/pdf/2602.05281

==================================

#ReinforcementLearning #LLM #AI #GenerativeAI #MachineLearning
Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs

📝 Summary:
This paper introduces Knowledge Purification, which consolidates rationales from multiple teacher LLMs to reduce conflicts and improve distillation efficiency. The purification methods improve model performance, and router-based methods generalize robustly.

🔹 Publication Date: Published on Feb 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.01064
• PDF: https://arxiv.org/pdf/2602.01064

==================================

#LLM #KnowledgeDistillation #KnowledgePurification #AI #DeepLearning
Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

📝 Summary:
Consequence-Based Utility evaluates math solutions by testing their value as in-context exemplars for related problems. This oracle-free approach outperforms reward models and LLM judges, improving ranking quality and correct-wrong separation of AI-generated solutions.

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06291
• PDF: https://arxiv.org/pdf/2602.06291

==================================

#AIEvaluation #LLMEvaluation #MathAI #ArtificialIntelligence #MachineLearning
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

📝 Summary:
LLM training instability is linked to weight matrix stable rank decline and Jacobian alignment, causing gradient explosions. MSign is a new optimizer that restores stable rank via matrix sign operations, effectively preventing training failures with low computational overhead.

🔹 Publication Date: Published on Feb 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.01734
• PDF: https://arxiv.org/pdf/2602.01734

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making

📝 Summary:
Baichuan-M3 is a medical LLM for clinical decision support. It uses proactive information gathering, long-horizon reasoning, and hallucination suppression. It outperforms GPT-5.2 on medical benchmarks in clinical inquiry and safety.

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06570
• PDF: https://arxiv.org/pdf/2602.06570
• Github: https://github.com/baichuan-inc/Baichuan-M3-235B

🔹 Models citing this paper:
https://huggingface.co/baichuan-inc/Baichuan-M3-235B
https://huggingface.co/baichuan-inc/Baichuan-M3-235B-GPTQ-INT4
https://huggingface.co/baichuan-inc/Baichuan-M3-235B-FP8

Spaces citing this paper:
https://huggingface.co/spaces/baichuan-inc/Baichuan-M3-Inquiry

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks

📝 Summary:
A novel framework called SEMA is introduced that effectively trains multi-turn attackers for large language models without relying on existing strategies or external data, achieving state-of-the-art a...

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06854
• PDF: https://arxiv.org/pdf/2602.06854

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs

📝 Summary:
Residual binarization framework RaBiT addresses feature co-adaptation in quantized LLMs through hierarchical path derivation and robust initialization, achieving superior accuracy-efficiency trade-off...

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05367
• PDF: https://arxiv.org/pdf/2602.05367

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

📝 Summary:
OdysseyArena presents a new framework for evaluating large language models on long-horizon, inductive agent tasks that emphasize autonomous discovery of environmental transition laws.

🔹 Publication Date: Published on Feb 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.05843
• PDF: https://arxiv.org/pdf/2602.05843

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees

📝 Summary:
SeeUPO is a critic-free reinforcement learning method that ensures convergence guarantees in multi-turn agent interactions by modeling sequential decision-making as multi-agent bandit problems and usi...

🔹 Publication Date: Published on Feb 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.06554
• PDF: https://arxiv.org/pdf/2602.06554

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Forwarded from Machine Learning
🚀 Machine Learning Workflow: Step-by-Step Breakdown
Understanding the ML pipeline is essential to build scalable, production-grade models.

👉 Initial Dataset
Start with raw data. Apply cleaning, curation, and drop irrelevant or redundant features.
Example: Drop constant features, or remove columns where 90% or more of the values are missing.
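A minimal sketch of this cleaning step, using only the Python standard library. The function name `drop_uninformative_columns`, the `max_missing` parameter, and the toy rows are illustrative assumptions; missing values are assumed to be represented as `None`.

```python
from typing import Any


def drop_uninformative_columns(
    rows: list[dict[str, Any]], max_missing: float = 0.9
) -> list[dict[str, Any]]:
    """Drop columns that are constant or have too many missing (None) values."""
    keep = []
    for col in rows[0].keys():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        missing_count = len(values) - len(present)
        too_sparse = missing_count >= max_missing * len(values)
        is_constant = len(set(present)) <= 1
        if not too_sparse and not is_constant:
            keep.append(col)
    return [{col: row.get(col) for col in keep} for row in rows]


# Toy data (assumed): "country" is constant, "notes" is 95% missing.
rows = [
    {"age": 20 + i, "country": "US", "notes": "call back" if i == 0 else None}
    for i in range(20)
]
cleaned = drop_uninformative_columns(rows)
print(list(cleaned[0].keys()))
```

Only the `age` column survives here: `country` never varies and `notes` is missing in 19 of 20 rows.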

👉 Exploratory Data Analysis (EDA)
Use mean, median, standard deviation, correlation, and missing value checks.
Techniques like PCA and LDA help with dimensionality reduction.
Example: Use PCA to reduce 50 features down to 10 while retaining 95% of the variance.
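The summary-statistics part of EDA can be sketched with the standard library alone. This covers only mean, median, standard deviation, missing-value counts, and Pearson correlation, not PCA or LDA. The helper names `summarize` and `pearson` and the sample columns are assumptions for illustration, with `None` standing in for a missing value.

```python
import math
import statistics


def summarize(values):
    """Basic descriptive statistics for one numeric column with None gaps."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
        "missing": len(values) - len(present),
    }


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Toy columns (assumed): one age value is missing.
age = [20, 30, 40, None, 50]
income = [2000, 3000, 4000, 4500, 5000]

age_summary = summarize(age)
age_income_corr = pearson(age, income)
print(age_summary)
print(round(age_income_corr, 3))
```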

👉 Input Variables
Structured table with features like ID, Age, Income, Loan Status, etc.
Ensure numeric encoding and feature engineering are complete before training.

👉 Processed Dataset
Split the data into training (70%) and testing (30%) sets.
Example: Stratified sampling ensures target distribution consistency.
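A rough sketch of a stratified 70/30 split in plain Python. The function name `stratified_split`, its parameters, and the toy label list are assumptions; the idea is simply to shuffle and split each class separately so both sets keep the original class ratio.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy labels (assumed): 80 negatives and 20 positives.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)

test_positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_positive_rate)
```

The test set keeps the 20% positive rate of the full dataset, which a purely random split would not guarantee.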

👉 Learning Algorithms
Apply algorithms like SVM, Logistic Regression, KNN, Decision Trees, or Ensemble models like Random Forest and Gradient Boosting.
Example: Use Random Forest to capture non-linear interactions in tabular data.

👉 Hyperparameter Optimization
Tune parameters using Grid Search or Random Search for better performance.
Example: Optimize max_depth and n_estimators in Gradient Boosting.
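Grid Search boils down to scoring every combination in the parameter grid and keeping the best one. The sketch below shows that loop in plain Python. `grid_search` and `toy_validation_score` are hypothetical names, and the scoring function is a made-up stand-in for "train a Gradient Boosting model and return its validation score", not a real model.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(max_depth, n_estimators):
    # Hypothetical stand-in for a real train-and-validate step.
    # It is constructed to peak at max_depth=3, n_estimators=200.
    return 1.0 - 0.01 * (max_depth - 3) ** 2 - 1e-6 * (n_estimators - 200) ** 2


param_grid = {"max_depth": [2, 3, 5], "n_estimators": [100, 200, 400]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

Random Search follows the same pattern but samples a fixed number of combinations instead of enumerating all of them.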

👉 Feature Selection
Use model-based importance ranking (e.g., from Random Forest) to remove noisy or irrelevant features.
Example: Drop features with zero importance to reduce overfitting.

👉 Model Training and Validation
Use cross-validation to evaluate generalization. Train the final model on the full training set.
Example: 5-fold cross-validation for reliable performance metrics.
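A minimal sketch of 5-fold cross-validation, assuming unshuffled contiguous folds for simplicity (in practice the data is usually shuffled or stratified first). `k_fold_indices`, `cross_validate`, and the majority-class baseline used as the "model" are illustrative assumptions.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def cross_validate(n_samples, fit_and_score, k=5):
    """Return the mean score and the per-fold scores."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


# Toy labels (assumed): 20 samples, one positive in every group of four.
labels = [0, 0, 0, 1] * 5


def majority_baseline(train, val):
    # "Fit": pick the most common training label. "Score": validation accuracy.
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for i in val if labels[i] == majority) / len(val)


mean_score, fold_scores = cross_validate(len(labels), majority_baseline, k=5)
print(mean_score, fold_scores)
```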

👉 Model Evaluation
Use task-specific metrics:
- Classification – MCC, Sensitivity, Specificity, Accuracy
- Regression – RMSE, R², MSE
Example: For imbalanced classes, prefer MCC over simple accuracy.
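To see why MCC is preferred on imbalanced data, here is a small sketch comparing it with accuracy for a classifier that always predicts the majority class. The helper names and the 5%-positive toy labels are assumptions; MCC is computed from the confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), with the usual convention of returning 0 when the denominator is zero.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels (assumed): 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a "model" that always predicts the majority class

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
print(acc, score)
```

Accuracy looks strong at 0.95, while MCC is 0.0 because the predictions carry no information about the positive class.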

💡 This workflow ensures models are robust, interpretable, and ready for deployment in real-world applications.

https://t.iss.one/DataScienceM
Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference

📝 Summary:
Optimal transport models face premature mode collapse and instability during annealing as standard cooling is too fast. EPH-ASC, an adaptive algorithm, solves this by enforcing a linear stability law, preventing gradient explosions and stabilizing training.

🔹 Publication Date: Published on Jan 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.23039
• PDF: https://arxiv.org/pdf/2601.23039

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research