ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

📝 Summary:
GEPA is a prompt optimizer that uses natural language reflection to learn high-level rules from trial and error. It outperforms the RL method GRPO while using up to 35x fewer rollouts, and it also outperforms the prompt optimizer MIPROv2.

🔹 Publication Date: Published on Jul 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.19457
• PDF: https://arxiv.org/pdf/2507.19457
• Project Page: https://gepa-ai.github.io/gepa/
• Github: https://github.com/gepa-ai/gepa

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#PromptEngineering #ReinforcementLearning #ArtificialIntelligence #MachineLearning #NLP
NeST: Neuron Selective Tuning for LLM Safety

📝 Summary:
NeST is a lightweight LLM safety framework that selectively adapts a small subset of safety-relevant neurons. It reduces unsafe generations by 90.2% with minimal trainable parameters, outperforming full fine-tuning and LoRA in both safety performance and efficiency.
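
A minimal sketch of the general idea of updating only a chosen subset of neurons, assuming each neuron is one row of a weight matrix. How NeST actually picks safety-relevant neurons is not described here, so the `trainable_neurons` set, the function name, and the numbers are purely hypothetical.

```python
def masked_sgd_step(weights, grads, trainable_neurons, lr=0.5):
    """Apply an SGD step only to the rows (neurons) listed in trainable_neurons.

    All other rows are copied unchanged, i.e. they stay frozen.
    """
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in trainable_neurons else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]


# Hypothetical 3-neuron layer; only neuron 1 is marked as trainable.
weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
updated = masked_sgd_step(weights, grads, trainable_neurons={1})
```

Only the row for neuron 1 changes; the frozen rows keep their original values, which is what keeps the trainable parameter count small.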

🔹 Publication Date: Published on Feb 18, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.16835
• PDF: https://arxiv.org/pdf/2602.16835

==================================

#LLMSafety #LLM #AI #MachineLearning #DeepLearning
On the Mechanism and Dynamics of Modular Addition: Fourier Features, Lottery Ticket, and Grokking

📝 Summary:
Two-layer neural networks solve modular addition by learning Fourier features through phase symmetry and frequency diversification. This enables robust computation via majority voting to cancel noise. The process, including grokking, is explained by a lottery ticket mechanism and competition betw...
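
As a rough illustration of how Fourier features can compute modular addition (a textbook-style sketch, not the paper's code): score each candidate answer c by summing cos(2πk(a+b−c)/p) over a few frequencies k. Every term equals 1 only when c = (a+b) mod p, so the frequencies "vote" for the same answer. The modulus `p` and the frequency set `freqs` below are assumptions chosen for the example.

```python
import math


def fourier_logits(a, b, p, freqs):
    """Score each candidate c by summing cos(2*pi*k*(a+b-c)/p) over frequencies k."""
    return [
        sum(math.cos(2 * math.pi * k * (a + b - c) / p) for k in freqs)
        for c in range(p)
    ]


def predict(a, b, p, freqs):
    """Return the candidate with the highest combined score."""
    logits = fourier_logits(a, b, p, freqs)
    return max(range(p), key=logits.__getitem__)


p = 13             # assumed prime modulus
freqs = [1, 3, 5]  # assumed frequencies, none a multiple of p

# The prediction matches ordinary modular addition for every input pair.
all_correct = all(
    predict(a, b, p, freqs) == (a + b) % p for a in range(p) for b in range(p)
)
```

Using several frequencies widens the gap between the correct answer and the runner-up, which is one way to read the "majority voting" in the summary.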

🔹 Publication Date: Published on Feb 18, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.16849
• PDF: https://arxiv.org/pdf/2602.16849
• Github: https://github.com/Y-Agent/modular-addition-feature-learning

==================================

#NeuralNetworks #Grokking #FourierFeatures #LotteryTicket #MachineLearning
Calibrate-Then-Act: Cost-Aware Exploration in LLM Agents

📝 Summary:
LLM agents must balance exploration costs and uncertainty in complex sequential tasks. The Calibrate-Then-Act (CTA) framework provides LLMs with explicit cost-uncertainty context, enabling closer-to-optimal reasoning. This leads to better decision-making strategies in tasks like coding and information r...
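
A toy sketch of the cost-uncertainty trade-off, not the paper's formulation: assume the agent has a calibrated probability `p_correct` of succeeding if it acts immediately, and that paying `explore_cost` first would fully resolve its uncertainty. All names and numbers are hypothetical.

```python
def should_explore(p_correct, reward, explore_cost):
    """Explore only if the expected gain from resolving uncertainty exceeds its cost."""
    value_act_now = p_correct * reward
    # Assumption: exploration removes all uncertainty, so the reward is then certain.
    value_explore_first = reward - explore_cost
    return value_explore_first > value_act_now


# Confident agent: exploring costs more than it is expected to gain.
confident = should_explore(p_correct=0.9, reward=10.0, explore_cost=2.0)

# Uncertain agent: the same exploration cost is now worth paying.
uncertain = should_explore(p_correct=0.5, reward=10.0, explore_cost=2.0)
```

Under these assumptions the rule reduces to exploring when `(1 - p_correct) * reward > explore_cost`, which is why calibration of `p_correct` matters.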

🔹 Publication Date: Published on Feb 18, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.16699
• PDF: https://arxiv.org/pdf/2602.16699
• Github: https://github.com/Wenwen-D/env-explorer

==================================

#LLMAgents #AIResearch #MachineLearning #CostAwareAI #DecisionMaking
CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing

📝 Summary:
CrispEdit is a scalable second-order LLM editing algorithm. It preserves capabilities by projecting updates into low-curvature subspaces using efficient Kronecker-factored approximations. This achieves high edit success with minimal capability degradation.
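
A small sketch of what projecting an update into a low-curvature subspace can look like, assuming the curvature eigenvectors and eigenvalues are already available. It does not reproduce CrispEdit's Kronecker-factored approximation; the function name, threshold, and 2x2 example are hypothetical.

```python
import math


def project_low_curvature(update, eigvecs, eigvals, max_curvature):
    """Project `update` onto the span of eigenvectors with eigenvalue <= max_curvature.

    eigvecs must be orthonormal; eigvals[i] is the curvature along eigvecs[i].
    """
    projected = [0.0] * len(update)
    for vec, val in zip(eigvecs, eigvals):
        if val <= max_curvature:
            coeff = sum(u * v for u, v in zip(update, vec))
            projected = [p + coeff * v for p, v in zip(projected, vec)]
    return projected


# Eigen-decomposition of the toy curvature matrix [[2, 1], [1, 2]]:
# eigenvalue 3 along (1, 1)/sqrt(2), eigenvalue 1 along (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
eigvecs = [[s, s], [s, -s]]
eigvals = [3.0, 1.0]

# Keep only the component of the raw update that lies in the flat direction.
safe_update = project_low_curvature([1.0, 0.0], eigvecs, eigvals, max_curvature=1.5)
```

The component along the high-curvature direction is discarded, which is the intuition behind preserving existing capabilities while still applying the edit.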

🔹 Publication Date: Published on Feb 17, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15823
• PDF: https://arxiv.org/pdf/2602.15823
• Project Page: https://crispedit.github.io
• Github: https://github.com/zarifikram/CrispEdit

==================================

#LLMEditing #LLMs #MachineLearning #AIResearch #DeepLearning
Modeling Distinct Human Interaction in Web Agents

📝 Summary:
This paper models distinct human intervention patterns in web agents to improve adaptability and collaboration. It identifies four interaction styles and trains language models to predict user intervention with significantly improved accuracy. This approach leads to more useful and collaborative w...

🔹 Publication Date: Published on Feb 19, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.17588
• PDF: https://arxiv.org/pdf/2602.17588
• Project Page: https://cowcorpus.github.io/
• Github: https://github.com/oaishi/PlowPilot

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Hardware Co-Design Scaling Laws via Roofline Modelling for On-Device LLMs

📝 Summary:
A hardware-software co-design framework is proposed for on-device LLMs. It models training loss and uses roofline analysis to link accuracy and latency, speeding up architecture selection. This yields better performance on target hardware.
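
For readers unfamiliar with roofline analysis, here is a minimal sketch of the basic model: latency is bounded below by whichever is slower, the compute time or the memory-transfer time. The device figures and the 1B-parameter workload are hypothetical, and this is the generic roofline formula rather than the paper's full framework.

```python
def roofline_latency(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Return (latency_seconds, bound) under the basic roofline model."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    if compute_time >= memory_time:
        return compute_time, "compute"
    return memory_time, "memory"


# Hypothetical device: 2 TFLOP/s peak compute, 50 GB/s memory bandwidth.
PEAK_FLOPS = 2e12
MEM_BANDWIDTH = 50e9

# Hypothetical decode step for a 1B-parameter model with 8-bit weights:
# roughly 2 FLOPs per parameter per token, and each weight byte is read once.
params = 1e9
latency, bound = roofline_latency(2 * params, params * 1, PEAK_FLOPS, MEM_BANDWIDTH)
```

With these numbers the step is memory-bound (about 20 ms to stream the weights versus 1 ms of compute), the kind of signal that can steer architecture selection for a target device.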

🔹 Publication Date: Published on Feb 10, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10377
• PDF: https://arxiv.org/pdf/2602.10377

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Adapting Web Agents with Synthetic Supervision

📝 Summary:
Web agents struggle to adapt to new websites due to limited data and poor synthetic data quality. SynthAgent is a framework that refines AI-generated tasks and collected trajectories to create high-quality synthetic supervision. This approach significantly improves web agent adaptation.

🔹 Publication Date: Published on Nov 8, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06101
• PDF: https://arxiv.org/pdf/2511.06101
• Project Page: https://github.com/aiming-lab/SynthAgent
• Github: https://github.com/aiming-lab/SynthAgent

🔹 Models citing this paper:
https://huggingface.co/ChilleD/SynthAgent-SFT-Qwen2.5-VL-7B
https://huggingface.co/ChilleD/SynthAgent-SFT-UI-TARS-1.5-7B
https://huggingface.co/ChilleD/SynthAgent

==================================

#WebAgents #SyntheticData #MachineLearning #AIResearch #DeepLearning
Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty

📝 Summary:
This paper proposes using an action Jacobian penalty to remove unrealistic high-frequency signals from reinforcement learning policies without tuning. It introduces a Linear Policy Net architecture to reduce computational overhead, enabling faster convergence and efficient inference for learning ...
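
To make "action Jacobian penalty" concrete, here is a hedged sketch that measures the squared Frobenius norm of d(action)/d(state) with central finite differences. The paper's exact penalty and its Linear Policy Net are not reproduced; the helper name and the toy gain matrix `K` are hypothetical.

```python
def jacobian_penalty(policy, state, eps=1e-5):
    """Squared Frobenius norm of d(action)/d(state), via central differences."""
    total = 0.0
    for j in range(len(state)):
        plus = list(state)
        plus[j] += eps
        minus = list(state)
        minus[j] -= eps
        a_plus, a_minus = policy(plus), policy(minus)
        for ap, am in zip(a_plus, a_minus):
            total += ((ap - am) / (2 * eps)) ** 2
    return total


# Toy linear policy a = K s; its Jacobian is simply K.
K = [[1.0, -2.0], [0.5, 0.0]]


def linear_policy(s):
    return [sum(k * x for k, x in zip(row, s)) for row in K]


penalty = jacobian_penalty(linear_policy, [0.3, -0.7])
```

For a linear policy the penalty is just the sum of squared gains (here 1 + 4 + 0.25 + 0 = 5.25), so no finite differencing would be needed in practice; that shortcut may be part of why a linear policy structure reduces overhead, though this is an inference rather than something the summary states.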

🔹 Publication Date: Published on Feb 20, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18312
• PDF: https://arxiv.org/pdf/2602.18312

==================================

#ReinforcementLearning #MachineLearning #PolicyLearning #DeepLearning #AI
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

📝 Summary:
EgoPush allows mobile robots to rearrange multiple objects in cluttered spaces using a single egocentric camera. It uses an object-centric latent space and stage-decomposed rewards for long-horizon tasks, outperforming end-to-end baselines and demonstrating sim-to-real transfer.

🔹 Publication Date: Published on Feb 20, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18071
• PDF: https://arxiv.org/pdf/2602.18071
• Project Page: https://ai4ce.github.io/EgoPush/

==================================

#Robotics #ComputerVision #AI #MachineLearning #RobotManipulation