ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Learning Native Continuation for Action Chunking Flow Policies

📝 Summary:
Legato improves action-chunked Vision-Language-Action models by using training-time continuation methods that ensure smooth trajectories and reduce multimodal switching during real-time execution.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12978
• PDF: https://arxiv.org/pdf/2602.12978
• Project Page: https://lyfeng001.github.io/Legato/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

📝 Summary:
A large-scale observational analysis estimates capability boundaries and performance predictions for foundation models using quantile regression, and evaluates temporal stability across tasks.
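The quantile-regression idea behind such capability boundaries can be sketched on synthetic data. Everything below (the linear score model, the grid fit, the 90th-percentile target) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

# Hypothetical data: log10 parameter count vs. benchmark score.
rng = np.random.default_rng(0)
log_size = rng.uniform(8, 11, 500)
score = 0.1 * log_size + rng.normal(0, 0.05, 500)

def pinball_loss(y, pred, q):
    """Quantile (pinball) loss: asymmetric penalty around quantile q."""
    diff = y - pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Fit a linear conditional 90th-percentile "capability boundary" by
# grid search over slope/intercept (a crude stand-in for a QP solver).
best = (np.inf, 0.0, 0.0)
for a in np.linspace(0.0, 0.2, 41):
    for b in np.linspace(-1.0, 1.0, 81):
        loss = pinball_loss(score, a * log_size + b, q=0.9)
        if loss < best[0]:
            best = (loss, a, b)

loss, a, b = best
print(f"90th-percentile boundary: score ~ {a:.3f}*log_size + {b:.3f}")
```

Roughly 90% of observed scores should fall below the fitted line, which is what makes it usable as an upper capability boundary.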

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15327
• PDF: https://arxiv.org/pdf/2602.15327
• Project Page: https://jkjin.com/prescriptive-scaling

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression

📝 Summary:
COMPOT is a training-free Transformer compression framework. It uses sparse dictionary learning with orthogonal dictionaries and closed-form updates, outperforming traditional low-rank methods. The result is a superior quality-compression trade-off, aided by adaptive layer-wise allocation of compression.
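The closed-form orthogonal update is the classical Procrustes solution via SVD. A minimal sketch, with hypothetical shapes and random stand-ins for the weights and sparse codes (not COMPOT's actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a weight matrix W factored as W ~ D @ C with an
# orthogonal dictionary D; C stands in for the sparse codes.
W = rng.normal(size=(64, 128))
C = rng.normal(size=(64, 128))

# Closed-form orthogonal Procrustes update: with U S V^T = W C^T,
# D = U V^T minimizes ||W - D C||_F over all orthogonal D.
U, _, Vt = np.linalg.svd(W @ C.T)
D = U @ Vt

print("reconstruction error:", round(float(np.linalg.norm(W - D @ C)), 2))
```

Because the update is closed-form, alternating it with a sparse-coding step needs no gradient training, which is what makes such a scheme training-free.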

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15200
• PDF: https://arxiv.org/pdf/2602.15200

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Transformers #ModelCompression #DeepLearning #AIResearch #Optimization
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

📝 Summary:
UniT introduces a framework for unified multimodal models to perform iterative chain-of-thought reasoning and refinement. This test-time scaling substantially improves generation and understanding, generalizing to longer inference chains.

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12279
• PDF: https://arxiv.org/pdf/2602.12279

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #ChainOfThought #AIResearch #MachineLearning #GenerativeAI
Causal-JEPA: Learning World Models through Object-Level Latent Interventions

📝 Summary:
C-JEPA is an object-centric world model extending masked joint-embedding prediction. It uses object-level masking to induce latent interventions, forcing interaction reasoning and preventing shortcuts. This improves visual question answering, counterfactual reasoning, and efficient agent control.

🔹 Publication Date: Published on Feb 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.11389
• PDF: https://arxiv.org/pdf/2602.11389
• Github: https://github.com/galilai-group/cjepa

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #MachineLearning #WorldModels #Causality #DeepLearning
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

📝 Summary:
Sparse autoencoders (SAEs) do not reliably decompose neural-network internals despite strong reconstruction. On synthetic data, SAEs recovered only 9% of true features. Random baselines matched fully trained SAEs on interpretability and downstream tasks, suggesting current SAEs fail their core purpose.
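The random-baseline comparison can be sketched with a simple feature-recovery check on synthetic data. The dimensions, threshold, and recovery metric below are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_true, n_dict = 64, 32, 128

# Hypothetical ground-truth feature directions (unit vectors).
F = rng.normal(size=(n_true, d))
F /= np.linalg.norm(F, axis=1, keepdims=True)

def recovery_rate(dictionary, true_feats, thresh=0.9):
    """Fraction of true features matched by some dictionary atom
    with absolute cosine similarity above `thresh`."""
    dictionary = dictionary / np.linalg.norm(dictionary, axis=1, keepdims=True)
    sims = np.abs(true_feats @ dictionary.T)   # (n_true, n_dict)
    return float(np.mean(sims.max(axis=1) > thresh))

# A random decoder dictionary -- the kind of baseline the paper
# compares trained SAEs against.
random_dict = rng.normal(size=(n_dict, d))
print("random-baseline recovery:", recovery_rate(random_dict, F))
```

A trained SAE's decoder matrix would be scored the same way; the paper's point is that on the metrics that matter downstream, random directions did not lag behind.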

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14111
• PDF: https://arxiv.org/pdf/2602.14111

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SparseAutoencoders #AIResearch #MachineLearning #NeuralNetworks #Interpretability
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

📝 Summary:
SkillsBench shows curated agent skills significantly boost performance, though inconsistently. Models struggle to create useful skills themselves, as self-generated skills provide no benefit. Focused skills are more effective.

🔹 Publication Date: Published on Feb 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12670
• PDF: https://arxiv.org/pdf/2602.12670
• Project Page: https://skillsbench.ai/
• Github: https://github.com/benchflow-ai/skillsbench

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AgentSkills #AI #Benchmarking #MachineLearning #LLMAgents
jina-embeddings-v5-text: Task-Targeted Embedding Distillation

📝 Summary:
This paper introduces a novel training regimen for compact text embedding models. It combines distillation with task-specific contrastive loss to achieve state-of-the-art performance for small models. The resulting jina-embeddings-v5-text models support long contexts and robust quantization.
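Combining a distillation term with a contrastive term can be sketched as follows. The batch, dimensions, temperature, and the similarity-matching form of distillation are all illustrative assumptions (the teacher and student differ in width, so the student matches the teacher's similarity structure rather than its raw vectors):

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical batch: teacher (wide) and student (narrow) embeddings
# for the same 8 texts, plus positives for the contrastive term.
teacher = l2norm(rng.normal(size=(8, 1024)))
student = l2norm(rng.normal(size=(8, 256)))
positives = l2norm(student + 0.1 * rng.normal(size=(8, 256)))

# Distillation term: match the teacher's pairwise similarity matrix.
t_sim = teacher @ teacher.T
s_sim = student @ student.T
distill_loss = np.mean((t_sim - s_sim) ** 2)

# InfoNCE-style contrastive term over student/positive pairs.
logits = (student @ positives.T) / 0.05           # temperature 0.05
logits -= logits.max(axis=1, keepdims=True)       # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
contrastive_loss = -np.mean(np.diag(log_probs))

total = distill_loss + contrastive_loss
print(f"distill={distill_loss:.4f} contrastive={contrastive_loss:.4f}")
```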

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15547
• PDF: https://arxiv.org/pdf/2602.15547

🔹 Models citing this paper:
https://huggingface.co/jinaai/jina-embeddings-v5-text-small
https://huggingface.co/jinaai/jina-embeddings-v5-text-nano

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#TextEmbeddings #MachineLearning #NLP #ModelDistillation #DeepLearning
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models

📝 Summary:
TAROT proposes a reinforcement fine-tuning method for code generation that uses a four-tier test suite and a capability-adaptive curriculum. This approach tailors curriculum progression to a model's skill, improving functional correctness and robustness.
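One common way to make a curriculum capability-adaptive is to favor problems the model currently solves about half the time, since those carry the most training signal. A minimal sketch under that assumption (the pass rates and the p·(1−p) weighting are hypothetical, not TAROT's actual scheduler):

```python
import random

random.seed(0)

# Hypothetical problem pool with estimated per-problem pass rates
# for the current model.
problems = [{"id": i, "pass_rate": random.random()} for i in range(100)]

def curriculum_weight(pass_rate):
    """Highest at pass_rate = 0.5 (most informative), lowest at 0 or 1."""
    return pass_rate * (1 - pass_rate)

# Next RL batch: the 8 problems closest to the model's skill frontier.
batch = sorted(problems, key=lambda p: -curriculum_weight(p["pass_rate"]))[:8]
print([round(p["pass_rate"], 2) for p in batch])
```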

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15449
• PDF: https://arxiv.org/pdf/2602.15449
• Github: https://github.com/deep-diver/TAROT

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #CodeGeneration #ReinforcementLearning #AI #MachineLearning
Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

📝 Summary:
Soft compression for LLMs can lead to token overflow, losing vital information. This paper proposes query-aware probing classifiers that detect overflow with 0.72 AUC-ROC, improving upon query-agnostic methods. This enables low-cost pre-LLM gating to mitigate compression-induced errors.
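A probing classifier plus AUC-ROC evaluation can be sketched end-to-end. The synthetic "compressed representations," labels, probe, and training loop below are all illustrative assumptions, not the paper's classifiers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 compressed-context vectors, labeled 1 when
# compression "overflowed" (lost query-relevant information).
d = 32
X = rng.normal(size=(1000, d))
w_true = rng.normal(size=d)
y = (X @ w_true + rng.normal(0, 2.0, 1000) > 0).astype(float)

# A linear probe trained with a few steps of logistic-regression
# gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum (Mann-Whitney) formulation."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print("probe AUC-ROC:", round(auc_roc(X @ w, y == 1), 3))
```

Because the probe is a single linear layer over representations the pipeline already computes, gating on its score before invoking the LLM adds almost no cost.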

🔹 Publication Date: Published on Feb 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.12235
• PDF: https://arxiv.org/pdf/2602.12235

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLMs #RAG #NLP #AIResearch #TokenCompression
A Trajectory-Based Safety Audit of Clawdbot (OpenClaw)

📝 Summary:
Clawdbot, a self-hosted AI agent, exhibits a non-uniform safety profile. It reliably performs specified tasks but struggles with ambiguous inputs, open-ended goals, or jailbreaks, escalating minor misinterpretations into risky tool actions.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14364
• PDF: https://arxiv.org/pdf/2602.14364
• Github: https://github.com/tychenn/clawdbot_report

Datasets citing this paper:
https://huggingface.co/datasets/tianyyuu/clawdbot_safety_testing

Spaces citing this paper:
https://huggingface.co/spaces/tianyyuu/clawdbot-safety-audit-explorer

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AISafety #AIagent #Robotics #AIaudit #MachineLearning
MIND: Benchmarking Memory Consistency and Action Control in World Models

📝 Summary:
MIND is the first open-domain, closed-loop benchmark for evaluating world-model abilities such as memory consistency and action control. It uses high-quality videos and varied action spaces, uncovering current models' struggles with long-term memory and action generalization.

🔹 Publication Date: Published on Feb 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.08025
• PDF: https://arxiv.org/pdf/2602.08025
• Project Page: https://csu-jpg.github.io/MIND.github.io/
• Github: https://github.com/CSU-JPG/MIND

Datasets citing this paper:
https://huggingface.co/datasets/CSU-JPG/MIND

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Revisiting the Platonic Representation Hypothesis: An Aristotelian View

📝 Summary:
A new calibration framework corrects inflated neural network similarity scores. It reveals global convergence vanishes, while local neighborhood similarity persists, supporting the Aristotelian Representation Hypothesis of shared local relationships.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.14486
• PDF: https://arxiv.org/pdf/2602.14486
• Project Page: https://brbiclab.epfl.ch/projects/aristotelian/
• Github: https://github.com/mlbio-epfl/aristotelian

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam

📝 Summary:
HLE-Verified systematically validates and revises the HLE benchmark, resolving noisy items through expert review and model-based checks. This improves language-model evaluation accuracy by 7-10 percentage points, especially on erroneous items, enabling more reliable measurement of model capabilities.

🔹 Publication Date: Published on Feb 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.13964
• PDF: https://arxiv.org/pdf/2602.13964

Datasets citing this paper:
https://huggingface.co/datasets/skylenage/HLE-Verified

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLMEvaluation #Benchmarking #LanguageModels #AIResearch #NLP
Panini: Continual Learning in Token Space via Structured Memory

📝 Summary:
Panini is a continual learning framework storing knowledge in generative semantic workspaces to improve language model reasoning. It achieves 5-7 percent better performance using far fewer tokens and reduces unsupported answers for efficient, accurate retrieval.

🔹 Publication Date: Published on Feb 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15156
• PDF: https://arxiv.org/pdf/2602.15156
• Github: https://github.com/roychowdhuryresearch/gsw-memory

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems

📝 Summary:
A Vision Wormhole framework enables efficient, model-agnostic communication in multi-agent systems by using visual-language models to transfer reasoning states through a shared latent space, reducing ...

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15382
• PDF: https://arxiv.org/pdf/2602.15382
• Github: https://github.com/xz-liu/heterogeneous-latent-mas

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
How Much Reasoning Do Retrieval-Augmented Models Add beyond LLMs? A Benchmarking Framework for Multi-Hop Inference over Hybrid Knowledge

📝 Summary:
HybridRAG-Bench evaluates models' multi-hop reasoning over hybrid knowledge. It uses recent scientific literature to create contamination-aware benchmarks, distinguishing genuine retrieval and reasoning from parametric recall.

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.10210
• PDF: https://arxiv.org/pdf/2602.10210
• Project Page: https://junhongmit.github.io/HybridRAG-Bench/
• Github: https://github.com/junhongmit/HybridRAG-Bench

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

📝 Summary:
ARTEMIS, a multi-agent framework, outperforms human cybersecurity professionals in vulnerability discovery and submission quality in an enterprise environment.

🔹 Publication Date: Published on Dec 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09882
• PDF: https://arxiv.org/pdf/2512.09882
• Project Page: https://trinity.cs.stanford.edu
• Github: https://github.com/Stanford-Trinity/ARTEMIS

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
A decoder-only foundation model for time-series forecasting

📝 Summary:
A decoder-only foundation model is developed for time-series forecasting. Pretrained on a large corpus, this patched-decoder attention model delivers near state-of-the-art zero-shot performance across diverse datasets, time scales, and granularities.
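The "patched" input that a decoder-only time-series model consumes can be sketched as follows: the history is cut into fixed-length patches, each of which becomes one token position. The patch length of 32 and the zero-padding scheme below are illustrative assumptions, not necessarily the model's actual configuration:

```python
import numpy as np

def patch_series(series, patch_len=32):
    """Split a 1-D series into non-overlapping patches, left-padding
    with zeros (plus a validity mask) so the length divides evenly."""
    series = np.asarray(series, dtype=float)
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad), series])
    mask = np.concatenate([np.zeros(pad), np.ones(len(series))])
    return padded.reshape(-1, patch_len), mask.reshape(-1, patch_len)

patches, mask = patch_series(np.sin(np.linspace(0, 10, 100)))
print(patches.shape)   # 100 points -> 4 patches of length 32 after padding
```

Patching is what lets a decoder trained with next-patch prediction forecast many steps per token, keeping zero-shot inference cheap across granularities.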

🔹 Publication Date: Published on Oct 14, 2023

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2310.10688
• PDF: https://arxiv.org/pdf/2310.10688
• Github: https://github.com/google-research/timesfm

🔹 Models citing this paper:
https://huggingface.co/google/timesfm-1.0-200m
https://huggingface.co/google/timesfm-2.0-500m-pytorch
https://huggingface.co/google/timesfm-2.5-200m-pytorch

Spaces citing this paper:
https://huggingface.co/spaces/autogluon/fev-bench
https://huggingface.co/spaces/JayLacoma/Trader_Technical_Indicators
https://huggingface.co/spaces/pavel321/huggingface-cli-completion

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SAM 3D Body: Robust Full-Body Human Mesh Recovery

📝 Summary:
A promptable 3D human mesh recovery model using a novel parametric representation and an encoder-decoder architecture achieves state-of-the-art performance with strong generalization across diverse conditions.

🔹 Publication Date: Published on Feb 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15989
• PDF: https://arxiv.org/pdf/2602.15989
• Project Page: https://ai.meta.com/research/sam3d/
• Github: https://github.com/facebookresearch/sam-3d-body

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research