ML Research Hub
32.3K subscribers
6.68K photos
464 videos
24 files
7.27K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
On the Reliability of Computer Use Agents

📝 Summary:
Computer-use agents exhibit unreliable performance due to execution stochasticity, task specification ambiguity, and behavioral variability, necessitating repeated evaluation and stable strategies for...
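The repeated-evaluation protocol the summary alludes to can be sketched in a few lines: run the same agent task several times and report the mean success rate with its spread, rather than trusting a single rollout. The `run_task` callable below is a hypothetical stand-in for one agent rollout, not the paper's actual harness.

```python
import statistics

def evaluate_repeatedly(run_task, n_trials=10):
    """Run a stochastic agent task n_trials times and summarize pass/fail outcomes."""
    outcomes = [1.0 if run_task() else 0.0 for _ in range(n_trials)]
    mean = statistics.mean(outcomes)
    # Sample std needs at least two trials; it is 0 when all trials agree.
    std = statistics.stdev(outcomes) if n_trials > 1 else 0.0
    return {"success_rate": mean, "std": std, "n_trials": n_trials}

# Deterministic stand-in for an agent rollout; real rollouts are stochastic.
result = evaluate_repeatedly(lambda: True, n_trials=5)
```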

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.17849
• PDF: https://arxiv.org/pdf/2604.17849
• Github: https://github.com/simular-ai/cua_reliability

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation

📝 Summary:
On-policy distillation suffers from miscalibration due to information mismatch between training and deployment contexts, which is addressed through a calibration-aware framework that improves both per...
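Miscalibration of the kind the summary describes is commonly measured with expected calibration error (ECE): the confidence-vs-accuracy gap averaged over confidence bins. The paper's own framework is not reproduced here; this is only a sketch of the standard binned ECE metric such work typically reports.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: mean |confidence - accuracy| gap, weighted by bin population."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue  # empty bins contribute nothing
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - acc)
    return ece

# Toy case: two bins, each slightly miscalibrated by 0.05.
ece = expected_calibration_error([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```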

🔹 Publication Date: Published on Apr 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16830
• PDF: https://arxiv.org/pdf/2604.16830
• Github: https://github.com/SalesforceAIResearch/CaOPD

==================================

Forge-UGC: FX optimization and register-graph engine for universal graph compiler

📝 Summary:
Forge-UGC is a four-phase compiler for efficient transformer deployment on heterogeneous hardware, offering faster compilation, reduced inference latency, and lower energy consumption compared to exis...

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16498
• PDF: https://arxiv.org/pdf/2604.16498

==================================

Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

📝 Summary:
This paper presents techniques for modifying teacher-generated reasoning traces that prevent unauthorized knowledge distillation while maintaining answer correctness and enabling detectable watermarks. AI-gen...

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15143
• PDF: https://arxiv.org/pdf/2602.15143
• Github: https://github.com/xhOwenMa/trace-rewriting

==================================

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

📝 Summary:
Symbolic guardrails provide strong safety and security guarantees for AI agents in high-stakes environments. A study found these guardrails can enforce 74% of specified policy requirements, improving safety without sacrificing utility. This makes them a practical solution for domain-specific agents.
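In general terms, a symbolic guardrail is a set of declarative policy rules checked against an agent's proposed action before it executes. The sketch below is purely illustrative (rule names, fields, and the deny-list structure are all invented for the example, not taken from the paper's system):

```python
# Declarative policy: each rule names a condition under which an action is denied.
POLICY = [
    {"name": "no-delete", "deny_if": lambda a: a["verb"] == "delete"},
    {"name": "scope-limit", "deny_if": lambda a: not a["path"].startswith("/sandbox/")},
]

def check_action(action):
    """Return the names of violated rules; an empty list means the action may run."""
    return [rule["name"] for rule in POLICY if rule["deny_if"](action)]

violations = check_action({"verb": "delete", "path": "/etc/passwd"})
```

Because the rules are symbolic rather than learned, each denial comes with an auditable reason, which is the kind of guarantee the summary refers to.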

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15579
• PDF: https://arxiv.org/pdf/2604.15579
• Github: https://github.com/hyn0027/agent-symbolic-guardrails

Datasets citing this paper:
https://huggingface.co/datasets/hyn0027D/agent-symbolic-guardrails

==================================

When Background Matters: Breaking Medical Vision Language Models by Transferable Attack

📝 Summary:
MedFocusLeak enables transferable black-box attacks on vision-language models for medical imaging by injecting imperceptible perturbations that redirect model attention, demonstrating significant vuln...

🔹 Publication Date: Published on Apr 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.17318
• PDF: https://arxiv.org/pdf/2604.17318
• Project Page: https://akashghosh.github.io/MedFocusLeakACL/
• Github: https://github.com/AkashGhosh/When-Background-Matters-Breaking-Medical-Vision-Language-Models-by-Transferable-Attack

==================================

MARCO: Navigating the Unseen Space of Semantic Correspondence

📝 Summary:
MARCO is a compact, fast model for semantic correspondence that excels at generalizing to unseen keypoints. Its coarse-to-fine objective and self-distillation framework improve fine-grained localization and overall accuracy.

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18267
• PDF: https://arxiv.org/pdf/2604.18267
• Project Page: https://visinf.github.io/MARCO
• Github: https://github.com/visinf/MARCO

==================================

River-LLM: Large Language Model Seamless Exit Based on KV Share

📝 Summary:
River-LLM enables efficient token-level early exit in LLMs by introducing a KV-Shared Exit River. This mechanism naturally generates and preserves the missing historical states, overcoming the KV Cache Absence problem. It achieves a 1.71x to 2.16x practical speedup while maintaining high generation...
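Token-level early exit, in its generic form, means halting the layer stack once an intermediate state is confident enough. The toy loop below illustrates only that generic idea; the paper's actual contribution is managing the KV-cache entries that skipped layers never produce, which this sketch deliberately does not model (layers and the confidence function are stand-ins).

```python
def forward_with_early_exit(layers, hidden, confidence_fn, threshold=0.9):
    """Apply layers in order; stop as soon as an intermediate prediction
    is confident enough. Returns the final state and the depth reached."""
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        if confidence_fn(hidden) >= threshold:
            return hidden, depth  # early exit: remaining layers are skipped
    return hidden, len(layers)

# Toy layers that each increment a counter; confidence grows with depth.
layers = [lambda h: h + 1] * 6
hidden, depth = forward_with_early_exit(layers, 0, confidence_fn=lambda h: h / 4)
```

The speedup comes from the skipped layers: here the loop exits at depth 4 of 6 once confidence crosses the threshold.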

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18396
• PDF: https://arxiv.org/pdf/2604.18396

==================================

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

📝 Summary:
A dataset of 331 terminal-agent environments with 3,632 reward-hacking trajectories and 2,352 legitimate baselines across four AI models is released to study adversarial exploits in system administrat...

🔹 Publication Date: Published on Apr 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.17596
• PDF: https://arxiv.org/pdf/2604.17596
• Github: https://github.com/few-sh/terminal-wrench

==================================

KWBench: Measuring Unprompted Problem Recognition in Knowledge Work

📝 Summary:
KWBench is a new benchmark for evaluating LLMs' ability to recognize underlying game-theoretic structures in professional scenarios without explicit prompting. It tests whether models can identify the problem type from raw inputs alone. Current LLMs perform poorly, failing to recognize problems even if they can a...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15760
• PDF: https://arxiv.org/pdf/2604.15760
• Project Page: https://kwbench.github.io/
• Github: https://github.com/ankitmaloo/fasteval

==================================
