ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

📝 Summary:
NUMINA enhances text-to-video diffusion models' numerical accuracy through a training-free framework that identifies layout inconsistencies and guides regeneration via attention modulation.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08546
• PDF: https://arxiv.org/pdf/2604.08546
• Project Page: https://h-embodvis.github.io/NUMINA/
• Github: https://github.com/H-EmbodVis/NUMINA
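
To make "attention modulation" concrete, here is a generic sketch, not NUMINA's actual procedure. It adds a bias to one text token's cross-attention logits inside a layout region, subtracts it outside, and renormalizes with a softmax, so pixels in that region attend more strongly to the chosen token. The function `modulate_attention`, the additive ±`strength` bias, and the boolean `region_mask` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one row of attention logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modulate_attention(
    logits: Sequence[Sequence[float]],  # [num_pixels][num_text_tokens] cross-attention logits
    region_mask: Sequence[bool],        # hypothetical layout: True where the object should appear
    token_idx: int,                     # text token to strengthen inside the region
    strength: float = 2.0,              # hypothetical bias magnitude
) -> List[List[float]]:
    """Bias one token's logits up inside the region and down outside, then renormalize."""
    probs = []
    for row, inside in zip(logits, region_mask):
        biased = list(row)  # copy so the caller's logits are untouched
        biased[token_idx] += strength if inside else -strength
        probs.append(softmax(biased))
    return probs


if __name__ == "__main__":
    uniform = [[0.0, 0.0, 0.0] for _ in range(4)]
    for row in modulate_attention(uniform, [True, True, False, False], token_idx=1):
        print([round(p, 3) for p in row])
```

An additive bias on logits (rather than rescaling probabilities) keeps every row a valid distribution after the softmax.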

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

📝 Summary:
HY-Embodied-0.5 is a foundation model family for embodied agents, featuring a Mixture-of-Transformers architecture and iterative post-training for enhanced visual perception and reasoning capabilities.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07430
• PDF: https://arxiv.org/pdf/2604.07430
• Github: https://github.com/Tencent-Hunyuan/HY-Embodied

🔹 Models citing this paper:
https://huggingface.co/tencent/HY-Embodied-0.5

==================================

FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

📝 Summary:
FIT is a large-scale virtual try-on dataset that includes precise body and garment measurements to address garment fit accuracy, using synthetic 3D garment generation, physics simula...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08526
• PDF: https://arxiv.org/pdf/2604.08526
• Project Page: https://johannakarras.github.io/FIT/

==================================

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

📝 Summary:
Open-source web agents leveraging diverse mixed datasets achieve state-of-the-art performance on browser-based tasks while operating without access to HTML or accessibility tree information.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08516
• PDF: https://arxiv.org/pdf/2604.08516
• Project Page: https://allenai.org/blog/molmoweb

==================================

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

📝 Summary:
Agents with meta-cognitive deficits struggle with tool usage decisions, leading to inefficiencies; a new framework called HDPO addresses this through decoupled optimization channels for accuracy and e...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08545
• PDF: https://arxiv.org/pdf/2604.08545
• Project Page: https://Accio-Lab.github.io/Metis
• Github: https://github.com/Accio-Lab/Metis

==================================

Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search

📝 Summary:
A novel hierarchical experience framework improves reinforcement learning-based search agents by transforming raw reasoning trajectories into structured knowledge, enhancing both performance and train...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08124
• PDF: https://arxiv.org/pdf/2604.08124

==================================

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

📝 Summary:
SkillClaw enables collective skill evolution in multi-user LLM agent systems by aggregating user interactions to autonomously update and improve reusable skills across the ecosystem.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08377
• PDF: https://arxiv.org/pdf/2604.08377

==================================

RewardFlow: Generate Images by Optimizing What You Reward

📝 Summary:
RewardFlow enables pretrained diffusion and flow-matching models to be guided during inference through multi-reward Langevin dynamics without requiring inversion, achieving superior performance in ima...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08536
• PDF: https://arxiv.org/pdf/2604.08536
• Project Page: https://plan-lab.github.io/projects/rewardflow
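
To make "multi-reward Langevin dynamics" concrete, here is a toy sketch under clearly stated assumptions: instead of a diffusion or flow-matching latent it moves a 2-D point, and it uses two hypothetical quadratic rewards with analytic gradients and fixed weights. Each step applies the unadjusted Langevin update x ← x + η ∇R(x) + √(2ητ) ξ, with R(x) = Σ w_i r_i(x) and ξ ~ N(0, I). It shows the generic update rule only; RewardFlow's actual rewards, schedules, and latent parameterization are described in the paper.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, float]


def quadratic_reward(center: Vec) -> Tuple[Callable[[Vec], float], Callable[[Vec], Vec]]:
    """Hypothetical toy reward r(x) = -||x - c||^2 and its gradient -2 (x - c)."""

    def r(x: Vec) -> float:
        return -((x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2)

    def grad(x: Vec) -> Vec:
        return (-2.0 * (x[0] - center[0]), -2.0 * (x[1] - center[1]))

    return r, grad


def langevin_multi_reward(
    x0: Vec,
    grads: Sequence[Callable[[Vec], Vec]],
    weights: Sequence[float],
    step: float = 0.01,
    temperature: float = 0.01,
    n_steps: int = 2000,
    seed: int = 0,
) -> Vec:
    """Unadjusted Langevin dynamics targeting p(x) proportional to exp(sum_i w_i * r_i(x) / temperature)."""
    rng = random.Random(seed)
    x, y = x0
    noise = math.sqrt(2.0 * step * temperature)
    for _ in range(n_steps):
        # Combine the reward gradients with their weights.
        gx = sum(w * g((x, y))[0] for w, g in zip(weights, grads))
        gy = sum(w * g((x, y))[1] for w, g in zip(weights, grads))
        # Gradient ascent on the combined reward plus Gaussian noise.
        x += step * gx + noise * rng.gauss(0.0, 1.0)
        y += step * gy + noise * rng.gauss(0.0, 1.0)
    return (x, y)


if __name__ == "__main__":
    _, g1 = quadratic_reward((0.0, 0.0))
    _, g2 = quadratic_reward((2.0, 2.0))
    # With equal weights the combined reward peaks at (1, 1).
    print(langevin_multi_reward((-3.0, 4.0), [g1, g2], [0.5, 0.5]))
```

Lowering `temperature` concentrates samples near the reward maximum; raising it trades reward for diversity.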

==================================

Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics

📝 Summary:
Phantom is a physics-infused video generation model that jointly models visual content and latent physical dynamics to produce videos that are both visually realistic and physically consistent.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08503
• PDF: https://arxiv.org/pdf/2604.08503
• Project Page: https://plan-lab.github.io/projects/phantom

==================================

PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models

📝 Summary:
Vision-Language Models face limitations in 3D embodied environments due to insufficient physical reasoning capabilities, as demonstrated by the PokeGym benchmark that reveals deadlock recovery as the ...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08340
• PDF: https://arxiv.org/pdf/2604.08340

==================================

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

📝 Summary:
Gaussian GRPO addresses challenges in multimodal model training by using distributional matching to ensure gradient equity and stable reinforcement learning, enabling improved perception-reasoning bal...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08539
• PDF: https://arxiv.org/pdf/2604.08539
• Project Page: https://gordonhu608.github.io/openvlthinkerv2.github.io/
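
Gaussian GRPO is presented as a modification of GRPO. For reference, the sketch below shows only the standard GRPO group-relative advantage, A_i = (r_i − mean(r)) / (std(r) + ε), computed over a group of responses sampled for the same prompt. It does not implement the paper's distributional-matching step. The `eps` value and the use of the population standard deviation are assumptions; some implementations use the sample standard deviation.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard GRPO advantages: normalize each reward by its group's mean and std.

    `rewards` holds the scalar rewards of G responses sampled for one prompt.
    """
    g = len(rewards)
    if g == 0:
        return []
    mean = sum(rewards) / g
    # Population standard deviation over the group.
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Binary correctness rewards for four sampled answers to one prompt.
    print([round(a, 3) for a in grpo_advantages([1.0, 0.0, 0.0, 1.0])])  # [1.0, -1.0, -1.0, 1.0]
```

If every response in a group receives the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal under this baseline formulation.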

==================================

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

📝 Summary:
GameWorld presents a standardized benchmark for evaluating multimodal large language model agents in video games, featuring diverse games and verified metrics for comprehensive assessment.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07429
• PDF: https://arxiv.org/pdf/2604.07429
• Project Page: https://gameworld-bench.github.io/

==================================

MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping

📝 Summary:
MegaStyle presents a scalable data curation pipeline for creating high-quality, style-consistent datasets using large generative models and proposes style-supervised contrastive learning for effective...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08364
• PDF: https://arxiv.org/pdf/2604.08364
• Project Page: https://jeoyal.github.io/MegaStyle/
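
The MegaStyle summary names style-supervised contrastive learning. Below is a minimal sketch of the generic supervised contrastive (SupCon) loss, assuming style labels serve as the class labels: each embedding is pulled toward others with the same style and pushed away from the rest. Embeddings are plain lists normalized inside the function; the temperature of 0.1 is an assumption, and the paper's exact objective may differ.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so dot products become cosine similarities.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Supervised contrastive loss in which samples sharing a style label are positives."""
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        sims = {
            a: sum(p * q for p, q in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        positives = [a for a in sims if labels[a] == labels[i]]
        if not positives:
            continue  # an anchor with no same-style partner contributes nothing
        # log-sum-exp over all non-anchor samples, computed stably.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    grouped = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # same style, similar vectors
    shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # same style, dissimilar vectors
    print(supcon_loss(grouped, labels) < supcon_loss(shuffled, labels))  # True
```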

==================================

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

📝 Summary:
OpenSpatial presents an open-source data engine for spatial reasoning tasks using 3D bounding boxes, creating a large-scale dataset and achieving state-of-the-art performance in spatial perception ben...

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07296
• PDF: https://arxiv.org/pdf/2604.07296
• Github: https://github.com/VINHYU/OpenSpatial
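
The OpenSpatial summary says its data engine builds spatial reasoning data from 3D bounding boxes. As a hypothetical illustration of why boxes alone can supply such labels (not the paper's pipeline or question templates), the sketch derives a distance and a left/right relation from two boxes. The `Box3D` type, the camera-frame convention (x pointing right), and the question wording are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    # Hypothetical axis-aligned box in camera coordinates (meters): x right, y down, z forward.
    label: str
    center: Vec3
    size: Vec3  # not used by the templates below


def center_distance(a: Box3D, b: Box3D) -> float:
    # Euclidean distance between box centers.
    return math.dist(a.center, b.center)


def make_qa(a: Box3D, b: Box3D) -> Tuple[str, str]:
    """Turn two boxes into one templated question/answer pair (hypothetical template)."""
    side = "left" if a.center[0] < b.center[0] else "right"  # ties count as "right"
    question = f"Is the {a.label} to the left or right of the {b.label}, and how far apart are they?"
    answer = (
        f"The {a.label} is to the {side} of the {b.label}, "
        f"about {center_distance(a, b):.1f} m away."
    )
    return question, answer


if __name__ == "__main__":
    chair = Box3D("chair", (-1.0, 0.0, 3.0), (0.5, 1.0, 0.5))
    table = Box3D("table", (1.0, 0.0, 3.0), (1.2, 0.8, 0.8))
    for line in make_qa(chair, table):
        print(line)
```

Because the answer is computed from geometry rather than written by hand, templated pairs like this can be generated at scale and checked automatically.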

==================================

Lighting-grounded Video Generation with Renderer-based Agent Reasoning

📝 Summary:
LiVER presents a diffusion-based framework for scene-controllable video generation that disentangles 3D scene properties through explicit conditioning and automated user instruction translation.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07966
• PDF: https://arxiv.org/pdf/2604.07966

==================================

Automating Database-Native Function Code Synthesis with LLMs

📝 Summary:
Database systems incorporate an ever-growing number of functions in their kernels (a.k.a., database native functio...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06231
• PDF: https://arxiv.org/pdf/2604.06231
• Project Page: https://code4db.github.io/hi-opencook/
• Github: https://github.com/weAIDB/OpenCook

==================================

SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds

📝 Summary:
A physics-aligned simulation framework enables effective robotic manipulation of deformable objects by creating metric-consistent synthetic data that matches real-world performance.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08544
• PDF: https://arxiv.org/pdf/2604.08544
• Project Page: https://internrobotics.github.io/sim1.github.io/
• Github: https://github.com/InternRobotics/SIM1

==================================

Structured Distillation of Web Agent Capabilities Enables Generalization

📝 Summary:
Structured synthetic trajectory generation using a frontier LLM as teacher enables open-weight web agents with superior performance and cross-environment capabilities.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07776
• Hugging Face Collection: https://huggingface.co/collections/McGill-NLP/a3-agent-as-annotators
• PDF: https://arxiv.org/pdf/2604.07776
• Project Page: https://agent-as-annotators.github.io/
• Github: https://github.com/McGill-NLP/agent-as-annotators

🔹 Models citing this paper:
https://huggingface.co/McGill-NLP/A3-Qwen3.5-9B
https://huggingface.co/McGill-NLP/A3-Qwen3.5-4B
https://huggingface.co/McGill-NLP/A3-Qwen3.5-2B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/McGill-NLP/A3-Synth

==================================

Structural Graph Probing of Vision-Language Models

📝 Summary:
Vision-language models exhibit structured neural topology where correlation graphs reveal behaviorally significant patterns and influential recurrent hub neurons that drive multimodal performance.

🔹 Publication Date: Published on Mar 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27070
• PDF: https://arxiv.org/pdf/2603.27070
• Github: https://github.com/he-h/vlm-graphprobing
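
The summary describes correlation graphs over neurons and influential hub neurons. A minimal sketch of that generic idea, assuming activations are recorded as one list per neuron over the same inputs: link two neurons when the absolute Pearson correlation of their activations reaches a threshold, then rank neurons by degree. The 0.5 threshold and degree centrality as the hub criterion are assumptions; the paper's graph construction and hub definition may differ.

```python
import math
from typing import Dict, List, Sequence, Set


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    # Pearson correlation; returns 0.0 if either series is constant.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def correlation_graph(acts: Sequence[Sequence[float]], threshold: float = 0.5) -> Dict[int, Set[int]]:
    """Undirected graph linking neurons whose |correlation| >= threshold.

    acts[i] holds neuron i's activations over the same list of inputs.
    """
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(acts))}
    for i in range(len(acts)):
        for j in range(i + 1, len(acts)):
            if abs(pearson(acts[i], acts[j])) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def top_hubs(graph: Dict[int, Set[int]], k: int = 1) -> List[int]:
    # Rank neurons by degree (number of strongly correlated partners), ties broken by index.
    return sorted(graph, key=lambda i: (-len(graph[i]), i))[:k]


if __name__ == "__main__":
    a = [1.0, -1.0, 1.0, -1.0]
    b = [1.0, 1.0, -1.0, -1.0]
    c = [1.0, -1.0, -1.0, 1.0]
    hub = [x + y + z for x, y, z in zip(a, b, c)]  # mixes three mutually uncorrelated signals
    g = correlation_graph([hub, a, b, c])
    print(top_hubs(g, k=1))  # [0]
```

In the toy data, `a`, `b`, and `c` are uncorrelated with one another, so only the mixed neuron links to all three and emerges as the hub.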

==================================

ClawBench: Can AI Agents Complete Everyday Online Tasks?

📝 Summary:
ClawBench presents a framework of 153 real-world online tasks on live platforms to evaluate AI agents. These complex multi-step tasks require capabilities like document processing and form filling. Current frontier AI models complete only a small portion, showing significant limitations for gener...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08523
• PDF: https://arxiv.org/pdf/2604.08523
• Project Page: https://claw-bench.com
• Github: https://github.com/reacher-z/ClawBench

==================================

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

📝 Summary:
Supervised finetuning and reinforcement learning exhibit conditional cross-domain generalization in reasoning tasks, influenced by optimization dynamics, data quality, and model capability, with asymm...

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06628
• PDF: https://arxiv.org/pdf/2604.06628
• Github: https://github.com/Nebularaid2000/rethink_sft_generalization

==================================
