ML Research Hub
32.6K subscribers
5.77K photos
367 videos
24 files
6.24K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

📝 Summary:
MM-Zero introduces a zero-data self-evolving framework for Vision Language Models built on a multi-role system (Proposer, Coder, Solver). It generates visual content and performs reasoning, and is trained with Group Relative Policy Optimization (GRPO). This improves VLM reasoning performance and offers a scalable sel...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09206
• PDF: https://arxiv.org/pdf/2603.09206
• Github: https://github.com/zli12321/MM-Zero
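
🔹 Illustrative sketch (not from the paper):
The summary mentions training with Group Relative Policy Optimization. As a rough illustration of the core idea only, the snippet below normalizes each reward against the other samples drawn for the same prompt. The function name, the epsilon value, the use of the population standard deviation, and the reward values are all assumptions made for this example, not details taken from MM-Zero.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to its own group (one prompt, many samples).

    Assumption: population std and a small eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no separate value model is needed for the baseline.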

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

📝 Summary:
MiniAppBench introduces the first comprehensive benchmark for evaluating principle-driven, interactive application generation, addressing the gap in existing benchmarks that focus on static correctnes...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09652
• PDF: https://arxiv.org/pdf/2603.09652
• Project Page: https://miniappbench.github.io/
• Github: https://github.com/MiniAppBench/miniappbench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Fish Audio S2 Technical Report

📝 Summary:
Fish Audio S2 is an open-source text-to-speech system with multi-speaker capabilities, multi-turn generation, and instruction-following control through natural-language descriptions, utilizing a multi...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08823
• PDF: https://arxiv.org/pdf/2603.08823
• Project Page: https://fish.audio/
• Github: https://github.com/fishaudio/fish-speech

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VLM-SubtleBench: How Far Are VLMs from Human-Level Subtle Comparative Reasoning?

📝 Summary:
VLM-SubtleBench is introduced as a benchmark for evaluating vision-language models on subtle comparative reasoning across diverse domains, revealing significant gaps between model and human performanc...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.07888
• PDF: https://arxiv.org/pdf/2603.07888
• Github: https://github.com/krafton-ai/VLM-SubtleBench

Datasets citing this paper:
https://huggingface.co/datasets/KRAFTON/VLM-SubtleBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Towards a Neural Debugger for Python

📝 Summary:
Neural debuggers are language models that emulate traditional debuggers by supporting interactive control operations like stepping and breakpoint setting, enabling both forward and inverse execution p...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09951
• PDF: https://arxiv.org/pdf/2603.09951

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
A Text-Native Interface for Generative Video Authoring

📝 Summary:
Everyone can write their stories in freeform text format -- it's something we all learn in school. Yet storytelling via video requires one to learn specialized and complicated tools. In this paper, we...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09072
• PDF: https://arxiv.org/pdf/2603.09072

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports

📝 Summary:
CourtSI is a large-scale spatial intelligence dataset for sports scenarios that enables evaluation and improvement of vision-language models' understanding of human motion and object interactions.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09896
• PDF: https://arxiv.org/pdf/2603.09896
• Project Page: https://visionary-laboratory.github.io/CourtSI/
• Github: https://github.com/Visionary-Laboratory/CourtSI

Datasets citing this paper:
https://huggingface.co/datasets/Charlie019/CourtSI-1M
https://huggingface.co/datasets/Charlie019/CourtSI-Bench
https://huggingface.co/datasets/Charlie019/CourtSI-Ext

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing

📝 Summary:
InternVL-U is a 4-billion parameter unified multimodal model that combines advanced visual generation with robust semantic understanding through specialized modular design and reasoning-centric data s...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09877
• PDF: https://arxiv.org/pdf/2603.09877
• Github: https://github.com/OpenGVLab/InternVL-U

🔹 Models citing this paper:
https://huggingface.co/InternVL-U/InternVL-U

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Streaming Autoregressive Video Generation via Diagonal Distillation

📝 Summary:
Diagonal Distillation improves video generation speed and quality by leveraging temporal context and asymmetric denoising steps while addressing error accumulation and motion coherence issues in diffu...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09488
• PDF: https://arxiv.org/pdf/2603.09488
• Project Page: https://spherelab.ai/diagdistill
• Github: https://github.com/Sphere-AI-Lab/diagdistill

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

📝 Summary:
RL3DEdit uses reinforcement learning with rewards from a 3D foundation model to achieve multi-view consistent 3D editing from 2D editing priors. Leveraging the priors of 2D diffus...

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03143
• PDF: https://arxiv.org/pdf/2603.03143
• Project Page: https://amap-ml.github.io/RL3DEdit/
• Github: https://github.com/AMAP-ML/RL3DEdit

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

📝 Summary:
The DCPO framework decouples reasoning and calibration objectives in LLMs to address calibration degeneration while maintaining high accuracy. Reinforcement Learning from Verifiable R...

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09117
• PDF: https://arxiv.org/pdf/2603.09117
• Github: https://github.com/icip-cas/DCPO
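
🔹 Illustrative sketch (not from the paper):
Calibration here means that a model's stated confidence matches how often it is actually right. One common way to quantify that is the expected calibration error (ECE). The snippet below is a generic ECE sketch, not the DCPO objective; the function name, the equal-width binning, and the sample values are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece


# Hypothetical overconfident model: always claims 100%, right half the time.
overconfident = expected_calibration_error([1.0] * 4, [True, False, True, False])
```

A model can keep high accuracy while this gap grows, which is the kind of degeneration the summary refers to.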

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

📝 Summary:
Reasoning unexpectedly enhances LLM recall of simple facts through a computational buffer and factual priming. While priming risks hallucination, selecting accurate reasoning paths can improve final answer precision.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09906
• PDF: https://arxiv.org/pdf/2603.09906

==================================

#LLMs #AI #Reasoning #NLP #KnowledgeRetrieval
ConFu: Contemplate the Future for Better Speculative Sampling

📝 Summary:
ConFu is a novel speculative decoding framework that enhances draft models by enabling future-oriented generation prediction. It uses contemplate tokens and soft prompts to anticipate future steps, reducing error accumulation. This significantly improves token acceptance rates and inference speed...

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08899
• PDF: https://arxiv.org/pdf/2603.08899
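
🔹 Illustrative sketch (not from the paper):
For readers new to speculative decoding, the token acceptance rate mentioned above comes from a verification step: the target model checks each token proposed by the draft model. The snippet below sketches the standard speculative sampling accept-or-resample rule for a single position. It does not model ConFu's contemplate tokens or soft prompts, and the function name and toy distributions are assumptions for illustration.

```python
import random


def accept_or_resample(token, p_target, q_draft, rng):
    """Verify one draft token; return (final_token, accepted).

    Assumes `token` was sampled from q_draft, so q_draft[token] > 0.
    """
    ratio = p_target[token] / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True

    # On rejection, resample from the normalized positive part of (p - q).
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(residual)), weights=weights)[0], False


rng = random.Random(0)
# Draft and target agree exactly, so the draft token is always accepted.
agree = accept_or_resample(0, [0.5, 0.5], [0.5, 0.5], rng)
# Target puts zero mass on the draft token, so it is always rejected.
disagree = accept_or_resample(0, [0.0, 1.0], [1.0, 0.0], rng)
```

The closer the draft distribution is to the target distribution, the more often tokens are accepted, which is why better draft predictions translate into faster inference.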

==================================

#SpeculativeDecoding #LLMs #GenerativeAI #AIResearch #InferenceSpeed
BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

📝 Summary:
BrandFusion is a multi-agent framework for seamlessly integrating advertiser brands into text-to-video generation. It ensures semantic fidelity, brand recognizability, and natural integration. Experiments show it outperforms baselines, enabling T2V monetization.

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02816
• PDF: https://arxiv.org/pdf/2603.02816
• Project Page: https://zihao-ai.github.io/brandfusion/

==================================

#TextToVideo #BrandIntegration #GenerativeAI #MultiAgentSystems #AdTech
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering

📝 Summary:
Large audio-language models can under-utilize audio. This work identifies audio-specialist attention heads that provide a listening signal. An inference-time intervention amplifies audio influence, improving LALM accuracy by up to 8% without parameter updates.

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06854
• PDF: https://arxiv.org/pdf/2603.06854

==================================

#AudioLanguageModels #DeepLearning #AttentionMechanisms #AIResearch #MachineLearning
Reward Prediction with Factorized World States

📝 Summary:
StateFactory transforms observations into hierarchical object-attribute structures using language models. This enables superior zero-shot reward prediction across domains by measuring semantic similarity, significantly improving agent planning performance.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09400
• PDF: https://arxiv.org/pdf/2603.09400
• Project Page: https://statefactory.github.io/
• Github: https://github.com/yijunshens/StateFactory

Datasets citing this paper:
https://huggingface.co/datasets/YijunShen/RewardPrediction

==================================

#RewardPrediction #AI #LanguageModels #MachineLearning #AgentPlanning
Do What I Say: A Spoken Prompt Dataset for Instruction-Following

📝 Summary:
DoWhatISay is a new multilingual dataset of human-recorded spoken and written prompts for evaluating Speech Large Language Models (SLLMs). It reveals that text prompts consistently outperform spoken prompts, except in speech-output tasks. This highlights the need for speech-based SLLM evaluation.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09881
• PDF: https://arxiv.org/pdf/2603.09881
• Project Page: https://huggingface.co/collections/meetween/meetweens-research-papers
• Github: https://github.com/MaikeZuefle/DOWIS

Datasets citing this paper:
https://huggingface.co/datasets/maikezu/dowis

==================================

#SLLM #SpeechAI #LLM #PromptEngineering #Dataset
Compiler-First State Space Duality and Portable O(1) Autoregressive Caching for Inference

📝 Summary:
Mamba-2's state space model is implemented using XLA-optimized primitives, eliminating custom kernels. This enables efficient cross-platform deployment on CPU, GPU, and TPU, realizing O(1) autoregressive caching with high performance.

🔹 Publication Date: Published on Mar 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.09555
• PDF: https://arxiv.org/pdf/2603.09555
• Github: https://github.com/CosmoNaught/mamba2-jax
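
🔹 Illustrative sketch (not from the paper):
The O(1) caching claim rests on the recurrent form of a state space model: each new token updates a fixed-size state instead of appending to a growing cache. The snippet below is a minimal single-channel sketch in plain Python, not the paper's XLA implementation. The names `ssm_step` and `decode` are hypothetical, and `a`, `B`, `C` are held fixed for simplicity, whereas in Mamba-2 they depend on the input.

```python
def ssm_step(state, x, a, B, C):
    """One decode step: h_t = a * h_{t-1} + B * x_t, y_t = C . h_t."""
    new_state = [a * h + b * x for h, b in zip(state, B)]
    y = sum(c * h for c, h in zip(C, new_state))
    return new_state, y


def decode(xs, a, B, C):
    """Run the recurrence over a sequence, carrying only the fixed-size state."""
    state = [0.0] * len(B)
    ys = []
    for x in xs:
        state, y = ssm_step(state, x, a, B, C)
        ys.append(y)
    return ys


# Impulse input with decay 0.5: the output halves at every step.
outputs = decode([1.0, 0.0, 0.0], a=0.5, B=[1.0], C=[1.0])
```

The state has the same length no matter how many tokens have been processed, so per-token memory and compute stay constant, in contrast to an attention key-value cache that grows with sequence length.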

==================================

#Mamba2 #StateSpaceModels #DeepLearning #MLInference #PerformanceOptimization
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

📝 Summary:
ReflexiCoder uses reinforcement learning to teach large language models autonomous code reflection and self-correction. It internalizes the debugging process into the model, achieving state-of-the-art performance on coding benchmarks, rivaling proprietary models, and reducing inference compute by...

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05863
• PDF: https://arxiv.org/pdf/2603.05863
• Github: https://github.com/juyongjiang/ReflexiCoder

==================================

#LLM #ReinforcementLearning #CodeGeneration #AI #DeepLearning