ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

📝 Summary:
LLM reasoning combines procedural information with epistemic verbalization, i.e. uncertainty that the model externalizes in its output. This verbalization drives continued information acquisition and is crucial for strong reasoning performance.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15500
• PDF: https://arxiv.org/pdf/2603.15500
• Github: https://github.com/beanie00/strategic-information-allocation-llm-reasoning

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #Reasoning #AI #MachineLearning #Uncertainty
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

📝 Summary:
This study unifies Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training Large Language Models. It reviews both techniques, their interplay, and emerging hybrid approaches. The paper identifies trends from recent studies and clarifies when each method is most effective.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13985
• PDF: https://arxiv.org/pdf/2603.13985

==================================

#LLM #SupervisedFineTuning #ReinforcementLearning #AI #MachineLearning
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

📝 Summary:
SNCE is a novel training objective for large-codebook discrete image generators. It supervises models with a soft categorical distribution over neighboring tokens, based on embedding proximity, instead of hard one-hot targets. This approach significantly improves convergence speed and overall gen...
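
The soft-target idea in the summary can be sketched roughly as follows. This is a minimal illustration, not the paper's objective: the function names, the choice of squared Euclidean distance, the `k` nearest neighbors, and the `temperature` parameter are all assumptions made for the example.

```python
import numpy as np

def soft_neighbor_targets(codebook, target_ids, k=4, temperature=1.0):
    """Build soft categorical targets over the k codebook entries nearest to each target token.

    codebook:   (V, D) float array of token embeddings (assumed layout).
    target_ids: (N,) integer array of ground-truth token ids.
    Returns an (N, V) array whose rows sum to 1.
    """
    target_emb = codebook[target_ids]                                   # (N, D)
    # Squared Euclidean distance from each target embedding to every codebook entry.
    d2 = ((target_emb[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, V)
    # Indices of the k nearest entries; the target itself is at distance 0.
    nearest = np.argsort(d2, axis=1)[:, :k]                              # (N, k)
    rows = np.arange(len(target_ids))[:, None]
    logits = np.full_like(d2, -np.inf)                                   # non-neighbors get zero mass
    logits[rows, nearest] = -d2[rows, nearest] / temperature
    logits -= logits.max(axis=1, keepdims=True)                          # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def soft_cross_entropy(model_logits, soft_targets):
    """Cross-entropy between model logits (N, V) and soft targets (N, V)."""
    shifted = model_logits - model_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_targets * log_probs).sum(axis=1).mean())
```

With `k=1` the targets collapse to ordinary one-hot labels, so the hard-target loss is a special case of this sketch.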

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15150
• PDF: https://arxiv.org/pdf/2603.15150

==================================

#ImageGeneration #DeepLearning #ComputerVision #GeometryAware #AIResearch
Mixture-of-Depths Attention

📝 Summary:
Mixture-of-Depths Attention (MoDA) addresses signal degradation in deep LLMs by allowing attention heads to access KV pairs from both the current and preceding layers. MoDA improves perplexity by 0.2 and downstream task performance by 2.11% with low overhead. It is a promising primitive for depth scaling.
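
To make the mechanism concrete, here is a rough single-head sketch of attention over KV pairs gathered from several layers. It is not the paper's implementation: the function name, the joint softmax over all layers, and the per-layer causal mask are assumptions chosen for illustration.

```python
import numpy as np

def depth_augmented_attention(q, kv_per_layer):
    """Single-head causal attention over KV pairs from several layers.

    q:            (T, d) queries of the current layer.
    kv_per_layer: list of (K, V) pairs, each of shape (T, d), taken from
                  preceding layers plus the current one (assumed layout).
    Returns the (T, d) output and the (T, L*T) attention weights.
    """
    T, d = q.shape
    num_layers = len(kv_per_layer)
    keys = np.concatenate([k for k, _ in kv_per_layer], axis=0)     # (L*T, d)
    values = np.concatenate([v for _, v in kv_per_layer], axis=0)   # (L*T, d)
    scores = q @ keys.T / np.sqrt(d)                                # (T, L*T)
    # Causal mask tiled once per layer: position t sees positions <= t in every layer.
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = np.tile(causal, (1, num_layers))                         # (T, L*T)
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)                     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)                   # one softmax across all layers
    return weights @ values, weights
```

Passing a single `(K, V)` pair reduces this to standard causal attention, which makes the extra cost of each additional layer easy to see: the score matrix grows by `T` columns per layer.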

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15619
• PDF: https://arxiv.org/pdf/2603.15619
• Project Page: https://github.com/hustvl/MoDA
• Github: https://github.com/hustvl/MoDA

==================================

#LLM #AttentionMechanisms #DeepLearning #AIResearch #NLP
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

📝 Summary:
STALL is a training-free, model-agnostic detector for generated videos. It jointly models spatial and temporal evidence from real-data statistics within a probabilistic framework. STALL consistently outperforms prior image- and video-based baselines, making detection more reliable.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15026
• PDF: https://arxiv.org/pdf/2603.15026
• Project Page: https://omerbenhayun.github.io/stall-video/
• Github: https://github.com/OmerBenHayun/stall-video

==================================

#Deepfakes #VideoDetection #ComputerVision #AI #DigitalForensics
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

📝 Summary:
GlyphPrinter improves visual text rendering by targeting glyph accuracy. It introduces Region-Grouped DPO (R-GDPO), which uses region-level preferences from the GlyphCorrector dataset and significantly enhances precision. The approach outperforms existing methods.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15616
• PDF: https://arxiv.org/pdf/2603.15616
• Project Page: https://henghuiding.com/GlyphPrinter/
• Github: https://github.com/FudanCVL/GlyphPrinter

==================================

#GlyphRendering #DeepLearning #ComputerVision #AIResearch #TextRendering
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

📝 Summary:
A benchmark and metric suite for poster generation evaluates visual quality, coherence, and content accuracy, leading to a multi-agent pipeline that outperforms existing models with reduced computatio...

🔹 Publication Date: Published on May 27, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.21497
• PDF: https://arxiv.org/pdf/2505.21497
• Project Page: https://paper2poster.github.io/
• Github: https://paper2poster.github.io/

Datasets citing this paper:
https://huggingface.co/datasets/Paper2Poster/Paper2Poster

Spaces citing this paper:
https://huggingface.co/spaces/KevinQHLin/Paper2Poster
https://huggingface.co/spaces/camel-ai/Paper2Poster
https://huggingface.co/spaces/wangrongsheng/Paper2Poster

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Learning Latent Proxies for Controllable Single-Image Relighting

📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555

==================================

#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
Efficient Document Parsing via Parallel Token Prediction

📝 Summary:
PTP is a novel method that accelerates document parsing by overcoming slow autoregressive decoding in VLMs. It enables parallel token generation using learnable tokens, boosting speed by 1.6x to 2.2x while reducing hallucinations and showing strong generalization.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15206
• PDF: https://arxiv.org/pdf/2603.15206

==================================

#DocumentParsing #VLMs #ParallelProcessing #AIEfficiency #NLP
Riemannian Motion Generation: A Unified Framework for Human Motion Representation and Generation via Riemannian Flow Matching

📝 Summary:
RMG is a new framework representing human motion on a product manifold and learning dynamics via Riemannian flow matching. This geometry-aware approach achieves state-of-the-art results on HumanML3D and MotionMillion, showing that modeling non-Euclidean motion geometry leads to more stable and ef...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15016
• PDF: https://arxiv.org/pdf/2603.15016
• Project Page: https://frank-miao.github.io/RMG-Project-Page

Spaces citing this paper:
https://huggingface.co/spaces/Frank-miao/RMG

==================================

#HumanMotionGeneration #RiemannianGeometry #MachineLearning #AIResearch #GenerativeModels
SCoCCA: Multi-modal Sparse Concept Decomposition via Canonical Correlation Analysis

📝 Summary:
Interpreting the internal reasoning of vision-language models is essential for deploying AI in safety-critical domains. Concept-based explainability provides a human-aligned lens by representing a mod...

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13884
• PDF: https://arxiv.org/pdf/2603.13884

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Motivation in Large Language Models

📝 Summary:
Motivation is a central driver of human behavior, shaping decisions, goals, and task performance. As large language models (LLMs) become increasingly aligned with human preferences, we ask whether the...

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14347
• PDF: https://arxiv.org/pdf/2603.14347
• Github: https://github.com/omer6nahum/motivation_llms

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Effective Distillation to Hybrid xLSTM Architectures

📝 Summary:
Distilling quadratic LLMs to sub-quadratic models typically loses performance. We introduce an xLSTM distillation pipeline with an expert merging stage, enabling students to recover or exceed teacher performance for efficient LLMs.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15590
• PDF: https://arxiv.org/pdf/2603.15590

==================================

#LLMs #xLSTM #ModelDistillation #AIResearch #EfficientAI
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

📝 Summary:
Vision-Language Models (VLMs) frequently "hallucinate" - generate plausible yet factually incorrect statements - posing a critical barrier to their trustworthy deployment. In this work, we propose a n...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15557
• PDF: https://arxiv.org/pdf/2603.15557
• Github: https://github.com/Lexiang-Xiong/CAD

==================================

#VLM #AIHallucinations #TrustworthyAI #ExplainableAI #AIResearch
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings

📝 Summary:
EnterpriseOps-Gym is a new benchmark for evaluating LLM agents in realistic enterprise settings, featuring a complex sandbox and curated tasks. It reveals that current models struggle with strategic planning and task refusal and achieve low success rates, indicating they are not ready for autonomous de...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13594
• PDF: https://arxiv.org/pdf/2603.13594
• Project Page: https://enterpriseops-gym.github.io/
• Github: https://enterpriseops-gym.github.io

==================================

#LLMAgents #EnterpriseAI #AIResearch #Benchmarking #ToolUse
When Does Sparsity Mitigate the Curse of Depth in LLMs

📝 Summary:
Recent work has demonstrated the curse of depth in large language models (LLMs), where later layers contribute less to learning and representation than earlier layers. Such under-utilization is linked...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15389
• PDF: https://arxiv.org/pdf/2603.15389
• Project Page: https://pumpkin-co.github.io/SparsityAndCoD/
• Github: https://github.com/pUmpKin-Co/SparsityAndCoD

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

📝 Summary:
Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separatel...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14941
• PDF: https://arxiv.org/pdf/2603.14941

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook

📝 Summary:
Clinical question answering over electronic health records (EHRs) can help clinicians and patients access relevant medical information more efficiently. However, many recent approaches rely on large c...

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13962
• PDF: https://arxiv.org/pdf/2603.13962
• Github: https://github.com/ibrahimey/ArchEHR-QA-2026

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Make it SING: Analyzing Semantic Invariants in Classifiers

📝 Summary:
All classifiers, including state-of-the-art vision models, possess invariants, partially rooted in the geometry of their linear mappings. These invariants, which reside in the null-space of the classi...
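
The general link between a linear mapping's null space and invariance can be illustrated with a generic linear head. This is a textbook sketch under assumed shapes, not the paper's method: `W`, `null_space_basis`, and the tolerance are hypothetical names and values introduced for the example.

```python
import numpy as np

def null_space_basis(W, tol=1e-10):
    """Return an orthonormal basis (as columns) for the null space of W.

    W: (C, D) weight matrix of a linear head mapping D features to C logits.
    Any feature perturbation built from these columns leaves W @ x unchanged.
    """
    # Full SVD: rows of vt beyond the numerical rank span the null space.
    _, s, vt = np.linalg.svd(W, full_matrices=True)
    rank = int((s > tol).sum())
    return vt[rank:].T                                   # (D, D - rank)

def perturb_in_null_space(x, basis, coeffs):
    """Move feature vector x along null-space directions; logits stay the same."""
    return x + basis @ coeffs
```

Whenever the feature dimension `D` exceeds the number of classes `C`, the null space has at least `D - C` dimensions, so there is always a family of feature changes that such a head cannot distinguish.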

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14610
• PDF: https://arxiv.org/pdf/2603.14610
• Project Page: https://harel314.github.io/SING-analyzing-semantic-invariants-classifiers/
• Github: https://github.com/harel314/SING-analyzing-semantic-invariants-classifiers

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Towards Generalizable Robotic Manipulation in Dynamic Environments

📝 Summary:
Vision-Language-Action (VLA) models excel in static manipulation but struggle in dynamic environments with moving targets. This performance gap primarily stems from a scarcity of dynamic manipulation ...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15620
• PDF: https://arxiv.org/pdf/2603.15620
• Project Page: https://h-embodvis.github.io/DOMINO/
• Github: https://github.com/H-EmbodVis/DOMINO

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism

📝 Summary:
OxyGen unifies KV cache management for multi-task Vision-Language-Action models, addressing the inefficiency of isolated per-task caches. By treating the KV cache as a shared resource, it enables cross-task sharing and continuous batching. This achieves a speedup of up to 3.7x, providing high language throughpu...

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14371
• PDF: https://arxiv.org/pdf/2603.14371
• Github: https://github.com/air-embodied-brain/OxyGen

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research