ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Grounding World Simulation Models in a Real-World Metropolis

📝 Summary:
Seoul World Model (SWM) renders video simulations of actual cities rather than imagined environments. It grounds autoregressive video generation in real street-view images, overcoming data challenges. SWM generates spatially faithful, long-horizon urban videos for diverse camera paths and scenarios.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15583
• PDF: https://seoul-world-model.github.io/SWM_paper.pdf
• Project Page: https://seoul-world-model.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Panoramic Affordance Prediction

📝 Summary:
Affordance prediction serves as a critical bridge between perception and action in embodied AI. However, existing research is confined to pinhole camera models, which suffer from narrow Fields of View...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15558
• PDF: https://arxiv.org/pdf/2603.15558
• Project Page: https://zixinzhang02.github.io/Panoramic-Affordance-Prediction/
• Github: https://zixinzhang02.github.io/Panoramic-Affordance-Prediction/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos

📝 Summary:
Multimodal Large Language Models (MLLMs) have shown strong performance in visual and audio understanding when evaluated in isolation. However, their ability to jointly reason over omni-modal (visual, ...

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14145
• PDF: https://arxiv.org/pdf/2603.14145
• Project Page: https://huggingface.co/datasets/nvidia/MMOU

🔹 Datasets citing this paper:
https://huggingface.co/datasets/nvidia/MMOU

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models

📝 Summary:
Federal Open Market Committee (FOMC) statements are a major source of monetary-policy information, and even subtle changes in their wording can move global financial markets. A central task is therefo...

🔹 Publication Date: Published on Mar 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.14313
• PDF: https://arxiv.org/pdf/2603.14313
• Project Page: https://yixuantt.github.io/DeltaConsistent/
• Github: https://github.com/yixuantt/DeltaConsistentScoring

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

📝 Summary:
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet the development of high-performance search agents remains dominated by industrial g...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15594
• PDF: https://github.com/rui-ye/OpenSeeker/blob/main/assets/OpenSeeker.pdf
• Github: https://github.com/rui-ye/OpenSeeker

🔹 Models citing this paper:
https://huggingface.co/OpenSeeker/OpenSeeker-v1-30B-SFT

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics

📝 Summary:
Existing web-generation benchmarks rely on text prompts or static screenshots as input. However, videos naturally convey richer signals such as interaction flow, transition timing, and motion continui...

🔹 Publication Date: Published on Mar 11

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13391
• PDF: https://arxiv.org/pdf/2603.13391
• Project Page: https://webvr-benchmark.github.io/
• Github: https://github.com/broalantaps/WebVR

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Attention Residuals

📝 Summary:
Residual connections with PreNorm are standard in modern LLMs, yet they accumulate all layer outputs with fixed unit weights. This uniform aggregation causes uncontrolled hidden-state growth with dept...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15031
• PDF: https://arxiv.org/pdf/2603.15031
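
A minimal sketch of the accumulation the summary describes, under a toy setup that is not taken from the paper: the sublayer (`toy_sublayer`, a cyclic shift) and the input vector are hypothetical stand-ins chosen so the effect is easy to see. With PreNorm, each layer adds its output to the residual stream with weight 1, so the stream's RMS can keep growing with depth.

```python
import math

def rms(x):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def rms_norm(x, eps=1e-8):
    """Rescale x to roughly unit RMS (no learned gain in this toy version)."""
    scale = rms(x) + eps
    return [v / scale for v in x]

def toy_sublayer(x, layer_idx):
    """Hypothetical stand-in for attention/MLP: cyclically shift the input."""
    k = layer_idx % len(x)
    return x[k:] + x[:k]

def prenorm_stack(h, num_layers):
    """Apply h <- h + f(norm(h)) repeatedly and record the RMS at each depth."""
    trace = [rms(h)]
    for layer_idx in range(num_layers):
        update = toy_sublayer(rms_norm(h), layer_idx)
        # Every layer output enters the residual stream with a fixed weight of 1.
        h = [a + b for a, b in zip(h, update)]
        trace.append(rms(h))
    return trace

rms_trace = prenorm_stack([1.0, 2.0, 3.0, 4.0], num_layers=8)
```

In this toy case every update has roughly unit RMS and only positive entries, so `rms_trace` increases at every layer. Real networks need not behave this cleanly.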

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

📝 Summary:
HSImul3R presents a unified framework for 3D reconstruction of human-scene interactions that bridges the perception-simulation gap through physics-grounded bidirectional optimization and reinforcement...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15612
• PDF: https://arxiv.org/pdf/2603.15612
• Project Page: https://yukangcao.github.io/HSImul3R/
• Github: https://yukangcao.github.io/HSImul3R/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

📝 Summary:
Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15611
• PDF: https://arxiv.org/pdf/2603.15611
• Project Page: https://zju-real.github.io/Code-A1/
• Github: https://github.com/ZJU-REAL/Code-A1

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EvoClaw: Evaluating AI Agents on Continuous Software Evolution

📝 Summary:
With AI agents increasingly deployed as long-running systems, it becomes essential to autonomously construct and continuously evolve customized software to enable interaction within dynamic environmen...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13428
• PDF: https://arxiv.org/pdf/2603.13428
• Project Page: https://evo-claw.com/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

📝 Summary:
Knowledge-aware concept customization binds textual knowledge to visual concepts through a two-stage framework that learns visual anchors and updates textual knowledge for high-fidelity generation, su...

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12743
• PDF: https://arxiv.org/pdf/2603.12743
• Project Page: https://chenyangzhu1.github.io/MoKus/
• Github: https://github.com/HKUST-LongGroup/MoKus

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

📝 Summary:
TERMINATOR is an early-exit method for large reasoning models to prevent overthinking during Chain-of-Thought reasoning. It learns optimal exit points by predicting the first arrival of the final answer. This reduces reasoning length by 14%-55% without performance loss.

🔹 Publication Date: Published on Mar 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12529
• PDF: https://arxiv.org/pdf/2603.12529
• Project Page: https://terminator-llm.github.io/
• Github: https://terminator-llm.github.io

🔹 Models citing this paper:
https://huggingface.co/acnagle/Terminator-Qwen3-8B
https://huggingface.co/acnagle/Terminator-Qwen3-14B
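
One way to picture "predicting the first arrival of the final answer" is as a labeling step plus an exit rule. The sketch below is an illustration under stated assumptions, not the paper's implementation: `first_arrival_index` uses plain substring matching to find where the answer first shows up, and `truncate_at_exit` applies a hypothetical probability threshold to scores that a learned predictor would supply.

```python
def first_arrival_index(steps, final_answer):
    """Index of the first reasoning step that already contains the final answer.

    Substring matching is an assumption made for this sketch.
    """
    for i, step in enumerate(steps):
        if final_answer in step:
            return i
    return None  # the answer never appears in the trace

def truncate_at_exit(steps, exit_scores, threshold=0.5):
    """Keep steps up to and including the first one whose exit score passes the threshold.

    exit_scores stands in for the output of a learned exit predictor (hypothetical).
    """
    for i, score in enumerate(exit_scores):
        if score >= threshold:
            return steps[: i + 1]
    return steps  # no exit triggered, keep the full chain

steps = [
    "Let x = 3.",
    "Then 2x = 6.",
    "So the answer is 6.",
    "Double-check: 2 * 3 = 6.",
]
label = first_arrival_index(steps, "6")
kept = truncate_at_exit(steps, exit_scores=[0.1, 0.7, 0.9, 0.95])
```

Here `label` marks the step a predictor would be trained to recognize, and `kept` is the shortened chain that results if the predictor's scores cross the threshold at that step.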

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

📝 Summary:
LLM reasoning involves both procedural information and epistemic verbalization (externalized uncertainty). This verbalization drives continued information acquisition and is crucial for strong reasoning performance.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15500
• PDF: https://arxiv.org/pdf/2603.15500
• Github: https://github.com/beanie00/strategic-information-allocation-llm-reasoning

==================================

#LLM #Reasoning #AI #MachineLearning #Uncertainty
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

📝 Summary:
This study unifies Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for post-training Large Language Models. It reviews both techniques, their interplay, and emerging hybrid approaches. The paper identifies trends from recent studies and clarifies when each method is most effective.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13985
• PDF: https://arxiv.org/pdf/2603.13985

==================================

#LLM #SupervisedFineTuning #ReinforcementLearning #AI #MachineLearning
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation

📝 Summary:
SNCE is a novel training objective for large-codebook discrete image generators. It supervises models with a soft categorical distribution over neighboring tokens, based on embedding proximity, instead of hard one-hot targets. This approach significantly improves convergence speed and overall gen...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15150
• PDF: https://arxiv.org/pdf/2603.15150
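
The idea of replacing a one-hot target with a soft distribution over nearby codebook entries can be sketched as follows. This is a minimal illustration under assumptions that are not from the paper: proximity is measured by squared Euclidean distance, the distribution is a softmax with a hypothetical `temperature`, and it covers the whole (tiny) codebook rather than a selected neighborhood.

```python
import math

def soft_targets(codebook, target_idx, temperature=1.0):
    """Softmax over negative squared distances to the target token's embedding."""
    target = codebook[target_idx]
    logits = [
        -sum((a - b) ** 2 for a, b in zip(vec, target)) / temperature
        for vec in codebook
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def soft_cross_entropy(pred_logits, targets):
    """Cross-entropy between a soft target distribution and softmax(pred_logits)."""
    m = max(pred_logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in pred_logits))
    return -sum(t * (l - log_z) for t, l in zip(targets, pred_logits))

# Toy codebook: tokens 0 and 1 are close in embedding space, token 2 is far away.
codebook = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
q = soft_targets(codebook, target_idx=0, temperature=0.1)
loss = soft_cross_entropy([2.0, 1.0, -1.0], q)
```

With these toy values the target token keeps the largest share of probability, its near neighbor receives most of the rest, and the distant token gets almost none, so predicting the neighbor is penalized far less than predicting the distant token.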

==================================

#ImageGeneration #DeepLearning #ComputerVision #GeometryAware #AIResearch
Mixture-of-Depths Attention

📝 Summary:
Mixture-of-Depths Attention (MoDA) addresses signal degradation in deep LLMs by allowing attention heads to access KV pairs from both the current and preceding layers. MoDA improves perplexity by 0.2 and downstream task performance by 2.11% with low overhead. It is a promising primitive for depth scaling.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15619
• PDF: https://arxiv.org/pdf/2603.15619
• Project Page: https://github.com/hustvl/MoDA
• Github: https://github.com/hustvl/MoDA
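
A rough sketch of attention that reads KV pairs from more than one layer, assuming the simplest possible design: keys and values from a preceding layer are concatenated with those of the current layer, and a single query attends over the combined set. The function names and the concatenation scheme are assumptions made for illustration, and the actual MoDA mechanism may differ.

```python
import math

def attend(query, keys, values):
    """Single-query scaled dot-product attention over a list of key/value vectors."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return out, weights

def depth_mixed_attend(query, kv_by_layer):
    """Attend over KV pairs pooled from several layers (hypothetical pooling scheme)."""
    keys = [k for layer_keys, _ in kv_by_layer for k in layer_keys]
    values = [v for _, layer_values in kv_by_layer for v in layer_values]
    return attend(query, keys, values)

# Each entry is (keys, values) for one layer; the last entry is the current layer.
prev_layer = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
cur_layer = ([[1.0, 1.0]], [[0.5, 0.5]])
out, weights = depth_mixed_attend([1.0, 0.0], [prev_layer, cur_layer])
```

The returned `weights` has one entry per pooled KV pair, so the softmax decides how much the query draws from earlier-layer entries versus current-layer ones.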

==================================

#LLM #AttentionMechanisms #DeepLearning #AIResearch #NLP
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods

📝 Summary:
STALL is a training-free, model-agnostic detector for generated videos. It jointly models spatial and temporal evidence from real-data statistics within a probabilistic framework. STALL consistently outperforms prior image- and video-based baselines, making detection more reliable.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15026
• PDF: https://arxiv.org/pdf/2603.15026
• Project Page: https://omerbenhayun.github.io/stall-video/
• Github: https://github.com/OmerBenHayun/stall-video

==================================

#Deepfakes #VideoDetection #ComputerVision #AI #DigitalForensics
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

📝 Summary:
GlyphPrinter improves visual text rendering by addressing glyph accuracy. It introduces Region-Grouped DPO (R-GDPO), which uses region-level preferences from the GlyphCorrector dataset and significantly enhances precision. The approach outperforms existing methods.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15616
• PDF: https://arxiv.org/pdf/2603.15616
• Project Page: https://henghuiding.com/GlyphPrinter/
• Github: https://github.com/FudanCVL/GlyphPrinter

==================================

#GlyphRendering #DeepLearning #ComputerVision #AIResearch #TextRendering
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

📝 Summary:
A benchmark and metric suite for poster generation evaluates visual quality, coherence, and content accuracy, leading to a multi-agent pipeline that outperforms existing models with reduced computatio...

🔹 Publication Date: Published on May 27, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2505.21497
• PDF: https://arxiv.org/pdf/2505.21497
• Project Page: https://paper2poster.github.io/
• Github: https://paper2poster.github.io/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Paper2Poster/Paper2Poster

🔹 Spaces citing this paper:
https://huggingface.co/spaces/KevinQHLin/Paper2Poster
https://huggingface.co/spaces/camel-ai/Paper2Poster
https://huggingface.co/spaces/wangrongsheng/Paper2Poster

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Learning Latent Proxies for Controllable Single-Image Relighting

📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555

==================================

#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
Efficient Document Parsing via Parallel Token Prediction

📝 Summary:
PTP is a novel method that accelerates document parsing by overcoming slow autoregressive decoding in VLMs. It enables parallel token generation using learnable tokens, boosting speed by 1.6x-2.2x while reducing hallucinations and showing strong generalization.

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15206
• PDF: https://arxiv.org/pdf/2603.15206

==================================

#DocumentParsing #VLMs #ParallelProcessing #AIEfficiency #NLP