ML Research Hub
32.3K subscribers
6.45K photos
437 videos
24 files
7K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

📝 Summary:
OmniRoam generates long-horizon panoramic videos using a two-stage approach for improved scene completeness and consistency. It first previews a trajectory-controlled video, then refines and extends it to high-resolution, long-range panoramas, enabling high-fidelity world wandering.

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.30045
• PDF: https://arxiv.org/pdf/2603.30045
• Project Page: https://yuheng.ink/project-page/omniroam/
• Github: https://github.com/yuhengliu02/OmniRoam

🔹 Models citing this paper:
https://huggingface.co/Yuheng02/OmniRoam

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

📝 Summary:
CALM is a unified model bridging the gap between multi-turn conversations and tool use in language agents. Trained on a new multi-task dataset CALM-IT, it integrates both capabilities. CALM outperforms specialized models, including GPT-4o, across various benchmarks.

🔹 Publication Date: Published on Feb 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.08820
• PDF: https://arxiv.org/pdf/2502.08820
• Project Page: https://emrecanacikgoz.github.io/CoALM/
• Github: https://github.com/oumi-ai/oumi

🔹 Models citing this paper:
https://huggingface.co/uiuc-convai/CoALM-8B
https://huggingface.co/uiuc-convai/CoALM-405B
https://huggingface.co/uiuc-convai/CoALM-70B

🔹 Datasets citing this paper:
https://huggingface.co/datasets/uiuc-convai/CoALM-IT

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Terminal Agents Suffice for Enterprise Automation

📝 Summary:
Simple terminal-based coding agents interacting directly with platform APIs, powered by foundation models, are highly effective for enterprise automation. These low-level agents match or outperform complex tool-augmented systems, demonstrating that elaborate agent architectures are often unnecess...

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00073
• PDF: https://arxiv.org/pdf/2604.00073

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

📝 Summary:
A dynamic agentic framework called TAB addresses 3D visual grounding by decoupling spatial semantics resolution from 3D structure instantiation through 2D VLMs and multi-view geometry, achieving super...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00528
• PDF: https://arxiv.org/pdf/2604.00528
• Github: https://github.com/WHB139426/TAB-Agent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

📝 Summary:
GaussianGPT uses a transformer-based autoregressive approach with 3D rotary positional embeddings to generate 3D scenes by predicting Gaussian primitives, offering advantages over diffusion methods in...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26661
• PDF: https://arxiv.org/pdf/2603.26661
• Project Page: https://nicolasvonluetzow.github.io/GaussianGPT/
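The summary mentions 3D rotary positional embeddings without giving details. As a rough illustration only, the sketch below uses one common way to extend rotary embeddings to three axes: split the feature vector into three chunks and rotate each chunk by one coordinate. The function names (`rope_1d`, `rope_3d`), the chunking scheme, and the frequency base are assumptions for this example and are not taken from the GaussianGPT paper.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    assert len(vec) % 2 == 0
    half = len(vec) // 2
    out = []
    for i in range(half):
        theta = pos * base ** (-i / half)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        a, b = vec[2 * i], vec[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def rope_3d(vec, xyz):
    """Assumed scheme: one equal-sized chunk of features per spatial axis."""
    assert len(vec) % 6 == 0
    chunk = len(vec) // 3
    out = []
    for axis, pos in enumerate(xyz):
        out += rope_1d(vec[axis * chunk:(axis + 1) * chunk], pos)
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [0.5, -1.0, 2.0, 0.25, 1.5, -0.75]
k = [1.0, 0.5, -0.5, 2.0, 0.1, 0.3]

# Both pairs of positions differ by the same offset (1, 2, 2),
# so the two attention scores should agree up to rounding.
s1 = dot(rope_3d(q, (1, 2, 3)), rope_3d(k, (0, 0, 1)))
s2 = dot(rope_3d(q, (5, 7, 4)), rope_3d(k, (4, 5, 2)))
print(abs(s1 - s2) < 1e-9)
```

The property checked at the end is the usual motivation for rotary embeddings: the score between two tokens depends on their relative offset, not on their absolute positions.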

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Universal YOCO for Efficient Depth Scaling

📝 Summary:
Universal YOCO (YOCO-U) merges the YOCO architecture with recursive computation for efficient LLM depth scaling. It uses iterative processing in shallow attention layers, offering constant KV cache and better token utility.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01220
• PDF: https://arxiv.org/pdf/2604.01220

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

📝 Summary:
PerceptionComp is a new video benchmark for complex, long-horizon perception-centric reasoning. It requires combining multiple pieces of temporally separated visual evidence with compositional logic. Current AI models struggle significantly, highlighting a major bottleneck in perceptual video reasoning.

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26653
• PDF: https://arxiv.org/pdf/2603.26653
• Project Page: https://perceptioncomp.github.io/
• Github: https://github.com/hrinnnn/PerceptionComp

🔹 Datasets citing this paper:
https://huggingface.co/datasets/hrinnnn/PerceptionComp

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

📝 Summary:
ViGoR benchmark addresses limitations in current AIGC evaluation by introducing a comprehensive framework for assessing visual generative reasoning across multiple modalities and cognitive dimensions....

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25823
• PDF: https://arxiv.org/pdf/2603.25823
• Project Page: https://vincenthancoder.github.io/ViGoR-Bench/
• Github: https://github.com/VincentHancoder/ViGoR-Bench-Eval

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Embarrassingly Simple Self-Distillation Improves Code Generation

📝 Summary:
Simple self-distillation improves code generation in large language models by fine-tuning on model-generated samples, effectively addressing precision-exploration trade-offs in decoding. AI-generated ...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01193
• PDF: https://arxiv.org/pdf/2604.01193
• Github: https://github.com/apple/ml-ssd

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Revision or Re-Solving? Decomposing Second-Pass Gains in Multi-LLM Pipelines

📝 Summary:
Multi-LLM revision pipelines' effectiveness varies by task structure and draft quality, with gains decomposing into re-solving, scaffold, and content components rather than representing uniform error ...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01029
• PDF: https://arxiv.org/pdf/2604.01029

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

📝 Summary:
A native discrete diffusion framework unifies multi-modal understanding and generation for robotic manipulation, enabling parallel action and visual outcome prediction with improved long-horizon consi...

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25406
• PDF: https://arxiv.org/pdf/2603.25406
• Project Page: https://yliu-cs.github.io/MMaDA-VLA
• Github: https://github.com/yliu-cs/MMaDA-VLA

🔹 Models citing this paper:
https://huggingface.co/yliu-cs/MMaDA-VLA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
HippoCamp: Benchmarking Contextual Agents on Personal Computers

📝 Summary:
HippoCamp is a new multimodal benchmark evaluating agents on massive personal file management. It exposes significant performance gaps in current models for long-horizon retrieval and cross-modal reasoning in user-centric environments, revealing bottlenecks in multimodal perception.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01221
• PDF: https://arxiv.org/pdf/2604.01221
• Project Page: https://hippocamp-ai.github.io/
• Github: https://github.com/Savannah-yz/HippoCamp

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

📝 Summary:
MiroEval is a new benchmark for deep research systems, addressing limitations of existing evaluations. It assesses adaptive synthesis, factuality, and process quality across real-user text and multimodal tasks, showing process quality predicts outcomes and multimodal tasks are very challenging.

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28407
• PDF: https://arxiv.org/pdf/2603.28407
• Project Page: https://miroeval-ai.github.io/website/
• Github: https://github.com/MiroMindAI/MiroEval

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

📝 Summary:
QuitoBench addresses the lack of large-scale time series benchmarks by introducing a regime-balanced dataset with eight TSF regimes, revealing that foundation models outperform deep learning at long c...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26017
• PDF: https://arxiv.org/pdf/2603.26017

🔹 Datasets citing this paper:
https://huggingface.co/datasets/hq-bench/quitobench
https://huggingface.co/datasets/hq-bench/quito-corpus

==================================

#TimeSeriesForecasting #DataScience #MachineLearning #AI #QuitoBench
Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification

📝 Summary:
Vision2Web presents a comprehensive benchmark for visual website development tasks and evaluates coding agents across static UI generation, interactive frontend reproduction, and full-stack developmen...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26648
• PDF: https://arxiv.org/pdf/2603.26648
• Project Page: https://vision2web-bench.github.io/
• Github: https://github.com/zai-org/Vision2Web

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zai-org/Vision2Web

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers

📝 Summary:
A systematic evaluation framework called PaperRecon is proposed to assess AI-generated papers by separating quality assessment into presentation and hallucination dimensions using a benchmark of 51 re...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01128
• PDF: https://arxiv.org/pdf/2604.01128
• Project Page: https://agent4science-utokyo.github.io/PaperRecon_HP/
• Github: https://github.com/Agent4Science-UTokyo/PaperRecon

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

📝 Summary:
A framework for proactive agent research is introduced that models applications as finite state machines to enable realistic user simulation and task execution across multiple digital environments. AI...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00842
• PDF: https://arxiv.org/pdf/2604.00842
• Github: https://github.com/deepakn97/pare
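To make the idea of modeling an application as a finite state machine more concrete, here is a minimal sketch. The `AppFSM` class, the email-client states, and the action names are all hypothetical and chosen for illustration; they are not the framework's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class AppFSM:
    """A toy application modeled as a finite state machine (hypothetical)."""
    state: str
    # Maps (current state, action) to the next state.
    transitions: dict = field(default_factory=dict)

    def available_actions(self):
        """Actions a simulated user may take from the current state."""
        return sorted(a for (s, a) in self.transitions if s == self.state)

    def step(self, action):
        """Apply an action; reject anything the current state does not allow."""
        key = (self.state, action)
        if key not in self.transitions:
            raise ValueError(f"action {action!r} not allowed in state {self.state!r}")
        self.state = self.transitions[key]
        return self.state


def make_email_app():
    # Hypothetical states and actions for an email client.
    return AppFSM(
        state="inbox",
        transitions={
            ("inbox", "open_message"): "reading",
            ("inbox", "compose"): "drafting",
            ("reading", "reply"): "drafting",
            ("reading", "back"): "inbox",
            ("drafting", "send"): "inbox",
            ("drafting", "discard"): "inbox",
        },
    )


app = make_email_app()
trace = [app.step(a) for a in ["open_message", "reply", "send"]]
print(trace)  # ['reading', 'drafting', 'inbox']
```

Restricting a simulated user to `available_actions()` keeps every generated interaction valid for the current screen, which is one plausible reason to choose a state-machine model for user simulation.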

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Benchmarking and Mechanistic Analysis of Vision-Language Models for Cross-Depiction Assembly Instruction Alignment

📝 Summary:
Vision Language Models struggle with aligning assembly diagrams and video feeds due to a depiction gap, with findings indicating visual encoding as the primary target for improving cross-depiction rob...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00913
• PDF: https://arxiv.org/pdf/2604.00913
• Project Page: https://ryenhails.github.io/IKEA-Bench/
• Github: https://ryenhails.github.io/IKEA-Bench/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference

📝 Summary:
LLM inference faces significant memory processing overhead. This paper proposes using heterogeneous GPU-FPGA systems to accelerate these operations by offloading memory-bound tasks to FPGAs. This achieves 1.04-2.2x speedup and 1.11-4.7x energy savings over GPU baselines, proving heterogeneous s...

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29002
• PDF: https://arxiv.org/pdf/2603.29002

==================================

#LLMInference #FPGA #HeterogeneousComputing #HardwareAcceleration #SystemArchitecture
UniMixer: A Unified Architecture for Scaling Laws in Recommendation Systems

📝 Summary:
UniMixer is a unified architecture for recommendation systems that improves scaling efficiency. It uses a generalized parameterized token mixing module to optimize mixing patterns and connect attention, TokenMixer, and factorization-machine methods. A lightweight version boosts performance further.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00590
• PDF: https://arxiv.org/pdf/2604.00590
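For readers unfamiliar with token mixing, the sketch below shows the generic idea that different mixers can be read as different choices of a mixing matrix applied to the same tokens. It contrasts a fixed matrix with an input-dependent one computed as in self-attention without projections. This is only an illustration of the general framing: the function names and the toy data are assumptions, and it does not reproduce UniMixer's parameterization or say how the paper treats TokenMixer or factorization machines.

```python
import math


def mix(weights, tokens):
    """Token mixing: output token i is a weighted sum of all input tokens."""
    dim = len(tokens[0])
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Input-dependent mixing matrix: softmax of scaled dot products."""
    scale = math.sqrt(len(tokens[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Fixed mixing pattern (a plain average here; in a model it would be learned).
static_weights = [[1 / 3] * 3 for _ in range(3)]
static_out = mix(static_weights, tokens)

# Input-dependent mixing pattern derived from the tokens themselves.
attn_out = mix(attention_weights(tokens), tokens)
print(static_out[0], attn_out[0])
```

Both calls go through the same `mix` function, so the only difference between the two mixers is how the weight matrix is produced.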

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research