ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
TrajLoom: Dense Future Trajectory Generation from Video

📝 Summary:
TrajLoom is a new framework for predicting dense future motion trajectories in videos. It uses grid-anchor encoding, a VAE for a compact latent space, and flow matching to generate realistic future motion. The method significantly extends prediction horizons and improves motion realism.
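
The flow-matching step named in the summary can be illustrated with a toy sketch. This is not TrajLoom's implementation: it assumes the common linear (straight-line) probability path, uses plain Python lists in place of latent tensors, and substitutes a hypothetical oracle for the learned velocity network. All function names are illustrative.

```python
from typing import Callable, List

Vec = List[float]


def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vec, x1: Vec) -> Vec:
    """Regression target for the velocity model on the linear path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(pred: Vec, x0: Vec, x1: Vec) -> float:
    """Mean squared error between a predicted velocity and the target."""
    tgt = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, tgt)) / len(tgt)


def euler_sample(velocity: Callable[[Vec, float], Vec], x0: Vec, steps: int = 10) -> Vec:
    """Generate a sample by integrating dx/dt = velocity(x, t) from t=0 to t=1."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: an oracle velocity (assumption) that always points from noise to data.
noise, data = [0.0, 0.0], [2.0, -1.0]
sample = euler_sample(lambda x, t: target_velocity(noise, data), noise, steps=10)
```

In a real system the lambda would be a trained network evaluated on the current latent and time; here the oracle simply shows that integrating the target velocity carries the noise sample to the data point.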

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22606
• PDF: https://arxiv.org/pdf/2603.22606
• Project Page: https://trajloom.github.io/
• Github: https://github.com/zewei-Zhang/TrajLoom

🔹 Models citing this paper:
https://huggingface.co/zeweizhang/TrajLoom

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zeweizhang/TrajLoomDatasets

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI

📝 Summary:
Large language models can automate systematic literature reviews with human-level performance while reducing review time from weeks to hours.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22327
• PDF: https://arxiv.org/pdf/2603.22327
• Project Page: https://oxrml.com/agent-slr/
• Github: https://github.com/OxRML/AgentSLR

🔹 Datasets citing this paper:
https://huggingface.co/datasets/OxRML/AgentSLR

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

📝 Summary:
LLM-based systems run executable workflows that interleave several kinds of computational components. This survey organizes recent approaches by when the workflow structure is determined and by which dimensions are optimized.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22386
• PDF: https://arxiv.org/pdf/2603.22386
• Github: https://github.com/IBM/awesome-agentic-workflow-optimization

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
PEARL: Personalized Streaming Video Understanding Model

📝 Summary:
Personalized streaming video understanding addresses real-time visual input processing with precise temporal annotations, enabling interactive AI assistants through a new benchmark and plug-and-play s...

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20422
• PDF: https://arxiv.org/pdf/2603.20422
• Github: https://github.com/Yuanhong-Zheng/PEARL

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zyh200727/PEARL-Data

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG

📝 Summary:
WildWorld is a large-scale dataset for action-conditioned world modeling that provides explicit state annotations from a photorealistic game, enabling better understanding of latent-state dynamics and...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23497
• PDF: https://arxiv.org/pdf/2603.23497
• Project Page: https://shandaai.github.io/wildworld-project/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

📝 Summary:
Researchers developed a token-level reinforcement learning method called PEPO that improves multimodal chain-of-thought reasoning by distinguishing visual grounding from inference through perception-e...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22847
• PDF: https://arxiv.org/pdf/2603.22847
• Github: https://github.com/xzxxntxdy/PEPO

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

📝 Summary:
A unified reinforcement learning framework is proposed for interleaved text and image generation, using GRPO and FlowGRPO with modifications to enable scalable multi-round generation.
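
GRPO, which the summary names as a building block, scores each sampled output relative to the other outputs drawn for the same prompt. A minimal sketch of that group-relative advantage follows. The reward values and the `eps` constant are illustrative assumptions, and nothing here reflects UniGRPO's specific modifications or the FlowGRPO variant.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    that beat their siblings get a positive learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four generations of one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group mean, no separate value network is needed; a group whose rewards are all equal yields zero advantages and contributes no gradient.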

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23500
• PDF: https://arxiv.org/pdf/2603.23500

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

📝 Summary:
SpecEyes accelerates agentic multimodal large language models by using a lightweight speculative planner with cognitive gating and heterogeneous parallel processing to reduce latency and improve throu...
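
The general idea of gating a cheap speculative model can be sketched as follows. This is a generic confidence-threshold router, not SpecEyes itself: the `small_model` and `large_model` callables, the confidence score, and the `threshold` value are all hypothetical stand-ins.

```python
from typing import Callable, Tuple


def gated_answer(
    query: str,
    small_model: Callable[[str], Tuple[str, float]],
    large_model: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (answer, route). Accept the cheap draft only when it is confident."""
    draft, confidence = small_model(query)
    if confidence >= threshold:
        return draft, "speculative"
    return large_model(query), "fallback"


# Stand-in models: the small one is only confident on short queries.
def toy_small(q: str) -> Tuple[str, float]:
    confidence = 0.9 if len(q) < 10 else 0.3
    return f"draft:{q}", confidence


def toy_large(q: str) -> str:
    return f"full:{q}"


easy = gated_answer("2+2?", toy_small, toy_large)
hard = gated_answer("describe the third chart in detail", toy_small, toy_large)
```

Latency drops only when the gate accepts often enough to offset the extra small-model call on every query, so the threshold trades speed against answer quality.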

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23483
• PDF: https://arxiv.org/pdf/2603.23483
• Github: https://github.com/MAC-AutoML/SpecEyes

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation

📝 Summary:
MultiBind is a new benchmark and evaluation method for multi-subject image generation that identifies and analyzes cross-subject attribute misbinding failures not detected by traditional metrics.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21937
• PDF: https://arxiv.org/pdf/2603.21937

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

📝 Summary:
ABot-PhysWorld is a 14B Diffusion Transformer model that generates physically plausible videos through physics-aware training; it is evaluated on a new benchmark.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23376
• PDF: https://arxiv.org/pdf/2603.23376
• Github: https://github.com/amap-cvlab/ABot-PhysWorld

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

📝 Summary:
AutoGaze is a lightweight module that reduces redundant video patches before processing by vision transformers or multi-modal large language models, enabling efficient processing of long, high-resolut...
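
To make the notion of dropping redundant patches concrete, here is a deliberately simple stand-in. AutoGaze is described as a learned, autoregressive module; the sketch below swaps in a plain frame-difference heuristic instead, and the patch representation, `threshold`, and function name are assumptions made for illustration.

```python
from typing import List, Tuple

Patch = List[float]
Frame = List[Patch]


def select_patches(frames: List[Frame], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (frame_index, patch_index) pairs worth passing to the encoder.

    A patch is kept when it differs enough from the last kept patch at the
    same position; the first frame is kept in full.
    """
    kept: List[Tuple[int, int]] = []
    reference: List[Patch] = []
    for f_idx, frame in enumerate(frames):
        for p_idx, patch in enumerate(frame):
            if f_idx == 0:
                reference.append(patch)
                kept.append((f_idx, p_idx))
                continue
            ref = reference[p_idx]
            diff = sum(abs(a - b) for a, b in zip(patch, ref)) / len(patch)
            if diff > threshold:
                reference[p_idx] = patch
                kept.append((f_idx, p_idx))
    return kept


# Two patches per frame; only the second patch changes noticeably, in frame 1.
video = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.2, 0.2]],
    [[0.0, 0.05], [0.2, 0.2]],
]
kept = select_patches(video)
```

In this toy clip half of the patches are skipped; the downstream transformer would then attend only over the kept positions.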

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12254
• PDF: https://arxiv.org/pdf/2603.12254
• Project Page: https://autogaze.github.io/
• Github: https://github.com/NVlabs/AutoGaze

🔹 Models citing this paper:
https://huggingface.co/nvidia/NVILA-8B-HD-Video
https://huggingface.co/nvidia/AutoGaze

🔹 Datasets citing this paper:
https://huggingface.co/datasets/bfshi/HLVid

🔹 Spaces citing this paper:
https://huggingface.co/spaces/bfshi/AutoGaze

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

📝 Summary:
Ego2Web introduces the first benchmark bridging egocentric video perception and web agent execution, enabling evaluation of AI agents that can perceive physical surroundings and perform online tasks t...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22529
• PDF: https://arxiv.org/pdf/2603.22529
• Project Page: https://ego2web.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

📝 Summary:
MinerU-Diffusion is a diffusion-based framework that replaces autoregressive decoding with parallel diffusion denoising for document OCR, improving robustness and decoding speed.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22458
• PDF: https://arxiv.org/pdf/2603.22458

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs

📝 Summary:
Reinforcement learning with verifiable rewards induces sparse, targeted changes in token distributions that can be systematically analyzed through distributional shifts and cross-sampling intervention...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22446
• PDF: https://arxiv.org/pdf/2603.22446
• Project Page: https://qwen-pilot.notion.site/rlvr-theseus

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RealMaster: Lifting Rendered Scenes into Photorealistic Video

📝 Summary:
RealMaster combines video diffusion models with 3D engine outputs to generate photorealistic videos that maintain geometric accuracy and scene consistency through paired training and IC-LoRA distillat...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23462
• PDF: https://arxiv.org/pdf/2603.23462
• Project Page: https://danacohen95.github.io/RealMaster/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models

📝 Summary:
Hyperbolic vision-language models are enhanced through uncertainty-guided compositional alignment that improves hierarchical structure representation and multi-object scene understanding.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22042
• PDF: https://arxiv.org/pdf/2603.22042
• Project Page: https://jeeit17.github.io/UNCHA-project_page/
• Github: https://github.com/jeeit17/UNCHA

🔹 Models citing this paper:
https://huggingface.co/hayeonkim/uncha

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM

📝 Summary:
SIMART is a unified MLLM that generates sim-ready articulated 3D assets by jointly decomposing parts and predicting kinematics. Its Sparse 3D VQ-VAE significantly reduces 3D token overhead, enabling high-fidelity multi-part assemblies for physics simulation.
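
Why a sparse VQ-style tokenizer cuts token count can be shown with a small sketch. This is not SIMART's tokenizer: it assumes voxel features stored in a dictionary keyed by occupied coordinates and a tiny hand-written codebook, and it shows only the nearest-code lookup of vector quantization, with no encoder, decoder, or training.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Vec = List[float]


def nearest_code(feature: Vec, codebook: List[Vec]) -> int:
    """Index of the codebook entry closest to `feature` in squared L2 distance."""
    def sq_dist(code: Vec) -> float:
        return sum((f - c) ** 2 for f, c in zip(feature, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def tokenize_sparse(voxels: Dict[Coord, Vec], codebook: List[Vec]) -> List[Tuple[Coord, int]]:
    """Emit one (coordinate, code index) token per occupied voxel only."""
    return [(xyz, nearest_code(feat, codebook)) for xyz, feat in sorted(voxels.items())]


# Hypothetical two-entry codebook and two occupied voxels in a larger grid.
codebook = [[0.0, 0.0], [1.0, 1.0]]
occupied = {(0, 0, 0): [0.1, -0.1], (3, 1, 2): [0.9, 1.2]}
tokens = tokenize_sparse(occupied, codebook)
```

A dense grid would produce one token per cell regardless of content; emitting tokens only for occupied voxels is what keeps the sequence short enough for a language model to consume.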

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23386
• PDF: https://arxiv.org/pdf/2603.23386
• Project Page: https://simart-mllm.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

📝 Summary:
Large language models exhibit post-conventional moral reasoning patterns inconsistent with human developmental trajectories, showing systematic logical incoherence and rhetorical sophistication withou...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21854
• PDF: https://arxiv.org/pdf/2603.21854

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

📝 Summary:
Video-Action Models struggle in contact-rich tasks because vision alone lacks fine force details. The Video-Tactile-Action Model (VTAM) integrates tactile perception with visual streams via multimodal fusion. VTAM significantly improves contact-rich manipulation by correcting visual errors, enabling rob...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23481
• PDF: https://arxiv.org/pdf/2603.23481
• Project Page: https://plan-lab.github.io/projects/vtam

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates

📝 Summary:
Session Risk Memory (SRM) extends authorization by evaluating agent behavior across a whole session rather than one action at a time, which addresses attacks distributed over several steps. It uses semantic centroids and risk accumulation, and the authors report perfect detection with zero false positives, avoiding the blind spots of stateless gates.
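
One way a session-level gate could combine a semantic centroid with accumulated risk is sketched below. The paper's actual scoring rule is not given in this post, so everything here is an assumption: the drift measure (one minus cosine similarity to the running centroid), the `decay` and `threshold` parameters, and the class and method names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SessionRiskMemory:
    """Toy stateful gate: tracks a session centroid and accumulated drift."""

    def __init__(self, threshold: float = 1.0, decay: float = 0.9) -> None:
        self.threshold = threshold
        self.decay = decay
        self.centroid: Vec = []
        self.count = 0
        self.risk = 0.0

    def authorize(self, embedding: Vec) -> bool:
        """Score an action before execution; return False to block it."""
        if self.count:
            drift = 1.0 - cosine(embedding, self.centroid)
            self.risk = self.decay * self.risk + drift
        if self.risk > self.threshold:
            return False  # blocked actions do not update the centroid
        # Update the running mean of accepted action embeddings.
        self.count += 1
        if self.count == 1:
            self.centroid = list(embedding)
        else:
            self.centroid = [c + (e - c) / self.count for c, e in zip(self.centroid, embedding)]
        return True


gate = SessionRiskMemory(threshold=1.0)
benign = [gate.authorize([1.0, 0.0]) for _ in range(3)]
drifting = [gate.authorize([0.0, 1.0]) for _ in range(3)]
```

The gate is deterministic: the same sequence of embeddings always yields the same decisions. In this toy run a single off-topic action passes, while repeated drift pushes the accumulated risk over the threshold and is blocked, which is the behavior a stateless per-action check cannot express.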

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22350
• PDF: https://arxiv.org/pdf/2603.22350

==================================

#Cybersecurity #TemporalAuthorization #DistributedSystems #BehavioralAnalytics #RiskDetection
2Xplat: Two Experts Are Better Than One Generalist

📝 Summary:
2Xplat proposes a two-expert architecture for pose-free 3D Gaussian Splatting. It explicitly separates geometry estimation from appearance synthesis, outperforming unified methods and matching state-of-the-art performance with less training.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21064
• PDF: https://arxiv.org/pdf/2603.21064
• Project Page: https://hwasikjeong.github.io/2Xplat
• Github: https://github.com/HwasikJeong/2Xplat

==================================

#GaussianSplatting #3DReconstruction #ComputerVision #AI #DeepLearning