ML Research Hub
32.4K subscribers
6.14K photos
404 videos
24 files
6.66K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

📝 Summary:
GameplayQA is a framework evaluating multimodal LLMs in 3D multi-agent environments using densely annotated gameplay videos and diagnostic QA. It reveals a significant performance gap between current MLLMs and humans, particularly in temporal grounding and agent attribution. This emphasizes the n...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24329
• PDF: https://arxiv.org/pdf/2603.24329
• Project Page: https://hats-ict.github.io/gameplayqa/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/wangyz1999/GameplayQA

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

📝 Summary:
This paper proposes an unsupervised self-evolution framework for multimodal reasoning. It uses self-consistency and group-relative policy optimization to improve performance without labeled data or external models. This method consistently improves reasoning, offering a scalable path for self-evo...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21289
• PDF: https://arxiv.org/pdf/2603.21289
• Project Page: https://dingwu1021.github.io/SelfJudge/
• Github: https://github.com/OPPO-Mente-Lab/LLM-Self-Judge

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
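
A minimal sketch of how self-consistency can serve as a label-free reward inside a group-relative update, as the summary above describes. This is an illustration under stated assumptions, not the paper's implementation: the majority-vote reward, the function name `group_relative_advantages`, and the sample answers are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def group_relative_advantages(answers, eps=1e-8):
    """Score a group of sampled answers without any labels.

    Assumption (not from the paper): the majority answer in the group is
    treated as a pseudo-label, and each sample gets reward 1.0 if it agrees
    with that answer, else 0.0.
    """
    majority, _ = Counter(answers).most_common(1)[0]
    rewards = [1.0 if a == majority else 0.0 for a in answers]

    # Group-relative normalization: center and scale rewards within the group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    return majority, rewards, advantages


# Hypothetical final answers sampled from one model for the same question.
samples = ["B", "B", "A", "B"]
majority, rewards, advantages = group_relative_advantages(samples)
print(majority, rewards, [round(a, 3) for a in advantages])
```

Samples that agree with the majority receive a positive advantage and the dissenting sample a negative one. If every sample agrees, all advantages are zero, so that group contributes no learning signal.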
Toward Physically Consistent Driving Video World Models under Challenging Trajectories

📝 Summary:
PhyGenesis is a world model that generates high-fidelity driving videos with physical consistency by transforming invalid trajectories into plausible conditions and using a physics-enhanced video gene...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24506
• PDF: https://arxiv.org/pdf/2603.24506

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

📝 Summary:
A two-stage self-evolving mobile GUI agent named UI-Voyager is proposed, featuring rejection fine-tuning and group relative self-distillation to improve efficiency and performance in GUI automation ta...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24533
• PDF: https://arxiv.org/pdf/2603.24533
• Github: https://github.com/ui-voyager/UI-Voyager

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

📝 Summary:
OmniWeaving is an open-source video generation model that unifies multimodal inputs and complex reasoning capabilities through large-scale pretraining and intelligent agent inference.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24458
• PDF: https://arxiv.org/pdf/2603.24458
• Project Page: https://omniweaving.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

📝 Summary:
CUA-Suite introduces a large-scale ecosystem of expert video demonstrations and annotations for computer-use agents, providing continuous screen recordings and detailed reasoning annotations to advanc...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24440
• PDF: https://arxiv.org/pdf/2603.24440

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video

📝 Summary:
4DGS360 presents a diffusion-free approach for 360° dynamic object reconstruction using 3D-native initialization and a 3D tracker called AnchorTAP3D to improve geometric consistency and handle occlusi...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21618
• PDF: https://arxiv.org/pdf/2603.21618
• Project Page: https://jaewon040.github.io/4dgs360/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

📝 Summary:
Self-distillation in large language models can degrade mathematical reasoning performance by suppressing uncertainty expression, particularly affecting out-of-distribution tasks.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24472
• PDF: https://arxiv.org/pdf/2603.24472
• Project Page: https://beanie00.notion.site/why-does-self-distillation-degrade-reasoning
• Github: https://github.com/beanie00/self-distillation-analysis

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
T-MAP: Red-Teaming LLM Agents with Trajectory-aware Evolutionary Search

📝 Summary:
T-MAP, a trajectory-aware evolutionary search method, discovers adversarial prompts that bypass safety measures and achieve harmful outcomes through tool interactions in LLM agents.

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22341
• PDF: https://arxiv.org/pdf/2603.22341
• Github: https://github.com/pwnhyo/T-MAP

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation

📝 Summary:
UniFunc3D enables 3D scene functionality segmentation by treating multimodal large language models as active observers that perform joint semantic, temporal, and spatial reasoning through adaptive fra...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23478
• PDF: https://arxiv.org/pdf/2603.23478
• Project Page: https://jiaying.link/unifunc3d/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EVA: Efficient Reinforcement Learning for End-to-End Video Agent

📝 Summary:
EVA is an RL framework enabling efficient, adaptive video understanding by autonomously deciding what and how to watch. It uses iterative planning to handle long video sequences. EVA significantly outperforms existing MLLM and adaptive agent methods on multiple video benchmarks.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22918
• PDF: https://arxiv.org/pdf/2603.22918
• Project Page: https://huggingface.co/WRHC/EfficientVideoAgent/
• Github: https://github.com/wangruohui/EfficientVideoAgent

🔹 Models citing this paper:
https://huggingface.co/WRHC/EfficientVideoAgent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
PLDR-LLMs Reason At Self-Organized Criticality

📝 Summary:
PLDR-LLMs exhibit reasoning capabilities at self-organized criticality through metastable steady states that mirror second-order phase transitions, enabling generalization without benchmark evaluation...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23539
• PDF: https://arxiv.org/pdf/2603.23539
• Project Page: https://huggingface.co/fromthesky
• Github: https://github.com/burcgokden/PLDR-LLM-Self-Organized-Criticality

🔹 Models citing this paper:
https://huggingface.co/fromthesky/PLDR-LLM-v51-SOC-110M-1
https://huggingface.co/fromthesky/PLDR-LLM-v51-SOC-110M-2
https://huggingface.co/fromthesky/PLDR-LLM-v51-SOC-110M-3

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
StreamingClaw Technical Report

📝 Summary:
StreamingClaw is a unified framework for real-time streaming video understanding and embodied intelligence. It integrates real-time reasoning, multimodal long-term memory, and proactive interaction, enabling direct control of the physical world.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22120
• PDF: https://arxiv.org/pdf/2603.22120

==================================

#EmbodiedAI #VideoUnderstanding #RealTimeAI #Robotics #MultimodalAI
Unleashing Spatial Reasoning in Multimodal Large Language Models via Textual Representation Guided Reasoning

📝 Summary:
TRACE is a prompting method that enables MLLMs to perform 3D spatial reasoning by generating text-based representations of video environments. This improves spatial question answering and consistently outperforms prior strategies.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23404
• PDF: https://arxiv.org/pdf/2603.23404

==================================

#SpatialReasoning #MLLMs #AI #PromptEngineering #ComputerVision
LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis

📝 Summary:
LagerNVS is a neural network for novel view synthesis that uses strong 3D inductive biases. It achieves this by initializing its encoder from a pre-trained 3D reconstruction network, enabling state-of-the-art, real-time NVS even with unknown cameras and in-the-wild data.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20176
• PDF: https://arxiv.org/pdf/2603.20176
• Project Page: https://szymanowiczs.github.io/lagernvs
• Github: https://github.com/facebookresearch/lagernvs

🔹 Models citing this paper:
https://huggingface.co/facebook/lagernvs_general_512
https://huggingface.co/facebook/lagernvs_re10k_2v_256
https://huggingface.co/facebook/lagernvs_dl3dv_2-6_v_256

==================================

#NovelViewSynthesis #NeuralNetworks #3DReconstruction #ComputerVision #DeepLearning
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models

📝 Summary:
This paper introduces a mixed-precision quantization framework for video diffusion transformers. It dynamically allocates NVFP4/INT8 based on layer volatility and uses Temporal Delta Cache to skip computations, significantly reducing memory and cost while preserving quality.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18742
• PDF: https://arxiv.org/pdf/2603.18742

==================================

#Quantization #DiffusionModels #VideoAI #DeepLearning #ModelOptimization
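
A rough sketch of what volatility-based precision allocation, as mentioned in the summary above, could look like. Everything here is assumed for illustration: the summary does not say how volatility is measured or which direction the mapping goes, so the rule "more volatile layers keep INT8" is a guess, as are the function name `allocate_precision`, the layer names, the scores, and the threshold. The bit widths are nominal element widths and ignore any scale-factor overhead.

```python
def allocate_precision(volatility, threshold):
    """Map each layer name to a number format.

    Assumption (not confirmed by the summary): layers whose volatility
    exceeds the threshold keep the higher-precision INT8 format, and the
    rest use 4-bit NVFP4.
    """
    return {
        name: "INT8" if score > threshold else "NVFP4"
        for name, score in volatility.items()
    }


# Nominal element widths in bits for the two formats.
BITS = {"INT8": 8, "NVFP4": 4}

# Hypothetical per-layer volatility scores.
volatility = {"attn.0": 0.9, "mlp.0": 0.1, "attn.1": 0.7, "mlp.1": 0.2}
plan = allocate_precision(volatility, threshold=0.5)
avg_bits = sum(BITS[fmt] for fmt in plan.values()) / len(plan)
print(plan, avg_bits)
```

With half the layers at 8 bits and half at 4 bits, the nominal average in this toy example is 6 bits per element.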
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

📝 Summary:
The AI Scientist-v2 is an agentic system capable of autonomous scientific discovery, from hypothesis to manuscript. It produced the first fully AI-generated paper accepted by a peer-reviewed workshop, highlighting AI's growing research capabilities.

🔹 Publication Date: Published on Apr 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.08066
• PDF: https://arxiv.org/pdf/2504.08066
• Github: https://github.com/SakanaAI/AI-Scientist-v2

==================================

#AI #ScientificDiscovery #AgenticAI #AutonomousResearch #FutureOfScience
Qworld: Question-Specific Evaluation Criteria for LLMs

📝 Summary:
Qworld is a new method that generates question-specific evaluation criteria for LLMs using recursive expansion trees. It decomposes questions into fine-grained criteria, enabling more insightful and granular assessment of LLM capabilities by adapting to each question's context. This approach reve...

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23522
• PDF: https://arxiv.org/pdf/2603.23522
• Project Page: https://qworld.openscientist.ai/
• Github: https://github.com/mims-harvard/qworld

==================================

#LLMEvaluation #LargeLanguageModels #AIResearch #NLP #MachineLearning
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

📝 Summary:
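The AI Scientist is an LLM-based system for automated scientific discovery. It generates research ideas, runs experiments, writes papers, and performs a simulated review. Each paper costs under $15 to produce, and some exceed the acceptance threshold of a top machine learning conference as judged by the system's own automated reviewer.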

🔹 Publication Date: Published on Aug 12, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2408.06292
• PDF: https://arxiv.org/pdf/2408.06292
• Github: https://github.com/ExtensityAI/benchmark/blob/main/src/evals/eval_computation_graphs.py#L551

🔹 Models citing this paper:
https://huggingface.co/pradachan/AI-Scientist
https://huggingface.co/priyanshmahant12/AI-Scientist-main

🔹 Spaces citing this paper:
https://huggingface.co/spaces/AUXteam/Critical_Code_Agent

==================================

#AIScientist #AutomatedDiscovery #LLM #ScientificResearch #AIforScience
SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision

📝 Summary:
SpectralSplats resolves vanishing gradients in 3D Gaussian Splatting tracking by optimizing in the frequency domain using spectral moments. This creates a global gradient basin of attraction, ensuring robust tracking even with severe misalignment. A frequency annealing schedule guides precise ali...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24036
• PDF: https://arxiv.org/pdf/2603.24036
• Project Page: https://avigailco.github.io/SpectralSplats/

==================================

#3DTracking #GaussianSplatting #ComputerVision #Optimization #DifferentiableRendering
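
A toy 1D illustration of the claim in the summary above that a frequency-domain objective can still provide a gradient when two shapes do not overlap. This is not the paper's spectral-moment formulation: a single low-frequency DFT coefficient stands in for it, and the signal length, bump width, positions, and helper names (`bump`, `pixel_loss`, `low_freq_loss`, `grad`) are all assumptions made for the sketch.

```python
import cmath
import math

N = 64  # number of samples on a periodic 1D grid


def bump(center, width=2.0):
    """Gaussian bump centered at `center`, using wrap-around distance."""
    out = []
    for n in range(N):
        d = min(abs(n - center), N - abs(n - center))
        out.append(math.exp(-0.5 * (d / width) ** 2))
    return out


def pixel_loss(a, b):
    """Sum of squared per-sample differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def low_freq_loss(a, b, k=1):
    """Squared distance between the k-th DFT coefficients of a and b."""
    def coeff(s):
        return sum(v * cmath.exp(-2j * math.pi * k * n / N)
                   for n, v in enumerate(s))
    return abs(coeff(a) - coeff(b)) ** 2


def grad(loss, center, target, h=1e-3):
    """Central finite-difference derivative of the loss w.r.t. the center."""
    return (loss(bump(center + h), target)
            - loss(bump(center - h), target)) / (2 * h)


target = bump(40.0)                         # where the object really is
g_pixel = grad(pixel_loss, 10.0, target)    # estimate starts far away
g_freq = grad(low_freq_loss, 10.0, target)
print(g_pixel, g_freq)
```

Because the two bumps are far apart, the per-sample loss barely changes when the estimate moves, so its gradient is numerically negligible. The low-frequency loss depends on the phase difference between the two positions, so its gradient is clearly nonzero, and its negative sign here means a gradient-descent step moves the estimate toward the target.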
Understanding the Challenges in Iterative Generative Optimization with LLMs

📝 Summary:
Generative optimization with LLMs is often brittle due to implicit design choices about artifact modification and learning evidence. These hidden decisions, such as starting artifact or batching, critically determine success across applications. Making these choices explicit is crucial for wider ...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23994
• PDF: https://arxiv.org/pdf/2603.23994

==================================

#LLMs #GenerativeAI #Optimization #AIResearch #MachineLearning