ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields

📝 Summary:
InfiniDepth introduces neural implicit fields for continuous 2D depth querying, overcoming limitations of discrete grid methods. This enables arbitrary-resolution and fine-grained depth estimation, achieving state-of-the-art performance, particularly in fine-detail regions and for novel view synt...

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03252
• PDF: https://arxiv.org/pdf/2601.03252
• Project Page: https://zju3dv.github.io/InfiniDepth
• Github: https://zju3dv.github.io/InfiniDepth

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DepthEstimation #NeuralImplicitFields #ComputerVision #AI #3DGraphics
STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems

📝 Summary:
STEM Agent is a self-adapting, modular AI architecture. Inspired by biology, it dynamically differentiates components for diverse interaction protocols, tool integration, and user modeling, addressing the limitations of fixed frameworks.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22359
• PDF: https://arxiv.org/pdf/2603.22359

==================================

#AIAgents #AIArchitecture #AdaptiveAI #ToolIntegration #AIResearch
Reconstruction-Guided Slot Curriculum: Addressing Object Over-Fragmentation in Video Object-Centric Learning

📝 Summary:
SlotCurri addresses video object over-fragmentation using a reconstruction-guided slot curriculum. It progressively allocates slots, employs a structure-aware loss for sharp boundaries, and uses cyclic inference for temporal consistency. This method significantly improves object decomposition.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22758
• PDF: https://arxiv.org/pdf/2603.22758
• Github: https://github.com/wjun0830/SlotCurri

🔹 Models citing this paper:
https://huggingface.co/WJ0830/SlotCurri

==================================

#VideoAI #ObjectCentricLearning #ComputerVision #DeepLearning #ObjectSegmentation
Logics-Parsing Technical Report

📝 Summary:
Logics-Parsing is an end-to-end LVLM enhanced with reinforcement learning to improve document parsing. It optimizes layout analysis and reading order inference, achieving state-of-the-art performance on diverse document types across a new benchmark.

🔹 Publication Date: Published on Sep 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.19760
• PDF: https://arxiv.org/pdf/2509.19760
• Github: https://github.com/alibaba/Logics-Parsing

🔹 Models citing this paper:
https://huggingface.co/Logics-MLLM/Logics-Parsing
https://huggingface.co/Mungert/Logics-Parsing-GGUF

🔹 Spaces citing this paper:
https://huggingface.co/spaces/prithivMLmods/VLM-Parsing

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
One View Is Enough! Monocular Training for In-the-Wild Novel View Generation

📝 Summary:
OVIE enables monocular novel-view synthesis from single images by generating pseudo-target views via a geometric scaffold. This eliminates the need for multi-view supervision, allowing training on massive unpaired datasets. OVIE achieves superior zero-shot performance and is significantly faster ...

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23488
• PDF: https://arxiv.org/pdf/2603.23488
• Github: https://github.com/AdrienRR/ovie

==================================

#NovelViewSynthesis #MonocularVision #ComputerVision #DeepLearning #3DVision
Fair splits flip the leaderboard: CHANRG reveals limited generalization in RNA secondary-structure prediction

📝 Summary:
The CHANRG benchmark reveals RNA foundation models achieve high held-out accuracy but lose significant robustness out-of-distribution. This new benchmark provides a stricter framework for evaluating RNA secondary structure prediction.

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22330
• PDF: https://arxiv.org/pdf/2603.22330
• Project Page: https://huggingface.co/datasets/multimolecule/chanrg
• Github: https://github.com/MultiMolecule/multimolecule

==================================

#RNAstructure #MachineLearning #FoundationModels #Bioinformatics #ModelRobustness
CanViT: Toward Active-Vision Foundation Models

📝 Summary:
CanViT represents the first task- and policy-agnostic Active-Vision Foundation Model that efficiently processes visual scenes through sequential glimpses using a retinotopic Vision Transformer backbon...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22570
• PDF: https://arxiv.org/pdf/2603.22570
• Github: https://github.com/m2b3/CanViT-PyTorch

🔹 Models citing this paper:
https://huggingface.co/canvit/canvitb16-add-vpe-pretrain-g128px-s512px-in21k-dv3b16-2026-02-02

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Abstraction as a Memory-Efficient Inductive Bias for Continual Learning

📝 Summary:
Abstraction-Augmented Training (AAT) improves continual learning by jointly optimizing concrete and abstract representations. This memory-efficient method captures latent structures, eliminating the need for replay buffers. AAT performs comparably to experience replay with zero extra memory.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17198
• PDF: https://arxiv.org/pdf/2603.17198

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Can AI Agents Answer Your Data Questions? A Benchmark for Data Agents

📝 Summary:
A comprehensive benchmark evaluates enterprise data agents' ability to integrate and analyze multi-database data through natural language, revealing significant challenges in real-world applications. ...

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20576
• PDF: https://arxiv.org/pdf/2603.20576
• Project Page: https://ucbepic.github.io/DataAgentBench/
• Github: https://github.com/ucbepic/DataAgentBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

📝 Summary:
SHAMISA is a self-supervised NR-IQA framework learning from unlabeled distorted images. It uses implicit structural associations and a compositional distortion engine to group images for training, achieving strong performance and generalization without human labels or contrastive losses.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.13669
• PDF: https://arxiv.org/pdf/2603.13669
• Github: https://github.com/Mahdi-Naseri/SHAMISA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare

📝 Summary:
CarePilot is a multi-agent framework that uses actor-critic methods and dual-memory to automate complex, long-horizon tasks in healthcare. It addresses the limitations of existing models on the new CareFlow benchmark. CarePilot achieves state-of-the-art performance.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24157
• PDF: https://arxiv.org/pdf/2603.24157
• Project Page: https://akashghosh.github.io/Care-Pilot/
• Github: https://github.com/AkashGhosh/CarePilot

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

📝 Summary:
The EnterpriseArena benchmark evaluates large language models on long-horizon enterprise resource allocation, revealing significant challenges in sustained decision-making under uncertainty.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.23638
• PDF: https://arxiv.org/pdf/2603.23638

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

📝 Summary:
GameplayQA is a framework evaluating multimodal LLMs in 3D multi-agent environments using densely annotated gameplay videos and diagnostic QA. It reveals a significant performance gap between current MLLMs and humans, particularly in temporal grounding and agent attribution. This emphasizes the n...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24329
• PDF: https://arxiv.org/pdf/2603.24329
• Project Page: https://hats-ict.github.io/gameplayqa/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/wangyz1999/GameplayQA

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning

📝 Summary:
This paper proposes an unsupervised self-evolution framework for multimodal reasoning. It uses self-consistency and group-relative policy optimization to improve performance without labeled data or external models. This method consistently improves reasoning, offering a scalable path for self-evo...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21289
• PDF: https://arxiv.org/pdf/2603.21289
• Project Page: https://dingwu1021.github.io/SelfJudge/
• Github: https://github.com/OPPO-Mente-Lab/LLM-Self-Judge

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
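
The self-evolution summary above combines two ideas: self-consistency as a label-free reward and group-relative policy optimization. The sketch below is only an illustration of how those two pieces can fit together, not the paper's implementation. It assumes the reward is a simple majority vote over sampled answers and that advantages are rewards standardized within one sampled group; the function names and the toy answers are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def self_consistency_rewards(answers):
    """Assumed reward: 1.0 if an answer matches the group's majority vote, else 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers to the same question (hypothetical data).
answers = ["42", "42", "17", "42"]
rewards = self_consistency_rewards(answers)
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Answers that agree with the majority receive a positive advantage and the dissenting answer a negative one, so no labels or external judge are needed in this toy setup. If every sample agrees, all advantages are zero and the group contributes no learning signal.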
Toward Physically Consistent Driving Video World Models under Challenging Trajectories

📝 Summary:
PhyGenesis is a world model that generates high-fidelity driving videos with physical consistency by transforming invalid trajectories into plausible conditions and using a physics-enhanced video gene...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24506
• PDF: https://arxiv.org/pdf/2603.24506

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

📝 Summary:
A two-stage self-evolving mobile GUI agent named UI-Voyager is proposed, featuring rejection fine-tuning and group relative self-distillation to improve efficiency and performance in GUI automation ta...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24533
• PDF: https://arxiv.org/pdf/2603.24533
• Github: https://github.com/ui-voyager/UI-Voyager

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

📝 Summary:
OmniWeaving is an open-source video generation model that unifies multimodal inputs and complex reasoning capabilities through large-scale pretraining and intelligent agent inference.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24458
• PDF: https://arxiv.org/pdf/2603.24458
• Project Page: https://omniweaving.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents

📝 Summary:
CUA-Suite introduces a large-scale ecosystem of expert video demonstrations and annotations for computer-use agents, providing continuous screen recordings and detailed reasoning annotations to advanc...

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24440
• PDF: https://arxiv.org/pdf/2603.24440

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
4DGS360: 360° Gaussian Reconstruction of Dynamic Objects from a Single Video

📝 Summary:
4DGS360 presents a diffusion-free approach for 360° dynamic object reconstruction using 3D-native initialization and a 3D tracker called AnchorTAP3D to improve geometric consistency and handle occlusi...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21618
• PDF: https://arxiv.org/pdf/2603.21618
• Project Page: https://jaewon040.github.io/4dgs360/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

📝 Summary:
Self-distillation in large language models can degrade mathematical reasoning performance by suppressing uncertainty expression, particularly affecting out-of-distribution tasks.

🔹 Publication Date: Published on Mar 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.24472
• PDF: https://arxiv.org/pdf/2603.24472
• Project Page: https://beanie00.notion.site/why-does-self-distillation-degrade-reasoning
• Github: https://github.com/beanie00/self-distillation-analysis

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research