ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
DreamWorld: Unified World Modeling in Video Generation

📝 Summary:
DreamWorld introduces a unified framework for video generation that integrates multiple types of world knowledge through joint modeling of temporal dynamics, spatial geometry, and semantic consistency...

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00466
• PDF: https://arxiv.org/pdf/2603.00466
• Github: https://github.com/ABU121111/DreamWorld

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline

📝 Summary:
MM-Lifelong dataset captures natural video sequences across multiple temporal scales to evaluate multimodal lifelong understanding, revealing limitations in current approaches and introducing a recurs...

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05484
• PDF: https://arxiv.org/pdf/2603.05484
• Project Page: https://huggingface.co/datasets/CG-Bench/MM-Lifelong
• Github: https://github.com/cg1177/Recursive-Multimodal-Agent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
On-Policy Self-Distillation for Reasoning Compression

📝 Summary:
OPSDC enables efficient reasoning model compression by having models distill concise behavior from their own outputs, achieving significant token reduction while maintaining accuracy.

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05433
• PDF: https://arxiv.org/pdf/2603.05433

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

📝 Summary:
A bimanual robotic grasping framework is presented that generates diverse grasp data through optimization and planning, enabling effective zero-shot sim-to-real transfer with high success rates on nov...

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05312
• PDF: https://arxiv.org/pdf/2603.05312

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Distribution-Conditioned Transport

📝 Summary:
A distribution-conditioned transport framework enables generalization to unseen distribution pairs and supports semi-supervised learning for scientific applications.

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04736
• PDF: https://arxiv.org/pdf/2603.04736

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
SageBwd: A Trainable Low-bit Attention

📝 Summary:
This research investigates why low-bit attention methods like SageBwd exhibit performance gaps during pre-training and identifies key factors for stable training with reduced precision.

🔹 Publication Date: Published on Mar 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.02170
• PDF: https://arxiv.org/pdf/2603.02170
• Project Page: https://github.com/thu-ml/SageAttention

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

📝 Summary:
AgentVista presents a comprehensive benchmark for multimodal agents requiring long-horizon tool interactions across multiple modalities and complex real-world scenarios.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23166
• PDF: https://arxiv.org/pdf/2602.23166

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
RoboPocket: Improve Robot Policies Instantly with Your Phone

📝 Summary:
RoboPocket enables efficient, robot-free policy iteration via smartphones. It uses augmented reality to visualize policy weaknesses, guiding data collection, and asynchronous online finetuning to update policies quickly. This doubles data and sample efficiency.

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.05504
• PDF: https://arxiv.org/pdf/2603.05504
• Project Page: https://robo-pocket.github.io

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Large Multimodal Models as General In-Context Classifiers

📝 Summary:
Large Multimodal Models demonstrate superior performance in closed-world classification with in-context learning and excel in open-world scenarios when equipped with the proposed CIRCLE method for pse...

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23229
• PDF: https://arxiv.org/pdf/2602.23229
• Project Page: https://circle-lmm.github.io/
• Github: https://github.com/marco-garosi/CIRCLE

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface

📝 Summary:
GLiNER2 is a unified transformer-based framework that supports multiple NLP tasks with improved efficiency and accessibility compared to large language models.

🔹 Publication Date: Published on Jul 24, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2507.18546
• PDF: https://arxiv.org/pdf/2507.18546
• Github: https://github.com/fastino-ai/GLiNER2

🔹 Models citing this paper:
https://huggingface.co/fastino/gliner2-base-v1
https://huggingface.co/fastino/gliner2-large-v1
https://huggingface.co/fastino/gliner2-multi-v1

Spaces citing this paper:
https://huggingface.co/spaces/sitammeur/GLiNER2-Suite
https://huggingface.co/spaces/fastino/gliner2-official-demo
https://huggingface.co/spaces/sohom004/testdup

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Mozi: Governed Autonomy for Drug Discovery LLM Agents

📝 Summary:
Mozi is a dual-layer framework for reliable drug discovery LLM agents, solving issues of tool-use governance and long-horizon reliability. It uses a control plane for isolated tool-use and replanning, plus a workflow plane for structured stages with human oversight, ensuring robust, auditable res...

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03655
• PDF: https://arxiv.org/pdf/2603.03655
• Project Page: https://ai4s.idea.edu.cn/ai4s/mozi

==================================

#LLMAgents #DrugDiscovery #AI #MachineLearning #AutonomousAgents
MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

📝 Summary:
MASQuant improves multimodal LLM quantization by resolving smoothing misalignment and cross-modal invariance. It uses modality-aware smoothing and SVD whitening for cross-modal compensation, achieving stable, competitive performance.
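
To make the smoothing idea concrete, here is a minimal sketch of per-channel activation smoothing computed separately for each modality. It uses the well-known SmoothQuant-style scale as a stand-in; this is an assumption for illustration, not MASQuant's actual procedure, and the function names, toy values, and `alpha` setting are all hypothetical.

```python
def smoothing_scales(acts, weights, alpha=0.5, eps=1e-8):
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    acts is tokens x n_in, weights is n_in x n_out (SmoothQuant-style stand-in).
    """
    scales = []
    for j in range(len(weights)):
        act_max = max(abs(row[j]) for row in acts)
        w_max = max(abs(w) for w in weights[j])
        scales.append(max(act_max, eps) ** alpha / max(w_max, eps) ** (1 - alpha))
    return scales


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(acts, weights, scales):
    """Divide activation channels by s and multiply weight rows by s.

    (X / s) @ (s * W) equals X @ W, so the layer output is unchanged
    while activation outliers are moved into the weights.
    """
    acts_s = [[x / s for x, s in zip(row, scales)] for row in acts]
    weights_s = [[w * s for w in row] for row, s in zip(weights, scales)]
    return acts_s, weights_s


# Toy activations (hypothetical values): vision tokens have a much larger
# range on channel 0 than text tokens do.
modalities = {
    "vision": [[8.0, 0.5], [-6.0, 0.25]],
    "text": [[0.5, 1.0], [0.25, -2.0]],
}
W = [[0.5, -1.0], [2.0, 0.25]]

# One set of smoothing scales per modality instead of one shared set.
scales = {name: smoothing_scales(acts, W) for name, acts in modalities.items()}
```

The two modalities end up with different scales, which is the mismatch a single shared smoothing vector would hide. How MASQuant reconciles modality-specific scales with a single quantized weight (the SVD whitening step mentioned above) is not covered by this sketch.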

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04800
• PDF: https://arxiv.org/pdf/2603.04800

==================================

#MultimodalAI #LLM #Quantization #DeepLearning #AIResearch
SkillNet: Create, Evaluate, and Connect AI Skills

📝 Summary:
SkillNet is an open infrastructure that systematically creates, evaluates, and organizes AI skills using a unified ontology. This addresses the lack of skill accumulation in current agents, raising rewards by 40 percent and reducing execution steps by 30 percent.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04448
• PDF: https://arxiv.org/pdf/2603.04448
• Project Page: https://skillnet.openkg.cn/
• Github: https://github.com/zjunlp/SkillNet

==================================

#AI #AISkills #AIAgents #Ontology #MachineLearning
STMI: Segmentation-Guided Token Modulation with Cross-Modal Hypergraph Interaction for Multi-Modal Object Re-Identification

📝 Summary:
STMI is a novel multi-modal ReID framework that improves object re-identification. It uses segmentation-guided modulation for foreground enhancement, token reallocation for compact features, and cross-modal hypergraph interaction to capture high-order semantic relationships.

🔹 Publication Date: Published on Feb 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.00695
• PDF: https://arxiv.org/pdf/2603.00695

==================================

#ObjectReID #ComputerVision #DeepLearning #MultiModalAI #STMI
Interactive Benchmarks

📝 Summary:
Interactive Benchmarks proposes a new framework that assesses AI systems by evaluating active information acquisition and reasoning under constraints. This approach offers a robust assessment, revealing significant room for model improvement in interactive scenarios.

🔹 Publication Date: Published on Mar 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04737
• PDF: https://arxiv.org/pdf/2603.04737
• Project Page: https://github.com/interactivebench/interactivebench
• Github: https://github.com/interactivebench/interactivebench

==================================

#AI #AIBenchmarks #InteractiveAI #AIIntelligence #MachineLearning
Lightweight Visual Reasoning for Socially-Aware Robots

📝 Summary:
A lightweight language-to-vision feedback module enhances VLMs for robotics. It reinterprets visual scenes under text context via a gated MLP, improving navigation, scene description, and human intention recognition with minimal parameters.
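
A rough sketch of what a gated MLP feedback step could look like: each visual token is concatenated with a text context vector, a small MLP proposes an update, and a sigmoid gate decides how much of that update is added back. The layer sizes, initialization, residual form, and all names here are assumptions for illustration and are not taken from the paper or its repository.

```python
import math
import random


def linear(x, w, b):
    """Affine map: x has length n_in, w is n_in x n_out, b has length n_out."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]


def init_params(d_vis, d_txt, d_hidden, gate_bias=-2.0, seed=0):
    """Small random weights; a negative gate bias starts the gate mostly closed."""
    rng = random.Random(seed)
    d_in = d_vis + d_txt

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": mat(d_in, d_hidden), "b1": [0.0] * d_hidden,
        "w2": mat(d_hidden, d_vis), "b2": [0.0] * d_vis,
        "wg": mat(d_in, d_vis), "bg": [gate_bias] * d_vis,
    }


def gated_feedback(visual_tokens, text_ctx, p):
    """Return visual tokens refined under the text context (hypothetical form)."""
    refined = []
    for v in visual_tokens:
        h = v + text_ctx  # list concatenation of visual token and text context
        hidden = [max(0.0, a) for a in linear(h, p["w1"], p["b1"])]  # ReLU
        update = linear(hidden, p["w2"], p["b2"])
        gate = [1.0 / (1.0 + math.exp(-g)) for g in linear(h, p["wg"], p["bg"])]
        # Residual update: the gate scales how much text-conditioned change is applied.
        refined.append([vi + gi * ui for vi, gi, ui in zip(v, gate, update)])
    return refined


# Toy inputs (hypothetical values): two 3-d visual tokens and a 2-d text context.
visual_tokens = [[0.2, -0.4, 0.1], [0.9, 0.3, -0.5]]
text_ctx = [0.7, -0.1]
params = init_params(d_vis=3, d_txt=2, d_hidden=4)
refined = gated_feedback(visual_tokens, text_ctx, params)
```

With the gate near zero the module passes visual tokens through unchanged, so it can be attached to an existing VLM without disturbing its behavior at the start of training. The whole module is three small weight matrices, which is in the spirit of the "minimal parameters" claim above.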

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03942
• PDF: https://arxiv.org/pdf/2603.03942
• Github: https://github.com/alessioGalatolo/VLM-Reasoning-for-Robotics

==================================

#Robotics #VLMs #VisualReasoning #AI #HumanRobotInteraction
Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling

📝 Summary:
LPWM is a self-supervised object-centric world model that autonomously discovers object representations from video data. It models stochastic particle dynamics for decision-making, achieving state-of-the-art results.

🔹 Publication Date: Published on Mar 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.04553
• PDF: https://arxiv.org/pdf/2603.04553
• Project Page: https://taldatech.github.io/lpwm-web/
• Github: https://github.com/taldatech/lpwm

==================================

#WorldModels #SelfSupervisedLearning #ObjectCentricAI #MachineLearning #AI
Truncated Step-Level Sampling with Process Rewards for Retrieval-Augmented Reasoning

📝 Summary:
SLATE uses truncated step-level sampling and dense LLM-as-judge rewards to train language models with search engines. This reduces gradient variance and improves performance on complex reasoning tasks by better assigning credit.

🔹 Publication Date: Published on Feb 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23440
• PDF: https://arxiv.org/pdf/2602.23440
• Github: https://github.com/algoprog/SLATE

==================================

#LLM #RetrievalAugmentedReasoning #ReinforcementLearning #AI #MachineLearning
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

📝 Summary:
Fast-FoundationStereo achieves real-time zero-shot stereo matching, bridging the gap between slow robust models and fast specialized ones. It employs distillation, architecture search, and pruning, running over 10x faster with similar accuracy to prior foundation models. This sets a new state-of-...

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11130
• PDF: https://arxiv.org/pdf/2512.11130
• Project Page: https://nvlabs.github.io/Fast-FoundationStereo/
• Github: https://github.com/NVlabs/Fast-FoundationStereo

==================================

#StereoMatching #ComputerVision #RealTimeAI #ZeroShotLearning #DeepLearning