ML Research Hub
32.5K subscribers
6.1K photos
398 videos
24 files
6.6K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation

📝 Summary:
Perceptio enhances vision-language models with explicit spatial reasoning through integrated semantic segmentation and depth tokens generated via VQ-VAE distillation and multi-task learning. AI-genera...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18795
• PDF: https://arxiv.org/pdf/2603.18795

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference

📝 Summary:
AnimalCLAP is a taxonomy-aware language-audio framework that uses hierarchical biological information to improve species classification from vocalizations, achieving better performance than CLAP by le...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22053
• PDF: https://arxiv.org/pdf/2603.22053
• Project Page: https://dahlian00.github.io/AnimalCLAP_Page/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Effective Strategies for Asynchronous Software Engineering Agents

📝 Summary:
Multi-agent collaboration for software engineering tasks faces challenges in coordination and synchronization, which are addressed through a structured paradigm using centralized delegation, asynchron...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21489
• PDF: https://arxiv.org/pdf/2603.21489
• Github: https://github.com/JiayiGeng/CAID

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Agentic AI and the next intelligence explosion

📝 Summary:
The "AI singularity" is often miscast as a monolithic, godlike mind. Evolution suggests a different path: intelligen...

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20639
• PDF: https://arxiv.org/pdf/2603.20639

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Understanding Behavior Cloning with Action Quantization

📝 Summary:
Behavior cloning with quantized actions in autoregressive models achieves optimal sample complexity under stability and smoothness conditions, with quantization error affecting horizon-dependent perfo...

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20538
• PDF: https://arxiv.org/pdf/2603.20538

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
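To make the idea of quantized actions in the post above concrete, here is a minimal sketch of a uniform quantizer. It assumes scalar actions in a known range and a fixed number of bins; the function names, the range, and the bin count are hypothetical and not taken from the paper. It only illustrates that mapping an action to a bin and back introduces an error of at most half a bin width.

```python
def quantize(action, low, high, num_bins):
    """Map a continuous action to a bin index in [0, num_bins - 1]."""
    clipped = min(max(action, low), high)
    width = (high - low) / num_bins
    index = int((clipped - low) / width)
    return min(index, num_bins - 1)  # action == high falls into the last bin


def dequantize(index, low, high, num_bins):
    """Map a bin index back to the center of its bin."""
    width = (high - low) / num_bins
    return low + (index + 0.5) * width


def roundtrip_error(action, low, high, num_bins):
    """Absolute error after quantizing and dequantizing one in-range action."""
    index = quantize(action, low, high, num_bins)
    return abs(dequantize(index, low, high, num_bins) - action)
```

With 8 bins on [-1, 1] the bin width is 0.25, so the round-trip error of any in-range action stays at or below 0.125. A policy that predicts bin indices inherits this per-step error.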
Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

📝 Summary:
High-rank DoRA is improved by addressing its memory and speed limitations. The paper introduces a factored norm decomposition and fused Triton kernels. This makes DoRA faster for inference and training, reduces memory usage, and maintains high accuracy across vision-language models.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22276
• PDF: https://arxiv.org/pdf/2603.22276
• Github: https://github.com/sockeye44/dorafactors

Datasets citing this paper:
https://huggingface.co/datasets/eyes-ml/MMFineReason-SFT-123K-Qwen3-VL-235B-Thinking-QR-max4096

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
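For readers unfamiliar with why the norm is a bottleneck in DoRA: DoRA rescales each row of the adapted weight W0 + s·B·A by its norm, and computing that norm naively means materializing the full B·A product. The sketch below shows one standard algebraic expansion that avoids it, assuming a per-row L2 norm. It is an illustration of the general idea only; whether it matches the paper's factored norm decomposition is an assumption, and all function names are hypothetical.

```python
import math


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*y))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in x]


def transpose(x):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def row_norms_direct(w0, b, a, scale):
    """Reference: build W = W0 + scale * B @ A, then take row norms."""
    delta = matmul(b, a)  # (out x in) product that the factored form avoids
    return [
        math.sqrt(sum((w + scale * d) ** 2 for w, d in zip(w_row, d_row)))
        for w_row, d_row in zip(w0, delta)
    ]


def row_norms_factored(w0, b, a, scale):
    """Same norms from three small terms, without forming B @ A.

    ||W0_i + s B_i A||^2 = ||W0_i||^2 + 2 s B_i (A W0_i^T) + s^2 B_i (A A^T) B_i^T
    """
    base = [sum(w * w for w in row) for row in w0]  # ||W0_i||^2
    w0_at = matmul(w0, transpose(a))                # (out x r)
    gram = matmul(a, transpose(a))                  # (r x r)
    b_gram = matmul(b, gram)                        # (out x r)
    norms = []
    for i, b_row in enumerate(b):
        cross = sum(p * q for p, q in zip(b_row, w0_at[i]))
        quad = sum(p * q for p, q in zip(b_row, b_gram[i]))
        squared = base[i] + 2 * scale * cross + scale * scale * quad
        norms.append(math.sqrt(max(squared, 0.0)))  # guard against tiny negatives
    return norms
```

The factored version only creates intermediates of size out x r and r x r, which is the kind of saving that matters when the full out x in update is too large to keep around.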
Group3D: MLLM-Driven Semantic Grouping for Open-Vocabulary 3D Object Detection

📝 Summary:
Group3D is a multi-view open-vocabulary 3D detection framework that integrates semantic constraints into instance construction through semantic compatibility groups, improving accuracy in pose-known a...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21944
• PDF: https://arxiv.org/pdf/2603.21944
• Project Page: https://ubin108.github.io/Group3D/
• Github: https://github.com/Ubin108/Group3D

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

📝 Summary:
A multi-agent visual reasoning framework advances MLLM capabilities through scalable data generation and iterative self-improvement, enhancing both image and video reasoning while maintaining perceptu...

🔹 Publication Date: Published on Mar 18

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18118
• PDF: https://arxiv.org/pdf/2603.18118

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
In-the-Wild Camouflage Attack on Vehicle Detectors through Controllable Image Editing

📝 Summary:
A novel framework formulates vehicle camouflage attacks as a conditional image-editing problem using ControlNet to generate stealthy adversarial examples with preserved structure and enhanced transfer...

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19456
• PDF: https://arxiv.org/pdf/2603.19456
• Project Page: https://humansensinglab.github.io/CtrlCamo/
• Github: https://github.com/humansensinglab/CtrlCamo

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Semantic Audio-Visual Navigation in Continuous Environments

📝 Summary:
MAGNet, a multimodal transformer-based model, enables embodied agents to navigate audio-visual environments by jointly encoding spatial and semantic goal representations while incorporating historical...

🔹 Publication Date: Published on Mar 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19660
• PDF: https://arxiv.org/pdf/2603.19660
• Github: https://github.com/yichenzeng24/SAVN-CE

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
💾 LLM Architecture Cheat Sheet: from GPT-2 to Trillion-scale Models

LLM Architecture Gallery: a page with cards for 39 models (2019–2026), including DeepSeek, Qwen, Llama, Kimi, Grok, and Nemotron. Each card has an architecture diagram, the decoder type (dense / sparse MoE / hybrid), the attention type, and links to technical reports and HuggingFace configs.

The gallery makes it clear how large models have converged on MoE + MLA and why hybrid architectures (Mamba-2, DeltaNet, Lightning Attention) are gaining momentum.

🔘 Open Gallery
https://sebastianraschka.com/llm-architecture-gallery/

https://t.iss.one/DataScienceT 🔴
⚠️ Are you missing out on the biggest wealth transfer of this decade?
Let's be honest... While most people amuse themselves asking ChatGPT silly questions, a small, silent minority is already earning thousands of extra euros every month. And no, they are not programmers or gurus.
They simply get the right information long before everyone else.
Artificial Intelligence is advancing at a frightening pace. Every morning there is a new tool, a new update, or a startup that destroys an entire niche and creates 3 new business opportunities worth millions.
The problem? Reading the 500 boring English-language news stories about servers and APIs to find the one gold nugget you can actually use to make money today... is impossible if you have a job and a life.
That's why I created this private corner. 👇
I have programmed an Artificial Intelligence researcher that never sleeps. It automatically reads and digests all the boring scientific publications from OpenAI, Silicon Valley, and TechCrunch... and sends straight to your phone only what matters:
🔥 The Summary: Which tool has just hit the market.
💡 The Impact: Why this is going to change the rules of the game.
💎 THE MASTERCLASS: A step-by-step, "pre-digested" B2B Action Plan on how YOU can monetize that same piece of news this very afternoon. No hype and no theory, just applicable business.
You no longer need to chase information; now business opportunities land right in the palm of your hand while you have your coffee.
👉 Tap here to join for free the Channel where the magic happens before it closes its doors:
🔗 https://t.iss.one/iamonetizacion
P.S. Inside the channel you will find access to our Closed VIP Club, where I literally hand you the business-engineering Masterclasses that the AI generates exclusively for me. Come in and see for yourself. 🚀
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

📝 Summary:
SpatialBoost improves the 3D spatial awareness of vision encoders by integrating linguistic 3D spatial knowledge. It achieves this through a multi-turn Chain-of-Thought reasoning process using Large Language Models, converting 3D spatial information from 2D images into linguistic descriptions. Th...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22057
• PDF: https://arxiv.org/pdf/2603.22057
• Project Page: https://rootyjeon.github.io/spatial-boost/
• Github: https://github.com/rootyJeon/SpatialBoost

==================================

#SpatialBoost #ComputerVision #LLM #3DVision #AI
Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe

📝 Summary:
This paper presents a comprehensive recipe for applying reinforcement learning to long-horizon tool-using LLMs. It systematically studies 5 design axes, offering key takeaways such as scale-dependent rewards and optimal data composition. The distilled recipe enables state-of-the-art performance o...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21972
• PDF: https://arxiv.org/pdf/2603.21972
• Github: https://github.com/WxxShirley/Agent-STAR

==================================

#ReinforcementLearning #LLMs #AI #ToolUsingAgents #MachineLearning
Generalized Discrete Diffusion from Snapshots

📝 Summary:
GDDS presents a unified framework for discrete diffusion modeling with flexible noising processes. It achieves superior training efficiency and generation quality, outperforming existing discrete methods and autoregressive models in large-vocabulary tasks.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21342
• PDF: https://arxiv.org/pdf/2603.21342
• Project Page: https://oussamazekri.fr/gdds
• Github: https://github.com/ozekri/gdds

==================================

#DiffusionModels #GenerativeAI #MachineLearning #DeepLearning #AIResearch
RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

📝 Summary:
RoboAlign is a training framework that improves embodied reasoning in vision-language-action models. It combines zero-shot natural language reasoning with reinforcement learning to boost action accuracy and bridge the language-action gap, yielding significant performance gains.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21341
• PDF: https://arxiv.org/pdf/2603.21341

==================================

#RoboAlign #EmbodiedAI #ReinforcementLearning #VLA #AIResearch
REVERE: Reflective Evolving Research Engineer for Scientific Workflows

📝 Summary:
REVERE enhances research coding agent performance via reflective optimization and cumulative knowledge consolidation across multiple tasks. It overcomes prior prompt-optimization limits, achieving significant gains on research coding benchmarks and demonstrating agent evolution.

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20667
• PDF: https://arxiv.org/pdf/2603.20667

==================================

#AIAgents #ResearchAutomation #CodingAI #PromptEngineering #AgentEvolution
The Universal Normal Embedding

📝 Summary:
Generative models and vision encoders share a common Gaussian latent space called the Universal Normal Embedding (UNE). This shared UNE provides aligned semantic representations and enables controllable image editing through simple linear manipulations.

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21786
• PDF: https://arxiv.org/pdf/2603.21786

==================================

#GenerativeAI #ComputerVision #LatentSpace #DeepLearning #MachineLearning
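As a rough picture of what "simple linear manipulations" in a latent space can look like, the sketch below shifts a latent vector along an attribute direction estimated as a difference of group means. This is a generic latent-editing recipe used here for illustration; the post above does not say how the paper obtains its directions, so the procedure and every name in the code are assumptions.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def attribute_direction(with_attribute, without_attribute):
    """Direction pointing from latents without the attribute to latents with it."""
    positive = mean_vector(with_attribute)
    negative = mean_vector(without_attribute)
    return [p - n for p, n in zip(positive, negative)]


def linear_edit(latent, direction, strength):
    """Move a latent along the direction; strength controls how far."""
    return [z + strength * d for z, d in zip(latent, direction)]
```

In use, the edited latent would be passed back through a decoder or generator, which is outside the scope of this sketch. A negative strength moves the latent away from the attribute instead of toward it.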
F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

📝 Summary:
F4Splat introduces predictive densification for 3D Gaussian splatting, adaptively allocating Gaussians based on spatial complexity and view overlap. This reduces redundant Gaussians, leading to compact, high-quality 3D representations with significantly fewer Gaussians than prior feed-forward met...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21304
• PDF: https://arxiv.org/pdf/2603.21304
• Project Page: https://mlvlab.github.io/F4Splat/
• Github: https://github.com/mlvlab/F4Splat

==================================

#3DGaussianSplatting #ComputerGraphics #3DReconstruction #MachineLearning #NeuralRendering
BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

📝 Summary:
BubbleRAG improves graph-based RAG recall and precision for black-box knowledge graphs. It uses semantic anchoring and bubble expansion to find relevant subgraphs, achieving state-of-the-art results on multi-hop QA.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20309
• PDF: https://arxiv.org/pdf/2603.20309
• Github: https://github.com/limafang/BubbleRAG

==================================

#RAG #KnowledgeGraphs #AI #NLP #MachineLearning
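The post above names two steps, anchoring and expansion, without describing them. The sketch below is a deliberately simple stand-in, not BubbleRAG's algorithm: anchoring is approximated by matching node labels against the question text, and expansion is a plain breadth-first search limited to a fixed number of hops. The graph layout, the function names, and the hop limit are all hypothetical.

```python
from collections import deque


def anchor_nodes(question, node_labels):
    """Stand-in for semantic anchoring: nodes whose label occurs in the question."""
    text = question.lower()
    return [node for node, label in node_labels.items() if label.lower() in text]


def expand(graph, anchors, max_hops):
    """Breadth-first expansion from the anchors, at most max_hops edges away.

    Returns a dict mapping each reached node to its hop distance.
    """
    distance = {node: 0 for node in anchors}
    queue = deque(anchors)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not grow the subgraph past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

The returned node set is the candidate subgraph that a retrieval step would then rank or filter before passing evidence to the generator. A real system would replace the substring match with embedding similarity.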