ML Research Hub

💾 LLM Architecture Cheat Sheet: from GPT-2 to Trillion-scale Models

LLM Architecture Gallery is a page with cards for 39 models (2019–2026), including DeepSeek, Qwen, Llama, Kimi, Grok, and Nemotron. Each card has an architecture diagram, the decoder type (dense, sparse MoE, or hybrid), the attention type, and links to technical reports and Hugging Face configs.

Seen side by side, the cards make it clear how large models have converged on MoE + MLA (mixture-of-experts with multi-head latent attention) and why hybrid architectures (Mamba-2, DeltaNet, Lightning Attention) are gaining momentum.
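For readers new to the "sparse MoE" label, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration only: the function names, the toy experts, and the choice to apply softmax after selecting the top-k logits are assumptions, and real models in the gallery differ in how they normalize and balance the router.

```python
import math
from typing import Callable, Dict, List


def top_k_route(router_logits: List[float], k: int) -> Dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their logits."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(x: float,
                experts: List[Callable[[float], float]],
                router_logits: List[float],
                k: int) -> float:
    """Weighted sum of the selected experts; unselected experts never run."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy experts: each one just scales its input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 4 * x, lambda x: 100 * x]
y = moe_forward(1.0, experts, router_logits=[0.0, 2.0, 2.0, -1.0], k=2)
```

Only 2 of the 4 toy experts are evaluated per input, which is the point of a sparse decoder: total parameter count grows with the number of experts while per-token compute grows only with k.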

🔘 Open Gallery
https://sebastianraschka.com/llm-architecture-gallery/

https://t.iss.one/DataScienceT


❤3

743 views · edited 06:08

ML Research Hub

⚠️ Are you missing out on the biggest wealth transfer of this decade?
Let's be honest... While most people are busy asking ChatGPT silly questions, a small, silent minority is already billing thousands of extra euros every month. And no, they are not programmers or gurus.
They simply have the right information long before everyone else.
Artificial Intelligence is advancing at a frightening pace. Every morning there is a new tool, a new update, or a startup that destroys an entire niche and creates 3 new million-dollar business opportunities.
The problem? Reading the 500 boring English-language news items about servers and APIs to find the gold nugget you can actually use to make money today... is impossible if you have a job and a life.
That's why I created this private corner. 👇
I have programmed an Artificial Intelligence researcher that never sleeps. It automatically reads and digests all the boring scientific publications from OpenAI, Silicon Valley, and TechCrunch... and sends straight to your phone only what matters:
🔥 The Summary: Which tool has just hit the market.
💡 The Impact: Why this is going to change the rules of the game.
💎 THE MASTERCLASS: A step-by-step, pre-digested B2B Action Plan on how YOU can monetize that same news this very afternoon. No hype and no theory, just applicable business.
You no longer need to chase information; business opportunities now fall right into the palm of your hand while you have your coffee. ☕
👉 Tap here to join, for free, the Channel where the magic happens before it closes its doors:
🔗 https://t.iss.one/iamonetizacion
P.S. Inside the channel you will find access to our Closed VIP Club, where I literally hand you the business-engineering Masterclasses that the AI generates for me exclusively. Come in and see for yourself. 🚀

IA y Monetización

News on AI and ways to generate income

❤1👏1

173 views · 06:17


ML Research Hub

✨SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

📝 Summary:
SpatialBoost improves the 3D spatial awareness of vision encoders by integrating linguistic 3D spatial knowledge. It achieves this through a multi-turn Chain-of-Thought reasoning process using Large Language Models, converting 3D spatial information from 2D images into linguistic descriptions. Th...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22057
• PDF: https://arxiv.org/pdf/2603.22057
• Project Page: https://rootyjeon.github.io/spatial-boost/
• Github: https://github.com/rootyJeon/SpatialBoost

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#SpatialBoost #ComputerVision #LLM #3DVision #AI

111 views · 08:08


ML Research Hub

✨Demystifying Reinforcement Learning for Long-Horizon Tool-Using Agents: A Comprehensive Recipe

📝 Summary:
This paper presents a comprehensive recipe for applying reinforcement learning to long-horizon tool-using LLMs. It systematically studies 5 design axes, offering key takeaways such as scale-dependent rewards and optimal data composition. The distilled recipe enables state-of-the-art performance o...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21972
• PDF: https://arxiv.org/pdf/2603.21972
• Github: https://github.com/WxxShirley/Agent-STAR

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ReinforcementLearning #LLMs #AI #ToolUsingAgents #MachineLearning

104 views · 08:08


ML Research Hub


✨Generalized Discrete Diffusion from Snapshots

📝 Summary:
GDDS presents a unified framework for discrete diffusion modeling with flexible noising processes. It achieves superior training efficiency and generation quality, outperforming existing discrete methods and autoregressive models in large-vocabulary tasks.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21342
• PDF: https://arxiv.org/pdf/2603.21342
• Project Page: https://oussamazekri.fr/gdds
• Github: https://github.com/ozekri/gdds

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DiffusionModels #GenerativeAI #MachineLearning #DeepLearning #AIResearch

127 views · 08:08


ML Research Hub

✨RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

📝 Summary:
RoboAlign is a training framework that improves embodied reasoning in vision-language-action models. It combines zero-shot natural language reasoning with reinforcement learning to boost action accuracy and bridge the language-action gap, yielding significant performance gains.

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21341
• PDF: https://arxiv.org/pdf/2603.21341

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#RoboAlign #EmbodiedAI #ReinforcementLearning #VLA #AIResearch

102 views · 09:08


ML Research Hub

✨REVERE: Reflective Evolving Research Engineer for Scientific Workflows

📝 Summary:
REVERE enhances research coding agent performance via reflective optimization and cumulative knowledge consolidation across multiple tasks. It overcomes prior prompt-optimization limits, achieving significant gains on research coding benchmarks and demonstrating agent evolution.

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20667
• PDF: https://arxiv.org/pdf/2603.20667

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AIAgents #ResearchAutomation #CodingAI #PromptEngineering #AgentEvolution

121 views · 09:09


ML Research Hub

✨The Universal Normal Embedding

📝 Summary:
Generative models and vision encoders share a common Gaussian latent space called the Universal Normal Embedding (UNE). This shared space provides aligned semantic representations and enables controllable image editing through simple linear manipulations.
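As a rough illustration of what a "simple linear manipulation" of a latent can look like, the sketch below estimates an attribute direction as a difference of class means and shifts a latent along it. This is a generic technique, not necessarily the paper's procedure; the function names and the toy 2-D latents are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attribute_direction(with_attr: List[Vector],
                        without_attr: List[Vector]) -> Vector:
    """Direction pointing from latents without the attribute to latents with it."""
    mu_pos = mean(with_attr)
    mu_neg = mean(without_attr)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def edit_latent(z: Vector, direction: Vector, strength: float) -> Vector:
    """Linear edit: move the latent along the attribute direction."""
    return [zi + strength * di for zi, di in zip(z, direction)]


# Toy 2-D latents (assumed data, for illustration only).
direction = attribute_direction(
    with_attr=[[1.0, 0.0], [3.0, 0.0]],
    without_attr=[[0.0, 0.0], [0.0, 0.0]],
)
edited = edit_latent([0.5, 0.5], direction, strength=0.5)
```

In practice the edited latent would be passed back through a generator or decoder, and `strength` controls how strongly the attribute is applied.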

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21786
• PDF: https://arxiv.org/pdf/2603.21786

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#GenerativeAI #ComputerVision #LatentSpace #DeepLearning #MachineLearning

❤1

151 views · 09:09


ML Research Hub

✨F4Splat: Feed-Forward Predictive Densification for Feed-Forward 3D Gaussian Splatting

📝 Summary:
F4Splat introduces predictive densification for 3D Gaussian splatting, adaptively allocating Gaussians based on spatial complexity and view overlap. This reduces redundant Gaussians, leading to compact, high-quality 3D representations with significantly fewer Gaussians than prior feed-forward met...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21304
• PDF: https://arxiv.org/pdf/2603.21304
• Project Page: https://mlvlab.github.io/F4Splat/
• Github: https://github.com/mlvlab/F4Splat

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#3DGaussianSplatting #ComputerGraphics #3DReconstruction #MachineLearning #NeuralRendering

94 views · 10:09


ML Research Hub

✨BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

📝 Summary:
BubbleRAG improves graph-based RAG recall and precision for black-box knowledge graphs. It uses semantic anchoring and bubble expansion to find relevant subgraphs, achieving state-of-the-art results on multi-hop QA.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20309
• PDF: https://arxiv.org/pdf/2603.20309
• Github: https://github.com/limafang/BubbleRAG

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#RAG #KnowledgeGraphs #AI #NLP #MachineLearning

98 views · 10:10


ML Research Hub

✨SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

📝 Summary:
Sparse Embedding Modulation (SEM) debiases vision-language models by operating in a sparse autoencoder latent space. SEM precisely modulates bias-relevant neurons while preserving semantic information, achieving substantial fairness gains in retrieval and classification tasks.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.19028
• PDF: https://arxiv.org/pdf/2603.19028
• Project Page: https://sparse-embedding-modulation.github.io/
• Github: https://github.com/mardgui/SEM

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VisionLanguageModels #BiasCorrection #MachineLearning #AIResearch #DeepLearning

124 views · 10:10


ML Research Hub

✨SNAP: Speaker Nulling for Artifact Projection in Speech Deepfake Detection

📝 Summary:
A speaker-nulling framework called SNAP is proposed to reduce speaker entanglement in speech encoders, enabling detectors to focus on artifact-related patterns for improved deepfake detection performa...
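The truncated summary does not spell out the mechanism, so the following is only a sketch of one common way to "null" a subspace: project an embedding onto the orthogonal complement of a set of speaker directions. The function names and toy vectors are hypothetical, and this should not be read as the paper's implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormalize(vectors: List[Vector]) -> List[Vector]:
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def null_subspace(x: Vector, basis: List[Vector]) -> Vector:
    """Remove the component of x lying in the span of an orthonormal basis."""
    out = list(x)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out


# Assumed speaker directions in a toy 3-D embedding space.
speaker_basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
cleaned = null_subspace([3.0, 4.0, 5.0], speaker_basis)
```

After nulling, `cleaned` has no component along either speaker direction, so a downstream detector sees only what remains in the orthogonal complement.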

🔹 Publication Date: Published on Mar 21

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20686
• PDF: https://arxiv.org/pdf/2603.20686
• Project Page: https://huggingface.co/papers?q=orthogonal%20projection

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

116 views · 10:10


ML Research Hub

✨Not All Layers Are Created Equal: Adaptive LoRA Ranks for Personalized Image Generation

📝 Summary:
LoRA² adapts layer-specific ranks during fine-tuning for personalized image generation, achieving better performance-memory trade-offs than fixed-rank approaches.
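To see why per-layer ranks affect memory, recall that a standard LoRA adapter on a weight of shape (d_out, d_in) adds two matrices of shapes (d_out, r) and (r, d_in), so r * (d_in + d_out) trainable parameters. The sketch below compares a fixed rank against a hand-picked per-layer assignment; the layer names, shapes, and ranks are made up for illustration and say nothing about how LoRA² actually chooses ranks.

```python
from typing import Dict, Tuple


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: B is (d_out, r), A is (r, d_in)."""
    return rank * (d_in + d_out)


def total_params(layers: Dict[str, Tuple[int, int]],
                 ranks: Dict[str, int]) -> int:
    """Sum adapter parameters over all layers for a given rank assignment."""
    return sum(lora_params(d_in, d_out, ranks[name])
               for name, (d_in, d_out) in layers.items())


# Hypothetical layer shapes as (d_in, d_out).
layers = {"attn_q": (64, 64), "attn_v": (64, 64), "mlp_up": (64, 256)}

fixed_ranks = {name: 8 for name in layers}                  # same rank everywhere
adaptive_ranks = {"attn_q": 2, "attn_v": 4, "mlp_up": 4}    # assumed per-layer ranks

fixed_total = total_params(layers, fixed_ranks)        # 4608
adaptive_total = total_params(layers, adaptive_ranks)  # 2048
```

Whether the smaller budget preserves quality depends on which layers actually need capacity, which is the question an adaptive-rank method has to answer.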

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21884
• PDF: https://arxiv.org/pdf/2603.21884
• Project Page: https://donaldssh.github.io/NotAllLayersAreCreatedEqual/
• Github: https://github.com/donaldssh/NotAllLayersAreCreatedEqual

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

164 views · 10:10


ML Research Hub

✨Repurposing Geometric Foundation Models for Multi-view Diffusion

📝 Summary:
The Geometric Latent Diffusion (GLD) framework uses the feature space of geometric foundation models as the latent space for novel view synthesis, achieving superior 2D and 3D performance while reducing trainin...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22275
• PDF: https://arxiv.org/pdf/2603.22275
• Project Page: https://cvlab-kaist.github.io/GLD/
• Github: https://github.com/cvlab-kaist/GLD

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

130 views · 11:10


ML Research Hub

✨OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

📝 Summary:
OpenResearcher presents a reproducible pipeline for training deep research agents using offline search environments and synthesized trajectories, achieving improved accuracy on benchmark tasks.

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.20278
• PDF: https://arxiv.org/pdf/2603.20278
• Project Page: https://github.com/TIGER-AI-Lab/OpenResearcher
• Github: https://github.com/TIGER-AI-Lab/OpenResearcher

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DeepLearning #ResearchAutomation #Reproducibility #OpenScience

152 views · 11:11


ML Research Hub

✨FluidWorld: Reaction-Diffusion Dynamics as a Predictive Substrate for World Models

📝 Summary:
FluidWorld demonstrates that partial differential equations can serve as an efficient alternative to attention mechanisms and convolutional recurrent networks in world modeling, achieving better spati...

🔹 Publication Date: Published on Mar 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21315
• PDF: https://arxiv.org/pdf/2603.21315
• Project Page: https://infinition.github.io/FluidWorld
• Github: https://github.com/infinition/FluidWorld

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

178 views · 11:11


ML Research Hub

✨Look Where It Matters: High-Resolution Crops Retrieval for Efficient VLMs

📝 Summary:
AwaRes is a spatial-on-demand framework for VLMs that resolves the accuracy-efficiency trade-off. It operates on a low-resolution global view and uses tool-calling to dynamically retrieve high-resolution segments as needed. Training involves multi-turn reinforcement learning with composite rewards.

🔹 Publication Date: Published on Mar 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.16932
• PDF: https://arxiv.org/pdf/2603.16932
• Project Page: https://nimrodshabtay.github.io/AwaRes/
• Github: https://github.com/NimrodShabtay/AwaRes

✨ Datasets citing this paper:
• https://huggingface.co/datasets/NimrodShabtay1986/AwaRes

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

190 views · 12:11


ML Research Hub

✨Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

📝 Summary:
SafeFlow Q-Learning extends FQL to safe offline reinforcement learning by combining a Hamilton-Jacobi reachability-inspired safety value function with an efficient one-step flow policy, achieving lowe...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15136
• PDF: https://arxiv.org/pdf/2603.15136
• Project Page: https://tau-intelligence.com/safe-fql/
• Github: https://github.com/tau-intelligence/safe-fql

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

❤1

237 views · 13:11


ML Research Hub

✨AdditiveLLM2: A Multi-modal Large Language Model for Additive Manufacturing

📝 Summary:
AdditiveLLM2 is a multi-modal LLM built on Gemma 3, specialized for additive manufacturing via domain-adaptive pretraining and instruction tuning on a small dataset. It achieves over 90 percent accuracy in AM language and vision tasks, proving an accessible specialization method for domain-specif...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22017
• PDF: https://arxiv.org/pdf/2603.22017

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research

209 views · 16:11


ML Research Hub

✨Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs

📝 Summary:
XKD-Dial is a progressive training pipeline for explainable, bilingual English-Hindi knowledge-grounded dialogue. It achieves zero hallucination rates by using citation grounding and improves explainability through post-hoc analyses.

🔹 Publication Date: Published on Mar 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.18911
• PDF: https://arxiv.org/pdf/2603.18911

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#LLMs #ExplainableAI #NaturalLanguageProcessing #AIResearch #HallucinationReduction

221 views · 16:12
