ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

📝 Summary:
Latent space is emerging as a fundamental computational substrate for language-based models, offering advantages over explicit token-level approaches through continuous representation that mitigates l...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02029
• PDF: https://arxiv.org/pdf/2604.02029
• Github: https://github.com/YU-deep/Awesome-Latent-Space

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #DataScience #MachineLearning #HuggingFace #Research
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

📝 Summary:
SKILL0 enables LLM agents to internalize skills during training, allowing zero-shot autonomous behavior through a dynamic curriculum that reduces contextual overhead while improving task performance. ...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02268
• PDF: https://arxiv.org/pdf/2604.02268
• Github: https://github.com/ZJU-REAL/SkillZero

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
FlowSlider: Training-Free Continuous Image Editing via Fidelity-Steering Decomposition

📝 Summary:
FlowSlider enables continuous image editing with slider-style control by decomposing updates into fidelity and steering components within Rectified Flow, providing stable strength control without addi...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02088
• PDF: https://arxiv.org/pdf/2604.02088
• Project Page: https://huggingface.co/spaces/dominoer/FlowSlider

Spaces citing this paper:
https://huggingface.co/spaces/dominoer/FlowSlider

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

📝 Summary:
DataFlex is a unified framework for dynamic data-centric training of large language models that supports sample selection, domain mixture adjustment, and sample reweighting while maintaining compatibi...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26164
• PDF: https://arxiv.org/pdf/2603.26164
• Github: https://github.com/OpenDCAI/DataFlex

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Generative World Renderer

📝 Summary:
A large-scale dynamic dataset derived from AAA games is introduced to improve generative inverse and forward rendering, featuring high-resolution synchronized RGB and G-buffer data alongside a novel V...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02329
• PDF: https://arxiv.org/pdf/2604.02329
• Project Page: https://alaya-studio.github.io/renderer
• Github: https://github.com/ShandaAI/AlayaRenderer

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory

📝 Summary:
An autonomous research pipeline discovers Omni-SimpleMem, a unified multimodal memory framework that significantly improves lifelong AI agent performance through automated architectural modifications,...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01007
• PDF: https://arxiv.org/pdf/2604.01007

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
EgoSim: Egocentric World Simulator for Embodied Interaction Generation

📝 Summary:
We introduce EgoSim, a closed-loop egocentric world simulator that generates spatially consistent interaction vid...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01001
• PDF: https://arxiv.org/pdf/2604.01001
• Project Page: https://egosimulator.github.io/
• Github: https://egosimulator.github.io/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

📝 Summary:
UniDriveVLA is a unified vision-language-action model for autonomous driving that decouples spatial perception and semantic reasoning through a mixture-of-transformers architecture with expert coordin...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02190
• PDF: https://arxiv.org/pdf/2604.02190
• Project Page: https://xiaomi-research.github.io/unidrivevla/
• Github: https://github.com/xiaomi-research/unidrivevla

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

📝 Summary:
VideoZeroBench presents a comprehensive benchmark for long-video question answering with rigorous spatio-temporal evidence verification, revealing significant gaps in current models' grounded video un...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01569
• PDF: https://arxiv.org/pdf/2604.01569
• Project Page: https://marinero4972.github.io/projects/VideoZeroBench
• Github: https://github.com/marinero4972/VideoZeroBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

📝 Summary:
Controllable diffusion models using linear attention architectures enable secure on-device visual generation with improved multi-condition input handling and faster convergence.

🔹 Publication Date: Published on Mar 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27666
• PDF: https://arxiv.org/pdf/2603.27666

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model

📝 Summary:
LatentUM is a unified model that represents all modalities in a shared semantic latent space, enabling efficient cross-modal reasoning and generation without pixel-space mediation.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02097
• PDF: https://arxiv.org/pdf/2604.02097
• Github: https://github.com/SJTU-DENG-Lab/LatentUM

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Apriel-Reasoner: RL Post-Training for General-Purpose and Efficient Reasoning

📝 Summary:
Apriel-Reasoner, a 15B LLM, uses reproducible multi-domain RL post-training with novel sampling and length penalty to boost reasoning accuracy and efficiency. It achieves 30-50% shorter traces, outperforming its base model and matching peers at lower inference cost.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02007
• PDF: https://arxiv.org/pdf/2604.02007

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation

📝 Summary:
LinguDistill enables recovery of linguistic capabilities in vision-language models through adapter-free distillation using frozen language models as teachers, achieving performance close to pre-adapta...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00829
• PDF: https://arxiv.org/pdf/2604.00829

==================================

#VisionLanguageModels #NLP #ModelDistillation #ArtificialIntelligence #MachineLearning
Woosh: A Sound Effects Foundation Model

📝 Summary:
Woosh is a sound effect foundation model featuring audio encoding/decoding, text-audio alignment, and text-to-audio/video-to-audio generation capabilities with distilled versions for efficient deploym...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01929
• PDF: https://arxiv.org/pdf/2604.01929
• Project Page: https://sonyresearch.github.io/Woosh/
• Github: https://github.com/SonyResearch/Woosh

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Friends and Grandmothers in Silico: Localizing Entity Cells in Language Models

📝 Summary:
Entity-centric factual question answering involves localized MLP neurons that can be causally intervened to recover entity-consistent predictions, showing robustness to various linguistic variations b...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01404
• PDF: https://arxiv.org/pdf/2604.01404
• Github: https://github.com/1tux/in-silico

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Automatic Image-Level Morphological Trait Annotation for Organismal Images

📝 Summary:
This paper presents a scalable method for automatically annotating morphological traits from biological images. It uses sparse autoencoders on foundation model features to identify meaningful parts, then applies vision-language prompting to generate trait descriptions. This approach creates large...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01619
• PDF: https://arxiv.org/pdf/2604.01619
• Github: https://github.com/OSU-NLP-Group/sae-trait-annotation

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Executing as You Generate: Hiding Execution Latency in LLM Code Generation

📝 Summary:
A parallel execution paradigm for LLM-based coding agents reduces latency by executing code during generation rather than in sequential stages. Current LLM-based coding agents follo...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00491
• PDF: https://arxiv.org/pdf/2604.00491

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

📝 Summary:
UniRecGen combines feed-forward reconstruction and diffusion-based generation in a shared canonical space to produce complete and consistent 3D models from sparse inputs through disentangled cooperati...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01479
• PDF: https://arxiv.org/pdf/2604.01479
• Github: https://github.com/zsh523/UniRecGen

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
T5Gemma-TTS Technical Report

📝 Summary:
T5Gemma-TTS is an encoder-decoder codec language model that improves voice cloning and duration control for multilingual speech synthesis. It uses cross-attention for persistent text conditioning and Progress-Monitoring Rotary Position Embedding (PM-RoPE) for better target speech length tracking. I...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01760
• PDF: https://arxiv.org/pdf/2604.01760
• Github: https://github.com/Aratako/T5Gemma-TTS

🔹 Models citing this paper:
https://huggingface.co/Aratako/T5Gemma-TTS-2b-2b

Spaces citing this paper:
https://huggingface.co/spaces/Aratako/T5Gemma-TTS-Demo
https://huggingface.co/spaces/litagin/T5Gemma-TTS-Demo

==================================

#SpeechSynthesis #TTS #VoiceCloning #Multilingual #LanguageModels
DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

📝 Summary:
DynaVid improves dynamic video synthesis by training with synthetic optical flow, which provides diverse motion patterns without artificial appearances. A two-stage framework learns dynamic motion while preserving visual realism, enhancing motion control.

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01666
• PDF: https://arxiv.org/pdf/2604.01666
• Project Page: https://jinwonjoon.github.io/DynaVid/

==================================

#VideoGeneration #AIVideo #DeepLearning #ComputerVision #SyntheticData
VOID: Video Object and Interaction Deletion

📝 Summary:
VOID is a video object removal framework designed for complex scenarios involving significant object interactions. It uses vision-language and video diffusion models, leveraging causal reasoning to generate physically plausible counterfactual scenes. VOID better preserves consistent scene dynamic...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02296
• PDF: https://arxiv.org/pdf/2604.02296
• Project Page: https://void-model.github.io/
• Github: https://github.com/Netflix/void-model

🔹 Models citing this paper:
https://huggingface.co/netflix/void-model

Spaces citing this paper:
https://huggingface.co/spaces/sam-motamed/VOID

==================================

#VideoEditing #DiffusionModels #ComputerVision #GenerativeAI #DeepLearning