ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
MedSAM3: Delving into Segment Anything with Medical Concepts

📝 Summary:
MedSAM-3 is a text-promptable medical segmentation model, fine-tuned from SAM 3 using semantic concept labels. It enables precise, open-vocabulary, text-driven segmentation of anatomical structures and integrates MLLMs for advanced reasoning. This approach significantly outperforms existing models ...
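
For intuition, here is a minimal, hypothetical sketch of how a text-promptable segmenter of this kind is typically wired: a text embedding conditions image features before a mask head. The class, dimensions, and interface are illustrative assumptions, not the MedSAM-3 API.

```python
# Toy stand-in for a text-promptable segmenter (NOT the MedSAM-3 code).
import torch

class TextPromptableSegmenter(torch.nn.Module):
    """Fuses an image feature map with a text embedding and predicts mask logits."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.image_encoder = torch.nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.text_proj = torch.nn.Linear(512, dim)   # 512 = assumed text-embedding size
        self.mask_head = torch.nn.Conv2d(dim, 1, kernel_size=1)

    def forward(self, image: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        feats = self.image_encoder(image)                  # (B, dim, H/16, W/16)
        cond = self.text_proj(text_emb)[:, :, None, None]  # broadcast concept over space
        return self.mask_head(feats * cond)                # per-pixel mask logits

model = TextPromptableSegmenter()
image = torch.randn(1, 3, 224, 224)   # e.g. a normalized CT slice
text_emb = torch.randn(1, 512)        # embedding of a prompt like "left kidney"
print(model(image, text_emb).shape)   # torch.Size([1, 1, 14, 14])
```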

🔹 Publication Date: Published on Nov 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.19046
• PDF: https://arxiv.org/pdf/2511.19046
• Github: https://github.com/Joey-S-Liu/MedSAM3

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalAI #ImageSegmentation #DeepLearning #MLLMs #FoundationModels
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

📝 Summary:
Z-Image is an efficient 6B-parameter diffusion transformer achieving state-of-the-art image generation with significantly reduced computational cost. It enables sub-second inference and consumer hardware compatibility, challenging the scale-at-all-costs paradigm.
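
As a rough illustration of the single-stream idea (text and image tokens share one transformer sequence instead of separate branches), here is a toy PyTorch block. Dimensions and token counts are assumptions, not the actual Z-Image design.

```python
# Minimal single-stream transformer block over a joint text+latent sequence.
import torch
import torch.nn as nn

class SingleStreamBlock(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]     # every token attends to every other
        return x + self.mlp(self.norm2(x))

text_tokens = torch.randn(1, 77, 512)     # prompt embeddings
latent_tokens = torch.randn(1, 256, 512)  # noised image latents (16x16 patches)
x = torch.cat([text_tokens, latent_tokens], dim=1)  # one joint sequence
print(SingleStreamBlock()(x).shape)       # torch.Size([1, 333, 512])
```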

🔹 Publication Date: Published on Nov 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22699
• PDF: https://arxiv.org/pdf/2511.22699
• Project Page: https://tongyi-mai.github.io/Z-Image-blog/
• Github: https://github.com/Tongyi-MAI/Z-Image

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageGeneration #DiffusionModels #EfficientAI #FoundationModels #MachineLearning
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

📝 Summary:
This paper provides a practical guide to code LLMs, covering their lifecycle from data to deployment. It examines techniques, analyzes various models, and discusses real-world challenges like correctness and security. Experiments on pre-training and fine-tuning are included.
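
A hedged sketch of the supervised fine-tuning stage such guides cover, using a small stand-in model and a toy code sample; the model choice and hyperparameters are placeholders, not the paper's experimental setup.

```python
# One SFT step on a toy code snippet with a small causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # small stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                         # gpt2 has no pad token

samples = ["# add two numbers\ndef add(a, b):\n    return a + b"]  # toy code corpus
batch = tok(samples, return_tensors="pt", padding=True)

opt = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
out = model(**batch, labels=batch["input_ids"])       # causal next-token loss
out.loss.backward()
opt.step()
print("loss:", float(out.loss))
```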

🔹 Publication Date: Published on Nov 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.18538
• PDF: https://arxiv.org/pdf/2511.18538

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#CodeLLMs #AI #MachineLearning #SoftwareEngineering #FoundationModels
OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion

📝 Summary:
OmniFusion is a multimodal translation system integrating pretrained foundation models with LLMs via a novel fusion strategy. It enables simultaneous multilingual translation using audio and visual inputs, reducing latency and improving quality over cascaded systems.
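
A minimal sketch of the modular-fusion pattern the summary describes: frozen encoder outputs are projected into the LLM's embedding space and prepended as a multimodal prefix. All dimensions and names here are illustrative assumptions, not OmniFusion's actual code.

```python
# Projecting frozen audio/visual features into an LLM's token space.
import torch
import torch.nn as nn

llm_dim = 1024
audio_feats = torch.randn(1, 50, 768)    # e.g. frozen speech-encoder output
visual_feats = torch.randn(1, 20, 512)   # e.g. frozen vision-encoder output
text_embeds = torch.randn(1, 30, llm_dim)

audio_proj = nn.Linear(768, llm_dim)     # small trainable adapters
visual_proj = nn.Linear(512, llm_dim)

# One multimodal prefix the LLM attends to while translating.
fused = torch.cat([audio_proj(audio_feats), visual_proj(visual_feats), text_embeds], dim=1)
print(fused.shape)                       # torch.Size([1, 100, 1024])
```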

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00234
• PDF: https://arxiv.org/pdf/2512.00234
• Github: https://github.com/saikoneru/OmniFusion

🔹 Models citing this paper:
https://huggingface.co/skoneru/OmniFusion
https://huggingface.co/skoneru/OmniFusion_v2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #LLMs #MachineTranslation #FoundationModels #AIResearch
LFM2 Technical Report

📝 Summary:
LFM2 is a family of compact foundation models designed for efficient on-device deployment. It uses hardware-in-the-loop architecture search and advanced training to achieve high performance across diverse tasks, including multimodal applications.
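
To illustrate hardware-in-the-loop search in miniature: candidates are scored by measured latency rather than a proxy. The search space and budget below are invented for the sketch and unrelated to LFM2's actual search space.

```python
# Toy latency-constrained random architecture search.
import random
import time
import torch
import torch.nn as nn

def measure_latency(model: nn.Module, x: torch.Tensor, reps: int = 5) -> float:
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(reps):
            model(x)
    return (time.perf_counter() - start) / reps

budget_s = 0.01
best = None
x = torch.randn(1, 64)
for _ in range(10):                       # random search over widths/depths
    width = random.choice([32, 64, 128])
    depth = random.choice([1, 2, 4])
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(64 if not layers else width, width), nn.ReLU()]
    cand = nn.Sequential(*layers)
    lat = measure_latency(cand, x)        # "hardware in the loop": a real measurement
    if lat <= budget_s and (best is None or width * depth > best[0]):
        best = (width * depth, width, depth, lat)  # prefer capacity under budget
print("selected:", best)
```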

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.23404
• PDF: https://arxiv.org/pdf/2511.23404

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#FoundationModels #EdgeAI #MultimodalAI #AIResearch #MachineLearning
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

📝 Summary:
Echo-4o-Image is a 180K-sample synthetic image dataset generated with GPT-4o. It enhances image generation by covering rare scenarios and providing clean text-to-image supervision, which improves model performance and transferability across various foundation models.

🔹 Publication Date: Published on Aug 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.09987
• PDF: https://arxiv.org/pdf/2508.09987
• Project Page: https://yejy53.github.io/Echo-4o/

🔹 Datasets citing this paper:
https://huggingface.co/datasets/Yejy53/Echo-4o-Image
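
A hedged loading sketch for the dataset linked above; the repo id comes from the post, but the split and column names are assumptions to verify against the dataset card.

```python
# Inspect the dataset before relying on specific field names.
from datasets import load_dataset

ds = load_dataset("Yejy53/Echo-4o-Image", split="train")  # split name assumed
print(ds)              # check features (e.g. a prompt/text field and an image field)
print(ds[0].keys())
```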

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageGeneration #GPT4o #SyntheticData #AIResearch #FoundationModels
The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation

📝 Summary:
This paper examines the gap between SAM2 and SAM3: SAM2 relies on spatial prompts for geometric segmentation, whereas SAM3 pairs a unified vision-language architecture with concept-level prompts, making it a new class of foundation model for concept-driven segmentation.
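
A schematic contrast of the two prompt regimes, expressed as illustrative pseudo-interfaces (neither model's real API):

```python
# Geometric point prompts (SAM2-style) vs. semantic concept prompts (SAM3-style).
from dataclasses import dataclass

@dataclass
class PointPrompt:          # geometric: one click per target instance
    x: int
    y: int
    positive: bool = True

@dataclass
class ConceptPrompt:        # semantic: one phrase covers all matching instances
    text: str

def segment(prompt):
    if isinstance(prompt, PointPrompt):
        return f"mask around pixel ({prompt.x}, {prompt.y})"
    return f"masks for every instance of '{prompt.text}'"

print(segment(PointPrompt(120, 80)))      # geometry in, one mask out
print(segment(ConceptPrompt("red car")))  # concept in, all matches out
```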

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06032
• PDF: https://arxiv.org/pdf/2512.06032
• Github: https://github.com/Applied-AI-Research-Lab/The-SAM2-to-SAM3-Gap-in-the-Segment-Anything-Model-Family

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageSegmentation #FoundationModels #ComputerVision #MultimodalAI #AIResearch
SAM Audio: Segment Anything in Audio

📝 Summary:
SAM Audio is a foundation model for general audio separation. It unifies text, visual, and temporal-span prompts, achieving state-of-the-art performance across diverse audio types, and introduces a new real-world separation benchmark.
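
One way to picture the prompt unification: each prompt type is encoded into a shared conditioning space for the separator. The encoders and dimensions below are assumptions, not SAM Audio's implementation.

```python
# Mapping heterogeneous prompts into one conditioning vector.
import torch
import torch.nn as nn

dim = 256
text_enc = nn.Linear(512, dim)      # stand-in for a text encoder
visual_enc = nn.Linear(768, dim)    # stand-in for an image/video encoder
span_enc = nn.Linear(2, dim)        # (start, end) seconds of a temporal span

def condition(prompt_type: str, prompt: torch.Tensor) -> torch.Tensor:
    enc = {"text": text_enc, "visual": visual_enc, "span": span_enc}[prompt_type]
    return enc(prompt)              # shared conditioning space for the separator

cond = condition("span", torch.tensor([[3.0, 7.5]]))  # "separate what plays 3.0–7.5 s"
print(cond.shape)                   # torch.Size([1, 256]) -> fed to the separator
```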

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.18099
• PDF: https://arxiv.org/pdf/2512.18099
• Project Page: https://ai.meta.com/samaudio/
• Github: https://github.com/facebookresearch/sam-audio

🔹 Models citing this paper:
https://huggingface.co/facebook/sam-audio-large
https://huggingface.co/facebook/sam-audio-small
https://huggingface.co/facebook/sam-audio-base

🔹 Spaces citing this paper:
https://huggingface.co/spaces/lpeterl/sam-audio-webui
https://huggingface.co/spaces/Arrcttacsrks/SAM-Audio-Demo
https://huggingface.co/spaces/chippie1/SAM-Audio-Demo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AudioSeparation #FoundationModels #AI #DeepLearning #SAMAudio
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

📝 Summary:
This survey reviews self-evolving AI agents that adapt to dynamic environments via automatic enhancement from interaction data. It proposes a unified framework and systematically reviews current techniques, addressing evaluation, safety, and ethics.
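
A toy caricature of the loop the survey formalizes: act, collect feedback, and fold lessons from interaction data back into the agent's own prompt. The update rule here is deliberately simplistic.

```python
# Self-evolution loop: failures become prompt updates.
import random

system_prompt = "You are a helpful assistant."
lessons: list[str] = []              # memory distilled from past episodes

def act(task: str) -> bool:
    # Stand-in for an LLM call; success odds improve as lessons accumulate.
    return random.random() < 0.3 + 0.1 * len(lessons)

for episode in range(5):
    task = f"task-{episode}"
    if act(task):
        print(episode, "ok")
    else:                            # feedback becomes a self-update
        lessons.append(f"On {task}: decompose before answering.")
        system_prompt += " " + lessons[-1]
        print(episode, "updated prompt")
```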

🔹 Publication Date: Published on Aug 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.07407
• PDF: https://arxiv.org/pdf/2508.07407
• Project Page: https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents
• Github: https://github.com/EvoAgentX/Awesome-Self-Evolving-Agents

🔹 Spaces citing this paper:
https://huggingface.co/spaces/X-iZhang/Awesome-Self-Evolving-Agents

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SelfEvolvingAI #AIAgents #FoundationModels #LifelongLearning #ArtificialIntelligence
Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding

📝 Summary:
Omni-Weather is a new multimodal foundation model that unifies weather generation and understanding in a single architecture. It uses shared self-attention and a Chain-of-Thought dataset for interpretable, high-quality outputs, achieving state-of-the-art performance.
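
A sketch of the unification idea: weather-field patches and text tokens pass through one shared self-attention stack, with a task token steering generation versus understanding. Purely illustrative, not the Omni-Weather architecture.

```python
# Shared self-attention over weather-field and text tokens.
import torch
import torch.nn as nn

dim = 384
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=6, batch_first=True), num_layers=2
)

field_tokens = torch.randn(1, 64, dim)  # patchified weather field (e.g. radar)
text_tokens = torch.randn(1, 16, dim)   # question or caption embeddings
task_token = torch.randn(1, 1, dim)     # "generate" vs. "explain" indicator

x = torch.cat([task_token, field_tokens, text_tokens], dim=1)
h = encoder(x)                          # one stack attends across both modalities
print(h.shape)                          # torch.Size([1, 81, 384])
```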

🔹 Publication Date: Published on Dec 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21643
• PDF: https://arxiv.org/pdf/2512.21643

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WeatherGeneration #FoundationModels #MultimodalAI #AIResearch #DeepLearning
LTX-2: Efficient Joint Audio-Visual Foundation Model

📝 Summary:
LTX-2 is an open-source audiovisual diffusion model generating synchronized video and audio content. It uses a dual-stream transformer to achieve state-of-the-art quality, producing rich audio tracks efficiently.
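
A toy dual-stream block in the spirit the summary describes: video and audio keep separate streams but exchange information via cross-attention, a common way to wire synchronized generation. Illustrative only; not LTX-2's actual block.

```python
# Dual-stream block with cross-modal attention for audio-video sync.
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.v_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v_cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, v, a):
        v = v + self.v_self(v, v, v)[0]   # video attends to video
        a = a + self.a_self(a, a, a)[0]   # audio attends to audio
        v = v + self.v_cross(v, a, a)[0]  # video queries audio (sync cue)
        a = a + self.a_cross(a, v, v)[0]  # audio queries video
        return v, a

video = torch.randn(1, 128, 256)          # latent video tokens
audio = torch.randn(1, 64, 256)           # latent audio tokens
v, a = DualStreamBlock()(video, audio)
print(v.shape, a.shape)
```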

🔹 Publication Date: Published on Jan 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.03233
• PDF: https://arxiv.org/pdf/2601.03233

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AudiovisualAI #DiffusionModels #GenerativeAI #FoundationModels #VideoGeneration