ML Research Hub
32.4K subscribers
6.37K photos
429 videos
24 files
6.92K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Distilling Conversations: Abstract Compression of Conversational Audio Context for LLM-based ASR

📝 Summary:
LLM-based ASR improves with multimodal conversational context, especially for entities. Raw audio context is costly, so Abstract Compression replaces prior-turn audio with fixed latent tokens, retaining transcripts. This reduces computational cost while recovering some performance gains.
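To make the cost argument concrete, here is a minimal sketch of the context-length bookkeeping, assuming each prior turn contributes either its raw audio frames or a fixed number of latent tokens, plus its transcript. The function name and all numbers are hypothetical and are not taken from the paper.

```python
def context_tokens(audio_frames_per_turn, transcript_tokens_per_turn,
                   latent_tokens_per_turn=None):
    """Count context tokens contributed by prior conversation turns.

    If latent_tokens_per_turn is None, each turn keeps its raw audio frames.
    Otherwise each turn's audio is replaced by a fixed number of latent
    tokens, while the transcript tokens are retained in both cases.
    """
    if len(audio_frames_per_turn) != len(transcript_tokens_per_turn):
        raise ValueError("need one audio length and one transcript length per turn")
    text = sum(transcript_tokens_per_turn)
    if latent_tokens_per_turn is None:
        audio = sum(audio_frames_per_turn)
    else:
        audio = latent_tokens_per_turn * len(audio_frames_per_turn)
    return audio + text


# Hypothetical three-turn history (illustrative values only).
frames = [150, 200, 250]
transcripts = [12, 15, 20]
raw = context_tokens(frames, transcripts)
compressed = context_tokens(frames, transcripts, latent_tokens_per_turn=16)
print(raw, compressed)
```

With raw audio the context grows with the length of every prior turn; with a fixed latent budget it grows only with the number of turns.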

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26246
• PDF: https://arxiv.org/pdf/2603.26246

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #ASR #SpeechRecognition #NLP #AI
It Takes Two: A Duet of Periodicity and Directionality for Burst Flicker Removal

📝 Summary:
Flicker artifacts in short-exposure photos are addressed by Flickerformer, a transformer-based architecture. It leverages flicker's intrinsic periodicity and directionality to effectively remove artifacts without introducing ghosting, outperforming existing methods.

🔹 Publication Date: Published on Mar 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.22794
• PDF: https://arxiv.org/pdf/2603.22794

==================================

#ImageProcessing #DeepLearning #ComputerVision #Transformers #FlickerRemoval
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

📝 Summary:
Medical imaging datasets are fragmented and small, limiting foundation model development. This survey of 1000+ open-access datasets proposes a metadata-driven fusion paradigm to integrate them, creating larger resources. This scales medical imaging data for more capable foundation models.

🔹 Publication Date: Published on Mar 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27460
• PDF: https://arxiv.org/pdf/2603.27460
• Project Page: https://huggingface.co/datasets/General-Medical-AI/Project-Imaging-X
• Github: https://github.com/uni-medical/Project-Imaging-X

Datasets citing this paper:
https://huggingface.co/datasets/General-Medical-AI/Project-Imaging-X

==================================

#MedicalImaging #FoundationModels #AI #DataScience #OpenData
Falcon Perception

📝 Summary:
Falcon Perception introduces a unified early-fusion Transformer that processes images and text within a single architecture from the first layer. This simplifies perception systems and achieves improved mask prediction and OCR performance, outperforming traditional modular designs.

🔹 Publication Date: Published on Mar 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27365
• PDF: https://arxiv.org/pdf/2603.27365

Datasets citing this paper:
https://huggingface.co/datasets/tiiuae/PBench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions

📝 Summary:
A fully automated question-answer based evaluation pipeline and comprehensive benchmark are introduced for assessing creative image manipulation tasks under complex instructions, demonstrating strong ...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26174
• PDF: https://arxiv.org/pdf/2603.26174
• Github: https://github.com/ChonghuinanWang/CREval

Datasets citing this paper:
https://huggingface.co/datasets/ChonghuinanWang/CREval

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation

📝 Summary:
A new benchmark called BizGenEval is introduced to evaluate image generation models on commercial visual content creation tasks across multiple document types and capability dimensions.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25732
• PDF: https://arxiv.org/pdf/2603.25732
• Project Page: https://microsoft.github.io/BizGenEval/
• Github: https://github.com/microsoft/BizGenEval

Datasets citing this paper:
https://huggingface.co/datasets/microsoft/BizGenEval

Spaces citing this paper:
https://huggingface.co/spaces/microsoft/BizGenEval-Leaderboard
https://huggingface.co/spaces/clarence-stark/BizGenEval-Leaderboard

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TrajectoryMover: Generative Movement of Object Trajectories in Videos

📝 Summary:
TrajectoryAtlas enables generative video editing by generating large-scale synthetic paired video data and training a video generator to move object 3D motion trajectories while preserving plausibilit...

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29092
• PDF: https://arxiv.org/pdf/2603.29092
• Project Page: https://chhatrekiran.github.io/trajectorymover/
• Github: https://github.com/kiranchhatre/TrajectoryMover

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos

📝 Summary:
Colon-Bench is a new comprehensive benchmark dataset for colonoscopy AI, created using an agentic workflow. It features 528 full-procedure videos with dense annotations for 14 lesion types, enabling MLLM evaluation and performance improvements.

🔹 Publication Date: Published on Mar 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.25645
• PDF: https://arxiv.org/pdf/2603.25645
• Project Page: https://abdullahamdi.com/colon-bench
• Github: https://github.com/ajhamdi/colon-bench-eval

Datasets citing this paper:
https://huggingface.co/datasets/ajhamdi/colon-bench

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation

📝 Summary:
WorldFlow3D generates unbounded 3D worlds by modeling 3D data distributions as a flow matching problem. This latent-free approach achieves rapid convergence and high-quality generation with controllable geometric and texture properties. It outperforms existing methods on both real and synthetic s...
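For readers new to flow matching, the sketch below shows the common straight-path formulation: points are interpolated linearly between a noise sample and a data sample, the regression target is the constant velocity between them, and sampling integrates the velocity field with Euler steps. This is a generic illustration under those assumptions, not WorldFlow3D's actual formulation, which the summary does not specify; all function names are hypothetical.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field along that straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 2-D example: a perfect velocity field carries the noise sample to the data sample.
noise, data = [0.0, 2.0], [4.0, -2.0]
v_true = target_velocity(noise, data)
print(euler_sample(noise, lambda x, t: v_true, steps=4))
```

In practice a network is trained to predict the target velocity at interpolated points, and that network replaces the lambda at sampling time.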

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29089
• PDF: https://arxiv.org/pdf/2603.29089
• Project Page: https://princeton-computational-imaging.github.io/WorldFlow3D/

==================================

#3DGeneration #GenerativeAI #FlowMatching #ComputerGraphics #AIResearch
The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

📝 Summary:
Large language models exhibit systematic reasoning failures when surface cues conflict with feasibility constraints, demonstrating consistent heuristic biases that can be measured and partially mitiga...

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.29025
• PDF: https://arxiv.org/pdf/2603.29025

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
When Documents Disagree: Measuring Institutional Variation in Transplant Guidance with Retrieval-Augmented Language Models

📝 Summary:
Patient education materials for solid-organ transplantation vary substantially across U.S. centers, yet no system...

🔹 Publication Date: Published on Mar 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.21460
• PDF: https://arxiv.org/pdf/2603.21460

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Tabular LLMs for Interpretable Few-Shot Alzheimer's Disease Prediction with Multimodal Biomedical Data

📝 Summary:
A domain-adapted tabular large language model framework demonstrates improved few-shot Alzheimer's disease classification performance over traditional methods while maintaining stability under missing...

🔹 Publication Date: Published on Mar 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.17191
• PDF: https://arxiv.org/pdf/2603.17191

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model

📝 Summary:
This paper introduces MPDiT, a multi-patch transformer for diffusion models. It processes larger patches in early layers for global context and smaller patches later for local details, reducing computation by up to fifty percent while maintaining generative performance.
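A rough way to see where the savings come from is to count tokens per layer. The sketch below assumes a square latent grid and a cost that scales linearly with token count (attention scales faster, so it is ignored here). The grid size, patch sizes, and layer split are hypothetical values chosen for illustration, not the paper's configuration.

```python
def num_tokens(side, patch):
    """Number of patch tokens for a square side x side grid."""
    if side % patch != 0:
        raise ValueError("side must be divisible by patch")
    return (side // patch) ** 2


def relative_linear_cost(side, layers):
    """Per-token cost relative to running every layer at the smallest patch.

    layers is a list of (num_layers, patch_size) stages.
    """
    smallest = min(patch for _, patch in layers)
    baseline = sum(n for n, _ in layers) * num_tokens(side, smallest)
    actual = sum(n * num_tokens(side, patch) for n, patch in layers)
    return actual / baseline


# Hypothetical schedule: 8 early layers on 4x4 patches, 4 late layers on 2x2 patches.
print(relative_linear_cost(32, [(8, 4), (4, 2)]))
```

Doubling the patch side cuts the token count by four, so moving most layers to the coarse stage is what drives the reduction in this toy setup.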

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26357
• PDF: https://arxiv.org/pdf/2603.26357

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
TokenDial: Continuous Attribute Control in Text-to-Video via Spatiotemporal Token Offsets

📝 Summary:
TokenDial enables precise attribute control in text-to-video models by using additive offsets in spatiotemporal token space for coherent edits without retraining.
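The idea of an additive offset with a continuous strength can be sketched in a few lines. This assumes a single offset vector shared by all tokens and a scalar strength, which is a simplification for illustration; how TokenDial actually learns and applies its offsets is not described in the summary, and the function name is hypothetical.

```python
def apply_offset(tokens, offset, strength):
    """Shift every token vector by strength * offset.

    tokens: list of token vectors (lists of floats)
    offset: one direction vector with the same dimension as each token
    strength: scalar dial; 0.0 leaves the tokens unchanged
    """
    return [[v + strength * o for v, o in zip(tok, offset)] for tok in tokens]


# Two toy 2-D tokens and a hypothetical attribute direction.
tokens = [[1.0, 2.0], [3.0, 4.0]]
direction = [0.5, -0.5]
print(apply_offset(tokens, direction, 2.0))
```

Because the base model's weights are untouched, the strength can be varied at inference time without retraining.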

🔹 Publication Date: Published on Mar 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.27520
• PDF: https://arxiv.org/pdf/2603.27520
• Project Page: https://tokendial.github.io/
• Github: https://github.com/ariannaliu/TokenDial

==================================

#TextToVideo #GenerativeAI #AIControl #VideoGeneration #DeepLearning
OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

📝 Summary:
OmniRoam generates long-horizon panoramic videos using a two-stage approach for improved scene completeness and consistency. It first previews a trajectory-controlled video, then refines and extends it to high-resolution, long-range panoramas, enabling high-fidelity world wandering.

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.30045
• PDF: https://arxiv.org/pdf/2603.30045
• Project Page: https://yuheng.ink/project-page/omniroam/
• Github: https://github.com/yuhengliu02/OmniRoam

🔹 Models citing this paper:
https://huggingface.co/Yuheng02/OmniRoam

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

📝 Summary:
CALM is a unified model bridging the gap between multi-turn conversations and tool use in language agents. Trained on a new multi-task dataset CALM-IT, it integrates both capabilities. CALM outperforms specialized models, including GPT-4o, across various benchmarks.

🔹 Publication Date: Published on Feb 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2502.08820
• PDF: https://arxiv.org/pdf/2502.08820
• Project Page: https://emrecanacikgoz.github.io/CoALM/
• Github: https://github.com/oumi-ai/oumi

🔹 Models citing this paper:
https://huggingface.co/uiuc-convai/CoALM-8B
https://huggingface.co/uiuc-convai/CoALM-405B
https://huggingface.co/uiuc-convai/CoALM-70B

Datasets citing this paper:
https://huggingface.co/datasets/uiuc-convai/CoALM-IT

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Terminal Agents Suffice for Enterprise Automation

📝 Summary:
Simple terminal-based coding agents interacting directly with platform APIs, powered by foundation models, are highly effective for enterprise automation. These low-level agents match or outperform complex tool-augmented systems, demonstrating that elaborate agent architectures are often unnecess...

🔹 Publication Date: Published on Mar 31

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00073
• PDF: https://arxiv.org/pdf/2604.00073

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

📝 Summary:
A dynamic agentic framework called TAB addresses 3D visual grounding by decoupling spatial semantics resolution from 3D structure instantiation through 2D VLMs and multi-view geometry, achieving super...

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.00528
• PDF: https://arxiv.org/pdf/2604.00528
• Github: https://github.com/WHB139426/TAB-Agent

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation

📝 Summary:
GaussianGPT uses a transformer-based autoregressive approach with 3D rotary positional embeddings to generate 3D scenes by predicting Gaussian primitives, offering advantages over diffusion methods in...

🔹 Publication Date: Published on Mar 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.26661
• PDF: https://arxiv.org/pdf/2603.26661
• Project Page: https://nicolasvonluetzow.github.io/GaussianGPT/

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research
Universal YOCO for Efficient Depth Scaling

📝 Summary:
Universal YOCO (YOCO-U) merges the YOCO architecture with recursive computation for efficient LLM depth scaling. It uses iterative processing in shallow attention layers, offering a constant KV cache and better token utility.

🔹 Publication Date: Published on Apr 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.01220
• PDF: https://arxiv.org/pdf/2604.01220

==================================

#AI #DataScience #MachineLearning #HuggingFace #Research