ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

📝 Summary:
SPEED-Bench is introduced as a new benchmark for evaluating speculative decoding (SD). It provides diverse semantic domains and realistic serving regimes to address limitations of existing benchmarks. This enables accurate measurement of SD performance in production environments, setting a unified ...

🔹 Publication Date: Published on Feb 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09557
• PDF: https://arxiv.org/pdf/2604.09557
• Project Page: https://huggingface.co/blog/nvidia/speed-bench
• Github: https://github.com/NVIDIA/Model-Optimizer

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SpeculativeDecoding #AIBenchmarks #LLMs #DeepLearning #ModelOptimization
TRACE: Capability-Targeted Agentic Training

📝 Summary:
TRACE improves LLM agents by identifying capability gaps from trajectory comparisons. It then creates targeted training environments for specific skills, using LoRA adapters for efficient, environment-specific self-improvement. This boosts performance on customer service and tool use tasks, outpe...

🔹 Publication Date: Published on Apr 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.05336
• PDF: https://arxiv.org/pdf/2604.05336
• Project Page: https://scalingintelligence.stanford.edu/blogs/trace/
• Github: https://github.com/ScalingIntelligence/TRACE

#LLMAgents #AI #MachineLearning #LoRA #DeepLearning
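
The TRACE summary above mentions LoRA adapters. As a minimal sketch of the general LoRA idea only (not TRACE's training code), a frozen weight matrix W is augmented with a low-rank update, giving y = W x + (alpha / r) * B (A x). The function names, the toy matrices, and the pure-Python list representation are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float = 1.0) -> Vector:
    """Frozen base layer plus a scaled low-rank correction.

    w: (out x in) frozen weights
    a: (r x in) trainable down-projection
    b: (out x r) trainable up-projection, usually initialised to zeros
    """
    rank = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))  # low-rank path: in -> r -> out
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, delta)]


# Toy usage: identity base layer with a rank-1 adapter (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [2.0]]
print(lora_forward(W, A, B, [1.0, 2.0]))
```

With B set to zeros the adapter leaves the base layer unchanged, which is why training can start from the pretrained behaviour and store only the small A and B matrices per environment.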
Accelerating Speculative Decoding with Block Diffusion Draft Trees

📝 Summary:
DDTree enhances speculative decoding by constructing draft trees from block diffusion drafter distributions. It efficiently verifies multiple trajectories in parallel in a single target model pass, improving performance.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12989
• PDF: https://arxiv.org/pdf/2604.12989
• Project Page: https://liranringel.github.io/ddtree
• Github: https://github.com/liranringel/ddtree

#SpeculativeDecoding #BlockDiffusion #LLMAcceleration #DeepLearning #AIResearch
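
To make the draft-tree idea in the DDTree summary above concrete, here is a toy sketch of greedy tree verification. It is not the DDTree algorithm: the nested-dict tree, the `toy_target` lookup table, and the function names are hypothetical, and a real system would score every tree node in a single target forward pass (with a tree attention mask) rather than calling the target once per step as this loop does.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A draft tree maps each drafted token to the subtree of its continuations.
DraftTree = Dict[str, "DraftTree"]


def longest_accepted_path(
    prefix: Sequence[str],
    tree: DraftTree,
    target_next: Callable[[Tuple[str, ...]], str],
) -> List[str]:
    """Follow the branch of the draft tree that matches the target's greedy choices.

    The target's token is always emitted: it either confirms a drafted
    token (and we descend) or replaces the draft (and we stop).
    """
    emitted: List[str] = []
    context = tuple(prefix)
    node = tree
    while True:
        wanted = target_next(context)  # target model's greedy next token
        emitted.append(wanted)
        if wanted not in node:  # no drafted branch agrees: stop here
            return emitted
        context += (wanted,)
        node = node[wanted]


def toy_target(context: Tuple[str, ...]) -> str:
    """Stand-in for a target model: a fixed lookup table (assumption)."""
    table = {
        ("the",): "cat",
        ("the", "cat"): "sat",
        ("the", "cat", "sat"): "down",
    }
    return table.get(context, "<eos>")


draft = {"cat": {"sat": {"on": {}}, "ran": {}}, "dog": {"sat": {}}}
print(longest_accepted_path(["the"], draft, toy_target))
```

Because the tree holds several candidate continuations per position, one verification round can accept a longer run of tokens than a single drafted chain would when the first guess is wrong.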
Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution

📝 Summary:
Domain-specific autoencoders significantly enhance medical image super-resolution. Replacing generic VAEs improves fidelity, showing autoencoder choice is key, not the diffusion architecture. Autoencoder performance predicts overall SR quality.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12152
• PDF: https://arxiv.org/pdf/2604.12152
• Github: https://github.com/sebasmos/latent-sr

#MedicalImaging #SuperResolution #DiffusionModels #DeepLearning #Autoencoders
3DTV: A Feedforward Interpolation Network for Real-Time View Synthesis

📝 Summary:
3DTV is a feedforward network combining lightweight geometry and learning for real-time, robust sparse-view interpolation. It generates novel views efficiently without scene-specific optimization, making it practical for interactive applications.

🔹 Publication Date: Published on Apr 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.11211
• PDF: https://arxiv.org/pdf/2604.11211
• Project Page: https://stefanmschulz.github.io/3DTV_webpage/
• Github: https://github.com/StefanMSchulz/3DTV

#ViewSynthesis #DeepLearning #ComputerVision #NeuralNetworks #RealTimeAI
ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

📝 Summary:
ReconPhys is the first feedforward framework to jointly learn physical attribute estimation and 3D Gaussian Splatting reconstruction from a single video. It offers significantly faster inference and superior reconstruction quality for non-rigid objects compared to prior optimization-based methods...

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.07882
• PDF: https://arxiv.org/pdf/2604.07882
• Project Page: https://chuanshuogushi.github.io/ReconPhys/
• Github: https://chuanshuogushi.github.io/ReconPhys/

#ComputerVision #3DReconstruction #GaussianSplatting #DeepLearning #AIResearch
Qwen3.5-Omni Technical Report

📝 Summary:
Qwen3.5-Omni is a large multimodal model excelling in audio-visual understanding and generation, achieving SOTA results across many benchmarks. It features a Hybrid Attention MoE architecture, introduces ARIA for improved speech synthesis, and exhibits a new Audio-Visual Vibe Coding capability.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15804
• PDF: https://arxiv.org/pdf/2604.15804

#MultimodalAI #AIResearch #DeepLearning #GenerativeAI #SpeechSynthesis
Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

📝 Summary:
Existing visual reasoning models often overthink, using redundant steps. AVR is an adaptive framework that dynamically chooses efficient reasoning formats. It reduces token usage by 50-90 percent while maintaining accuracy.

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.14568
• PDF: https://arxiv.org/pdf/2604.14568
• Github: https://github.com/RunRiotComeOn/AVR

#VisualReasoning #AI #MachineLearning #Efficiency #DeepLearning
Repurposing 3D Generative Model for Autoregressive Layout Generation

📝 Summary:
LaviGen is a 3D layout generation framework that repurposes 3D generative models. It uses an adapted 3D diffusion model for autoregressive generation, explicitly modeling geometric relations and physical constraints. This achieves superior, more plausible 3D layouts 65% faster than previous methods.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16299
• PDF: https://arxiv.org/pdf/2604.16299
• Project Page: https://fenghora.github.io/LaviGen-Page/
• Github: https://github.com/fenghora/LaviGen

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
Hierarchical Codec Diffusion for Video-to-Speech Generation

📝 Summary:
HiCoDiT generates speech from videos by leveraging the hierarchical structure of discrete speech tokens, achieving better audio-visual alignment through coarse-to-fine conditioning with dual-scale nor...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15923
• PDF: https://arxiv.org/pdf/2604.15923

#VideoToSpeech #DiffusionModels #GenerativeAI #SpeechSynthesis #DeepLearning
Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

📝 Summary:
This paper improves vision-language models for compositional reasoning by using concreteness-based negative sample selection and a novel margin-based loss. The resulting framework, Slipform, achieves state-of-the-art accuracy on compositional benchmarks and cross-modal retrieval.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.13313
• PDF: https://arxiv.org/pdf/2604.13313

#VisionLanguage #DeepLearning #AIResearch #ComputerVision #NLP
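
The Concrete Jungle summary above mentions a margin-based loss over mined negatives. The exact Slipform loss is not given in the post, so the following is only a generic hinge-style margin loss of the kind commonly used in contrastive training; the function name, the default margin, and the example scores are assumptions.

```python
from typing import Sequence


def margin_contrastive_loss(
    pos_score: float,
    neg_scores: Sequence[float],
    margin: float = 0.2,
) -> float:
    """Mean hinge loss over negatives.

    Each negative caption should score at least `margin` below the
    matching (positive) caption; violations are penalised linearly.
    """
    if not neg_scores:
        return 0.0
    penalties = [max(0.0, margin - pos_score + n) for n in neg_scores]
    return sum(penalties) / len(penalties)


# Toy usage: one easy negative (0.1) and one hard negative (0.8).
print(margin_contrastive_loss(0.9, [0.1, 0.8]))
```

Only the hard negative contributes to the loss here, which illustrates why the choice of negatives matters: easy negatives that already sit outside the margin provide no training signal.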
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, solving training instability issues. It optimizes using final samples as actions and reconstructed trajectories. This achieves state-of-the-art performance in text-to-image generation and OCR tasks.

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO

🔹 Models citing this paper:
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore

#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning
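
For readers unfamiliar with the "group relative" part of GRPO named in the UDM-GRPO post above, here is a minimal sketch of the standard group-normalised advantage: several samples are drawn for the same prompt, and each reward is standardised against its own group. This shows only that generic step, not the UDM-specific changes the paper proposes; the function name, the `eps` constant, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardise rewards within one group of samples for the same prompt.

    Samples that beat the group average get a positive advantage,
    those below it get a negative one; no learned value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption in this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy usage: four generations for one prompt, two of them rewarded.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every sample in a group receives the same reward, all advantages are zero and the group contributes no gradient, which is one known source of wasted compute in this family of methods.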
Scaling Test-Time Compute for Agentic Coding

📝 Summary:
This framework improves long-horizon agentic coding by using compact trajectory representations for test-time scaling. It employs Recursive Tournament Voting and adapted Parallel-Distill-Refine to significantly boost coding agent performance on benchmarks.

🔹 Publication Date: Published on Apr 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16529
• PDF: https://arxiv.org/pdf/2604.16529

#AgenticAI #CodingAgents #MachineLearning #AIResearch #DeepLearning
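
The post above names Recursive Tournament Voting without describing it. One plausible reading, sketched here purely as an assumption, is a bracket-style selection: candidate trajectories are split into small groups, a judge picks one winner per group, and the winners compete again until one remains. The `pick_best` callable stands in for whatever judge (for example an LLM vote over compact trajectory summaries) a real system would use; all names are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def recursive_tournament(
    candidates: Sequence[T],
    pick_best: Callable[[Sequence[T]], T],
    group_size: int = 2,
) -> T:
    """Reduce candidates to a single winner through repeated group-wise judging."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if len(candidates) == 1:
        return candidates[0]
    # One judging call per group; each round shrinks the field.
    winners = [
        pick_best(candidates[i:i + group_size])
        for i in range(0, len(candidates), group_size)
    ]
    return recursive_tournament(winners, pick_best, group_size)


# Toy usage: integers stand in for scored trajectories, max() for the judge.
print(recursive_tournament([3, 9, 4, 7, 1], max))
```

A design point worth noting: the judge only ever compares `group_size` items at once, so the comparison context stays small even when many trajectories are sampled.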
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

📝 Summary:
DeVI enables physically plausible dexterous robot control by leveraging text-conditioned synthetic videos through a hybrid tracking reward that combines 3D and 2D tracking for improved hand-object int...

🔹 Publication Date: Published on Apr 22

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.20841
• PDF: https://arxiv.org/pdf/2604.20841
• Project Page: https://snuvclab.github.io/devi/
• Github: https://github.com/snuvclab/devi

#Robotics #AI #ComputerVision #HumanRobotInteraction #DeepLearning
Encoder-Free Human Motion Understanding via Structured Motion Descriptions

📝 Summary:
Structured Motion Description (SMD) converts human motion into natural language, enabling large language models (LLMs) to reason about it directly. This encoder-free method achieves state-of-the-art performance on motion question answering and captioning.

🔹 Publication Date: Published on Apr 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.21668
• PDF: https://arxiv.org/pdf/2604.21668
• Project Page: https://yaozhang182.github.io/motion-smd/
• Github: https://yaozhang182.github.io/motion-smd/

🔹 Models citing this paper:
https://huggingface.co/zyyy12138/motion-smd-lora

🔹 Datasets citing this paper:
https://huggingface.co/datasets/zyyy12138/motion-smd-data

#HumanMotionUnderstanding #LLMs #NLP #AI #DeepLearning
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications

📝 Summary:
Mixture-of-Experts (MoE) models improve the efficiency and performance of large AI models by dynamically selecting sub-models for diverse data. This survey details MoE design, algorithms, theory, and applications in various machine learning fields.

🔹 Publication Date: Published on Mar 10, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2503.07137
• PDF: https://arxiv.org/pdf/2503.07137
• Github: https://github.com/deepseek-ai/DeepEP

#MixtureOfExperts #MoE #AI #MachineLearning #DeepLearning
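
The "dynamically selecting sub-models" phrase in the MoE survey summary above usually refers to top-k gating. A minimal sketch under stated assumptions: the gate logits are passed in directly (a real router computes them from the input with learned weights), experts are plain Python callables on a scalar, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Route x to the k highest-scoring experts and mix their outputs.

    Only the selected experts are evaluated, which is where the
    compute saving of sparse MoE layers comes from.
    """
    order = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalise over the top-k
    return sum(w * experts[i](x) for w, i in zip(weights, top))


# Toy usage: three experts, the third is effectively switched off by its logit.
toy_experts = [lambda v: v, lambda v: 2 * v, lambda v: -v]
print(moe_forward(3.0, [0.0, 0.0, -5.0], toy_experts, k=2))
```

With k much smaller than the number of experts, total parameter count can grow while the per-input compute stays roughly constant.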