ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Mode Seeking meets Mean Seeking for Fast Long Video Generation

📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fideli...
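The global flow-matching objective mentioned above is, in its generic conditional form, a simple velocity regression. A minimal sketch of that standard objective (not the paper's decoupled variant; `velocity_model` is a stand-in for the diffusion transformer):

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(velocity_model, x0, x1, t):
    """Conditional flow matching: regress the predicted velocity toward the
    straight-line target x1 - x0 at the interpolated point x_t."""
    x_t = (1.0 - t) * x0 + t * x1      # point on the noise -> data path
    target = x1 - x0                   # constant velocity of that path
    pred = velocity_model(x_t, t)
    return float(np.mean((pred - target) ** 2))

# Toy check: an oracle that outputs the exact target incurs zero loss.
x0 = rng.standard_normal((4, 8))       # "noise" batch
x1 = rng.standard_normal((4, 8))       # "data" batch
oracle = lambda x_t, t: x1 - x0
print(flow_matching_loss(oracle, x0, x1, 0.3))   # → 0.0
```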

🔹 Publication Date: Published on Feb 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/
• Github: https://primecai.github.io/mmm/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision
CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

📝 Summary:
This paper reinterprets Classifier-Free Guidance (CFG) as a control system for diffusion models. It introduces Sliding Mode Control CFG (SMC-CFG) to overcome instability in existing linear CFG methods. SMC-CFG improves semantic alignment and stability across various guidance scales.
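For context, the standard CFG update that the paper recasts as a control law is a single linear extrapolation (the sliding-mode variant itself is not reproduced here):

```python
import numpy as np

def cfg_denoise(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the conditional one with guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_c = np.array([1.0, 2.0])   # conditional prediction
eps_u = np.array([0.0, 0.0])   # unconditional prediction
print(cfg_denoise(eps_c, eps_u, 2.0))   # → [2. 4.]
```

Setting w = 1 recovers the purely conditional prediction; the instability the paper targets is commonly associated with large w.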

🔹 Publication Date: Published on Mar 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03281
• PDF: https://arxiv.org/pdf/2603.03281
• Project Page: https://hanyang-21.github.io/CFG-Ctrl
• Github: https://github.com/hanyang-21/CFG-Ctrl

==================================

#DiffusionModels #GenerativeAI #ControlSystems #MachineLearning #AIResearch
WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching

📝 Summary:
WorldCache speeds up slow diffusion-based world models by addressing token heterogeneity and non-uniform dynamics. It uses curvature-guided prediction and chaotic-prioritized skipping, achieving up to 3.7× faster inference while retaining 98% of rollout quality.
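A hypothetical illustration of the curvature-guided idea (the function names and the second-difference heuristic are assumptions for this sketch, not the paper's exact criterion): tokens whose feature trajectory across denoising steps is nearly linear can reuse a cached extrapolation, while high-curvature "chaotic" tokens are prioritized for recomputation.

```python
import numpy as np

def curvature(prev2, prev, curr):
    """Second difference of a token's feature trajectory across steps,
    used as a cheap proxy for how non-linear its dynamics are."""
    return np.linalg.norm(curr - 2.0 * prev + prev2)

def should_recompute(prev2, prev, curr, threshold=0.5):
    # Low curvature: a cached linear extrapolation is likely accurate,
    # so the token can be skipped. High curvature: recompute it.
    return bool(curvature(prev2, prev, curr) > threshold)

a, b = np.zeros(4), np.ones(4)
print(should_recompute(a, (a + b) / 2, b))   # → False (perfectly linear)
print(should_recompute(a, b, a))             # → True  (oscillating)
```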

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06331
• PDF: https://arxiv.org/pdf/2603.06331

==================================

#WorldModels #DiffusionModels #AI #MachineLearning #Optimization
EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation

📝 Summary:
EffectMaker is a unified framework for reference-based VFX customization. It uses a multimodal language model and diffusion transformer for semantic-visual guidance, generating high-quality effects consistently without per-effect fine-tuning. This is supported by a large synthetic dataset.

🔹 Publication Date: Published on Mar 6

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06014
• PDF: https://arxiv.org/pdf/2603.06014
• Project Page: https://effectmaker.github.io/

==================================

#VFX #GenerativeAI #DiffusionModels #MultimodalAI #ComputerVision
TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward

📝 Summary:
TDM-R1 is a novel reinforcement learning method that enhances few-step generative models by incorporating non-differentiable real-world rewards. It overcomes limitations of existing RL approaches, achieving state-of-the-art performance with significantly fewer steps.
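The core trick behind any RL method that handles non-differentiable rewards is a score-function (REINFORCE) estimator, which needs only reward values, never reward gradients. A generic sketch with a one-dimensional Gaussian policy (an illustration of the principle, not the paper's objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(mu, sigma, reward_fn, n=20000):
    """Score-function estimate of d E[R(x)] / d mu for x ~ N(mu, sigma^2).
    reward_fn is treated as a black box: only its values are used."""
    x = mu + sigma * rng.standard_normal(n)
    r = reward_fn(x)                      # non-differentiable reward values
    baseline = r.mean()                   # simple variance-reducing baseline
    score = (x - mu) / sigma**2           # d log N(x; mu, sigma^2) / d mu
    return float(np.mean((r - baseline) * score))

# Indicator reward 1[x > 0]: non-differentiable, yet the estimator recovers
# the true gradient at mu = 0, the N(0, 1) density at 0 (about 0.399).
grad = reinforce_grad(0.0, 1.0, lambda x: (x > 0).astype(float))
print(grad)
```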

🔹 Publication Date: Published on Mar 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.07700
• PDF: https://arxiv.org/pdf/2603.07700
• Project Page: https://luo-yihong.github.io/TDM-R1-Page/
• Github: https://github.com/Luo-Yihong/TDM-R1

🔹 Models citing this paper:
https://huggingface.co/Luo-Yihong/TDM-R1

==================================

#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #DeepLearning
SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

📝 Summary:
SVG-EAR introduces a parameter-free method for video diffusion transformers to reduce quadratic attention cost. It recovers missing contributions via centroid approximation and uses error-aware routing to prioritize high-error blocks. This improves efficiency and quality, achieving significant sp...
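A hypothetical sketch of recovering skipped contributions via a centroid (the error-aware routing policy itself is not shown; `keep` marks the keys that are attended exactly, and all names here are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def centroid_sparse_attention(q, k, v, keep):
    """Attend exactly to kept keys; approximate all skipped keys by one
    centroid key/value pair instead of dropping them outright."""
    skip = ~keep
    if skip.any():
        k_eff = np.concatenate([k[keep], k[skip].mean(axis=0, keepdims=True)])
        v_eff = np.concatenate([v[keep], v[skip].mean(axis=0, keepdims=True)])
    else:
        k_eff, v_eff = k, v
    attn = softmax(q @ k_eff.T / np.sqrt(q.shape[-1]))
    return attn @ v_eff

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
keep = np.zeros(16, dtype=bool)
keep[:4] = True                      # attend exactly to the first 4 tokens
print(centroid_sparse_attention(q, k, v, keep).shape)   # → (2, 8)
```

When `keep` is all-true this reduces exactly to dense attention, so the centroid term only changes behavior where keys were skipped.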

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08982
• PDF: https://arxiv.org/pdf/2603.08982

==================================

#VideoGeneration #DiffusionModels #Transformers #AIResearch #MachineLearning
WaDi: Weight Direction-aware Distillation for One-step Image Synthesis

📝 Summary:
Diffusion model inference is slow. WaDi focuses on weight direction changes during distillation to accelerate models into efficient one-step generators. This achieves state-of-the-art quality with significantly fewer parameters and broad versatility.
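One simple way to quantify "weight direction change" (an illustrative metric, not necessarily the paper's definition) is cosine distance between flattened weight tensors:

```python
import numpy as np

def direction_change(w_before, w_after):
    """Cosine distance between flattened weight tensors: ~0 if an update
    only rescaled the weights, up to 2 for a full direction reversal."""
    a, b = np.ravel(w_before), np.ravel(w_after)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

w = np.array([[1.0, 2.0], [3.0, 4.0]])
print(direction_change(w, 2.0 * w))   # ~0: magnitude changed, direction kept
print(direction_change(w, -w))        # ~2: direction fully reversed
```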

🔹 Publication Date: Published on Mar 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08258
• PDF: https://arxiv.org/pdf/2603.08258
• Github: https://github.com/gudaochangsheng/WaDi

==================================

#DiffusionModels #ImageSynthesis #ModelAcceleration #DeepLearning #AIResearch
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

📝 Summary:
OmniForcing transforms slow bidirectional audio-visual diffusion models into fast, real-time streaming generators. It tackles training instability and synchronization by using asymmetric alignment, a global prefix, and an audio sink token. This enables high-fidelity, synchronized generation at 25...

🔹 Publication Date: Published on Mar 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11647
• PDF: https://arxiv.org/pdf/2603.11647
• Project Page: https://omniforcing.com/
• Github: https://github.com/OmniForcing/OmniForcing

==================================

#GenerativeAI #AudioVisual #RealtimeAI #DiffusionModels #DeepLearning
Learning Latent Proxies for Controllable Single-Image Relighting

📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555

==================================

#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

📝 Summary:
Waypoint Diffusion Transformers (WiT) address trajectory conflicts in pixel-space flow matching using semantic waypoints from pre-trained vision models. WiT disentangles generation paths into segments, accelerating training convergence. It outperforms pixel-space baselines and speeds up JiT trainin...
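Disentangling a generation path at a waypoint can be sketched as a piecewise-linear interpolation (the 0.5 split point and the linear segments are illustrative assumptions, not the paper's schedule):

```python
import numpy as np

def waypoint_path(x0, waypoint, x1, t):
    """Piecewise-linear generation path split at a semantic waypoint:
    noise -> waypoint on t in [0, 0.5], waypoint -> data on t in [0.5, 1]."""
    if t < 0.5:
        s = t / 0.5
        return (1.0 - s) * x0 + s * waypoint
    s = (t - 0.5) / 0.5
    return (1.0 - s) * waypoint + s * x1

x0, wp, x1 = np.zeros(3), np.full(3, 5.0), np.ones(3)
print(waypoint_path(x0, wp, x1, 0.5))   # → [5. 5. 5.] (the waypoint)
```

Each segment is a short, conflict-free regression target, which is the intuition behind faster convergence.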

🔹 Publication Date: Published on Mar 16

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15132
• PDF: https://arxiv.org/pdf/2603.15132
• Project Page: https://hainuo-wang.github.io/WiT/
• Github: https://github.com/hainuo-wang/WiT

==================================

#DiffusionModels #Transformers #ComputerVision #DeepLearning #AI