✨Mode Seeking meets Mean Seeking for Fast Long Video Generation
📝 Summary:
This paper introduces a Decoupled Diffusion Transformer combining mode seeking and mean seeking for efficient long video generation. It leverages global flow matching for narrative coherence and local distribution matching against a short-video teacher for realism, effectively bridging the fideli...
🔹 Publication Date: Published on Feb 27
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.24289
• PDF: https://arxiv.org/pdf/2602.24289
• Project Page: https://primecai.github.io/mmm/
• Github: https://primecai.github.io/mmm/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #DiffusionModels #AIResearch #MachineLearning #ComputerVision
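As background, the "mean seeking" side of this design — global flow matching — regresses a velocity field toward the straight-line path between noise and data. A minimal sketch of the standard conditional flow-matching objective (this is generic background, not the paper's decoupled variant; all names here are illustrative):

```python
import numpy as np

def flow_matching_loss(x0, x1, t, velocity_pred):
    """Conditional flow-matching loss on the linear interpolation path.

    x0: noise sample, x1: data sample, t: scalar in [0, 1],
    velocity_pred: model output evaluated at x_t = (1 - t) * x0 + t * x1.
    On this path the regression target is the constant velocity x1 - x0.
    """
    target = x1 - x0
    return float(np.mean((velocity_pred - target) ** 2))

# A perfect velocity prediction gives zero loss.
x0 = np.zeros(4)
x1 = np.ones(4)
loss = flow_matching_loss(x0, x1, t=0.5, velocity_pred=x1 - x0)  # → 0.0
```

The paper's contribution layers a local distribution-matching (mode-seeking) term from a short-video teacher on top of an objective of this kind; that term is not modeled here.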
✨CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
📝 Summary:
This paper reinterprets Classifier-Free Guidance (CFG) as a control system for diffusion models. It introduces Sliding Mode Control CFG (SMC-CFG) to overcome the instability of existing linear CFG methods. SMC-CFG improves semantic alignment and stability across a wide range of guidance scales.
🔹 Publication Date: Published on Mar 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.03281
• PDF: https://arxiv.org/pdf/2603.03281
• Project Page: https://hanyang-21.github.io/CFG-Ctrl
• Github: https://github.com/hanyang-21/CFG-Ctrl
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #GenerativeAI #ControlSystems #MachineLearning #AIResearch
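For context, the linear CFG rule that SMC-CFG reinterprets as a controller is the standard combination of conditional and unconditional noise predictions (a background sketch only — the paper's sliding-mode update is not shown here):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard linear classifier-free guidance:
        eps = eps_uncond + w * (eps_cond - eps_uncond)
    w = 1 recovers the plain conditional prediction; larger w
    extrapolates toward the condition, which is the regime where
    the instability addressed by SMC-CFG tends to appear."""
    return eps_uncond + w * (eps_cond - eps_uncond)

eps_u = np.array([0.0, 0.0])
eps_c = np.array([1.0, 2.0])
guided = cfg_combine(eps_u, eps_c, w=1.0)  # → equals eps_c
```

Viewing `w * (eps_cond - eps_uncond)` as a proportional control input on the denoising trajectory is the reading that motivates replacing it with a sliding-mode controller.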
✨WorldCache: Accelerating World Models for Free via Heterogeneous Token Caching
📝 Summary:
WorldCache accelerates slow diffusion-based world models by addressing token heterogeneity and non-uniform dynamics. It uses curvature-guided prediction and chaotic-prioritized skipping, achieving up to 3.7× faster inference while retaining 98% of rollout quality.
🔹 Publication Date: Published on Mar 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06331
• PDF: https://arxiv.org/pdf/2603.06331
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#WorldModels #DiffusionModels #AI #MachineLearning #Optimization
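The core idea behind token caching can be illustrated with a toy rule: recompute a token's features only when they have changed enough since the last step, and otherwise reuse the cached value. This sketch is illustrative only — WorldCache's curvature-guided prediction and chaotic-prioritized skipping are considerably more involved, and all names here are hypothetical:

```python
import numpy as np

def cached_update(prev_feat, new_feat, threshold):
    """Toy per-token caching for an iterative model.

    prev_feat, new_feat: (num_tokens, dim) arrays from consecutive steps.
    Tokens whose feature change exceeds `threshold` are recomputed
    (take new_feat); the rest reuse the cache (keep prev_feat).
    Returns the merged features and the recompute mask."""
    delta = np.linalg.norm(new_feat - prev_feat, axis=-1)
    recompute = delta > threshold
    merged = np.where(recompute[:, None], new_feat, prev_feat)
    return merged, recompute

prev = np.zeros((3, 2))
new = np.array([[0.0, 0.0], [5.0, 0.0], [0.1, 0.0]])
merged, mask = cached_update(prev, new, threshold=1.0)
# → only the middle token is recomputed
```

The speedup comes from skipping the expensive recomputation for the masked-out tokens; the quality/speed trade-off is set by the threshold (or, in the paper, by the learned prioritization).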
✨EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
📝 Summary:
EffectMaker is a unified framework for reference-based VFX customization. It uses a multimodal language model and diffusion transformer for semantic-visual guidance, generating high-quality effects consistently without per-effect fine-tuning. This is supported by a large synthetic dataset.
🔹 Publication Date: Published on Mar 6
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.06014
• PDF: https://arxiv.org/pdf/2603.06014
• Project Page: https://effectmaker.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VFX #GenerativeAI #DiffusionModels #MultimodalAI #ComputerVision
✨TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
📝 Summary:
TDM-R1 is a novel reinforcement learning method that enhances few-step generative models by incorporating non-differentiable real-world rewards. It overcomes limitations of existing RL approaches, achieving state-of-the-art performance with significantly fewer steps.
🔹 Publication Date: Published on Mar 8
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.07700
• PDF: https://arxiv.org/pdf/2603.07700
• Project Page: https://luo-yihong.github.io/TDM-R1-Page/
• Github: https://github.com/Luo-Yihong/TDM-R1
🔹 Models citing this paper:
• https://huggingface.co/Luo-Yihong/TDM-R1
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #ReinforcementLearning #GenerativeAI #MachineLearning #DeepLearning
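The key property being exploited — optimizing against a reward that cannot be differentiated through — is what score-function (REINFORCE-style) estimators provide: the gradient only needs log-probabilities of the sampled output, never the reward's gradient. A minimal sketch for a categorical policy (generic background; TDM-R1's estimator for few-step diffusion samplers is more specialized):

```python
import numpy as np

def reinforce_grad(logits, action, reward, baseline=0.0):
    """Score-function gradient estimate w.r.t. logits:
        grad = d/dlogits [log softmax(logits)[action]] * (reward - baseline)
    The reward enters only as a scalar weight, so it can be any
    black-box signal (human preference, a non-differentiable metric)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_logp = -probs
    grad_logp[action] += 1.0  # gradient of log-softmax at the sampled action
    return grad_logp * (reward - baseline)

g = reinforce_grad(np.zeros(3), action=1, reward=2.0)
# → [-2/3, 4/3, -2/3]: the sampled action is pushed up, others down
```

A baseline is subtracted purely to reduce variance; it leaves the estimator unbiased.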
✨SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing
📝 Summary:
SVG-EAR introduces a parameter-free method for video diffusion transformers to reduce quadratic attention cost. It recovers missing contributions via centroid approximation and uses error-aware routing to prioritize high-error blocks. This improves efficiency and quality, achieving significant sp...
🔹 Publication Date: Published on Mar 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08982
• PDF: https://arxiv.org/pdf/2603.08982
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #DiffusionModels #Transformers #AIResearch #MachineLearning
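The centroid-approximation idea can be sketched in a few lines: instead of dropping skipped key/value blocks entirely, collapse each skipped block into its mean and append that as one extra token before the softmax, so the skipped tokens still contribute in aggregate. This is a toy, parameter-free illustration of the compensation idea only; SVG-EAR's error-aware routing is not modeled, and all names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def centroid_compensated_attention(q, k_kept, v_kept, k_skip, v_skip):
    """Sparse attention over k_kept/v_kept, with the skipped block
    represented by a single centroid key/value pair appended at the end.
    q: (m, d); k_kept/v_*: (n_*, d)-shaped arrays."""
    k = np.vstack([k_kept, k_skip.mean(axis=0, keepdims=True)])
    v = np.vstack([v_kept, v_skip.mean(axis=0, keepdims=True)])
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v
```

Because the centroid costs one token regardless of how many were skipped, the attention stays sub-quadratic while recovering part of the dropped mass.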
✨WaDi: Weight Direction-aware Distillation for One-step Image Synthesis
📝 Summary:
Diffusion model inference is slow. WaDi focuses on weight direction changes during distillation to accelerate models into efficient one-step generators. This achieves state-of-the-art quality with significantly fewer parameters and broad versatility.
🔹 Publication Date: Published on Mar 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.08258
• PDF: https://arxiv.org/pdf/2603.08258
• Github: https://github.com/gudaochangsheng/WaDi
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #ImageSynthesis #ModelAcceleration #DeepLearning #AIResearch
✨OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
📝 Summary:
OmniForcing transforms slow bidirectional audio-visual diffusion models into fast, real-time streaming generators. It tackles training instability and synchronization by using asymmetric alignment, a global prefix, and an audio sink token. This enables high-fidelity, synchronized generation at 25...
🔹 Publication Date: Published on Mar 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11647
• PDF: https://arxiv.org/pdf/2603.11647
• Project Page: https://omniforcing.com/
• Github: https://github.com/OmniForcing/OmniForcing
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GenerativeAI #AudioVisual #RealtimeAI #DiffusionModels #DeepLearning
✨Learning Latent Proxies for Controllable Single-Image Relighting
📝 Summary:
Single-image relighting is challenging due to unobserved geometry and materials. LightCtrl introduces a diffusion model guided by sparse, physically meaningful cues from a latent proxy encoder and lighting-aware masks. This enables photometrically faithful relighting with accurate control, outper...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15555
• PDF: https://arxiv.org/pdf/2603.15555
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ImageRelighting #DiffusionModels #ComputerVision #DeepLearning #AIResearch
✨WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation
📝 Summary:
Waypoint Diffusion Transformers (WiT) address trajectory conflicts in pixel-space flow matching using semantic waypoints from pre-trained vision models. WiT disentangles generation paths into segments, accelerating training convergence. It outperforms pixel-space baselines and speeds up JiT trainin...
🔹 Publication Date: Published on Mar 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.15132
• PDF: https://arxiv.org/pdf/2603.15132
• Project Page: https://hainuo-wang.github.io/WiT/
• Github: https://github.com/hainuo-wang/WiT
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #Transformers #ComputerVision #DeepLearning #AI