ML Research Hub

✨SkyReels-V2: Infinite-length Film Generative Model

📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2

🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P

✨ Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
• https://huggingface.co/spaces/14eee109giet/SkyReels-V2

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch

arXiv.org

SkyReels-V2: Infinite-length Film Generative Model

Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion...

❤2

740 views04:01

✨ Explore Data Science 📝 Write your paper

ML Research Hub

0:21

This media is not supported in your browser

VIEW IN TELEGRAM

✨InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.

🔹 Publication Date: Published on Dec 19

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI

❤1

399 views03:00

✨ Explore Data Science 📝 Write your paper

ML Research Hub

0:55

This media is not supported in your browser

VIEW IN TELEGRAM

✨LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI

295 views09:53

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

📝 Summary:
Transparent objects are hard for perception. This work observes video diffusion models can synthesize transparent phenomena, so they repurpose one. Their DKT model, trained on a new dataset, achieves zero-shot SOTA for depth and normal estimation of transparent objects, proving diffusion knows tr...

🔹 Publication Date: Published on Dec 29

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23705
• PDF: https://arxiv.org/pdf/2512.23705
• Project Page: https://daniellli.github.io/projects/DKT/
• Github: https://github.com/Daniellli/DKT

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ComputerVision #DiffusionModels #DepthEstimation #TransparentObjects #AIResearch

185 views09:57

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨SpotEdit: Selective Region Editing in Diffusion Transformers

📝 Summary:
SpotEdit is a training-free framework for selective image editing in diffusion transformers. It avoids reprocessing stable regions by reusing their features, combining them with edited areas. This reduces computation and preserves unchanged regions, enhancing efficiency and precision.

🔹 Publication Date: Published on Dec 26

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22323
• PDF: https://arxiv.org/pdf/2512.22323
• Project Page: https://biangbiang0321.github.io/SpotEdit.github.io
• Github: https://biangbiang0321.github.io/SpotEdit.github.io

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#ImageEditing #DiffusionModels #ComputerVision #AIResearch #DeepLearning

148 views09:57

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

📝 Summary:
Dream-VL and Dream-VLA are diffusion-based vision-language and vision-language-action models. They achieve state-of-the-art performance in visual planning and robotic control, surpassing autoregressive baselines via their diffusion backbone's superior action generation.

🔹 Publication Date: Published on Dec 27

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22615
• PDF: https://arxiv.org/pdf/2512.22615
• Project Page: https://hkunlp.github.io/blog/2025/dream-vlx/
• Github: https://github.com/DreamLM/Dream-VLX

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#VisionLanguageModels #DiffusionModels #Robotics #AI #ComputerVision

134 views09:57

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models

📝 Summary:
GRAN-TED improves text encoders for diffusion models by addressing evaluation and adaptation challenges. It introduces TED-6K, an efficient text-only benchmark that predicts generation quality 750x faster. Using this, GRAN-TED develops a superior encoder via a two-stage training method, enhancing...

🔹 Publication Date: Published on Dec 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.15560
• PDF: https://arxiv.org/pdf/2512.15560

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DiffusionModels #TextEmbeddings #AIResearch #MachineLearning #NLP

129 views09:58

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨DiRL: An Efficient Post-Training Framework for Diffusion Language Models

📝 Summary:
DiRL is an efficient post-training framework for Diffusion Language Models, integrating online updates and introducing DiPO for unbiased policy optimization. It achieves state-of-the-art math performance for dLLMs, surpassing comparable models.

🔹 Publication Date: Published on Dec 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22234
• PDF: https://arxiv.org/pdf/2512.22234
• Github: https://github.com/OpenMOSS/DiRL

🔹 Models citing this paper:
• https://huggingface.co/OpenMOSS-Team/DiRL-8B-Instruct

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#DiffusionModels #LLM #ModelOptimization #MachineLearning #AI

209 views09:58

✨ Explore Data Science 📝 Write your paper

ML Research Hub

✨UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement

📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.

🔹 Publication Date: Published on Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
• Github: https://pku-yuangroup.github.io/UltraShape-1.0/

🔹 Models citing this paper:
• https://huggingface.co/infinith/UltraShape

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning

342 views09:01

✨ Explore Data Science 📝 Write your paper

ML Research Hub

0:03

This media is not supported in your browser

VIEW IN TELEGRAM

✨GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

📝 Summary:
GaMO improves sparse-view 3D reconstruction by using geometry-aware multi-view outpainting. It expands existing views to enhance scene coverage and consistency. This achieves state-of-the-art quality 25x faster than prior methods, with reduced computational cost.

🔹 Publication Date: Published on Dec 31, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25073
• PDF: https://arxiv.org/pdf/2512.25073
• Project Page: https://yichuanh.github.io/GaMO/
• Github: https://yichuanh.github.io/GaMO/

==================================

For more data science resources:
✓ https://t.iss.one/DataScienceT

#3DReconstruction #ComputerVision #DiffusionModels #GaMO #AI

288 views08:03

✨ Explore Data Science 📝 Write your paper

About

Blog

Apps

Platform