✨SkyReels-V2: Infinite-length Film Generative Model
📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...
🔹 Publication Date: Published on Apr 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2
🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P
✨ Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
• https://huggingface.co/spaces/14eee109giet/SkyReels-V2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
📝 Summary:
SkyReels-V2 is an infinite-length film generative model that addresses video generation challenges by synergizing MLLMs, reinforcement learning, and a diffusion forcing framework. It enables high-quality, long-form video synthesis with realistic motion and cinematic grammar awareness through mult...
🔹 Publication Date: Published on Apr 17
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.13074
• PDF: https://arxiv.org/pdf/2504.13074
• Github: https://github.com/skyworkai/skyreels-v2
🔹 Models citing this paper:
• https://huggingface.co/Skywork/SkyReels-V2-I2V-14B-540P
• https://huggingface.co/Skywork/SkyCaptioner-V1
• https://huggingface.co/Skywork/SkyReels-V2-I2V-1.3B-540P
✨ Spaces citing this paper:
• https://huggingface.co/spaces/fffiloni/SkyReels-V2
• https://huggingface.co/spaces/Dudu0043/SkyReels-V2
• https://huggingface.co/spaces/14eee109giet/SkyReels-V2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #MLLM #DiffusionModels #AIResearch
arXiv.org
SkyReels-V2: Infinite-length Film Generative Model
Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion...
❤2
This media is not supported in your browser
VIEW IN TELEGRAM
✨InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
📝 Summary:
InsertAnywhere is a framework for realistic video object insertion. It uses 4D aware mask generation for geometric consistency and an extended diffusion model for appearance-faithful synthesis, outperforming existing methods.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17504
• PDF: https://arxiv.org/pdf/2512.17504
• Project Page: https://myyzzzoooo.github.io/InsertAnywhere/
• Github: https://github.com/myyzzzoooo/InsertAnywhere
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoEditing #DiffusionModels #ComputerVision #DeepLearning #GenerativeAI
❤1
✨Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220
🔹 Models citing this paper:
• https://huggingface.co/MindscapeRAG/MiA-Emb-8B
• https://huggingface.co/MindscapeRAG/MiA-Emb-4B
• https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
📝 Summary:
MiA-RAG enhances RAG systems with global context awareness, inspired by human understanding. It uses hierarchical summarization to build a 'mindscape,' improving long-context retrieval and generation for better evidence-based understanding.
🔹 Publication Date: Published on Dec 19
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.17220
• PDF: https://arxiv.org/pdf/2512.17220
🔹 Models citing this paper:
• https://huggingface.co/MindscapeRAG/MiA-Emb-8B
• https://huggingface.co/MindscapeRAG/MiA-Emb-4B
• https://huggingface.co/MindscapeRAG/MiA-Emb-0.6B
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#RAG #LLM #NLP #GenerativeAI #ContextUnderstanding
❤1
Media is too big
VIEW IN TELEGRAM
✨Yume-1.5: A Text-Controlled Interactive World Generation Model
📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME
🔹 Models citing this paper:
• https://huggingface.co/stdstu123/Yume-5B-720P
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
📝 Summary:
Yume-1.5 is a novel framework that generates realistic, interactive, and continuous worlds from a single image or text prompt. It overcomes prior limitations in real-time performance and text control by using unified context compression, streaming acceleration, and text-controlled world events.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.22096
• PDF: https://arxiv.org/pdf/2512.22096
• Project Page: https://stdstu12.github.io/YUME-Project/
• Github: https://github.com/stdstu12/YUME
🔹 Models citing this paper:
• https://huggingface.co/stdstu123/Yume-5B-720P
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AI #GenerativeAI #WorldGeneration #ComputerGraphics #DeepLearning
✨UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.
🔹 Publication Date: Published on Dec 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
• Github: https://pku-yuangroup.github.io/UltraShape-1.0/
🔹 Models citing this paper:
• https://huggingface.co/infinith/UltraShape
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
📝 Summary:
UltraShape 1.0 is a 3D diffusion framework that generates high-fidelity shapes using a two-stage process: coarse then refined geometry. It includes a novel data pipeline improving dataset quality, enabling strong geometric results on public data.
🔹 Publication Date: Published on Dec 24
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21185
• PDF: https://arxiv.org/pdf/2512.21185
• Project Page: https://pku-yuangroup.github.io/UltraShape-1.0/
• Github: https://pku-yuangroup.github.io/UltraShape-1.0/
🔹 Models citing this paper:
• https://huggingface.co/infinith/UltraShape
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
This media is not supported in your browser
VIEW IN TELEGRAM
✨SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time-embedding, temporal-warping training, and a synthetic dataset.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
📝 Summary:
SpaceTimePilot is a video diffusion model for dynamic scene rendering, offering independent control over spatial viewpoint and temporal motion. It achieves precise space-time disentanglement via a time-embedding, temporal-warping training, and a synthetic dataset.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25075
• PDF: https://arxiv.org/pdf/2512.25075
• Project Page: https://zheninghuang.github.io/Space-Time-Pilot/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoDiffusion #GenerativeAI #DynamicScenes #ComputerGraphics #DeepLearning
✨Guiding a Diffusion Transformer with the Internal Dynamics of Itself
📝 Summary:
This paper introduces Internal Guidance IG for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
📝 Summary:
This paper introduces Internal Guidance IG for diffusion models, which adds auxiliary supervision to intermediate layers during training and extrapolates outputs during sampling. This simple strategy significantly improves training efficiency and generation quality. IG achieves state-of-the-art F...
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24176
• PDF: https://arxiv.org/pdf/2512.24176
• Project Page: https://zhouxingyu13.github.io/Internal-Guidance/
• Github: https://github.com/CVL-UESTC/Internal-Guidance
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#DiffusionModels #AI #DeepLearning #GenerativeAI #ComputerVision
✨FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
📝 Summary:
FlowBlending optimizes video generation by adapting model capacity to each stage. It uses large models for critical early and late timesteps, and small models for intermediate ones. This achieves faster inference and fewer FLOPs with no loss in large model fidelity.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24724
• PDF: https://arxiv.org/pdf/2512.24724
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #GenerativeAI #DeepLearning #AIResearch #ModelOptimization
This media is not supported in your browser
VIEW IN TELEGRAM
✨Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.
🔹 Publication Date: Published on Jan 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.
🔹 Publication Date: Published on Jan 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
✨Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
📝 Summary:
MLLMs struggle with hallucinations on counterfactual videos. DualityForge synthesizes counterfactual video data and QA pairs through diffusion-based editing to address this. This method significantly reduces model hallucinations and improves general performance.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24271
• PDF: https://arxiv.org/pdf/2512.24271
• Project Page: https://amap-ml.github.io/Taming-Hallucinations/
• Github: https://github.com/AMAP-ML/Taming-Hallucinations
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MLLMs #VideoUnderstanding #AIHallucinations #GenerativeAI #MachineLearning
📝 Summary:
MLLMs struggle with hallucinations on counterfactual videos. DualityForge synthesizes counterfactual video data and QA pairs through diffusion-based editing to address this. This method significantly reduces model hallucinations and improves general performance.
🔹 Publication Date: Published on Dec 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.24271
• PDF: https://arxiv.org/pdf/2512.24271
• Project Page: https://amap-ml.github.io/Taming-Hallucinations/
• Github: https://github.com/AMAP-ML/Taming-Hallucinations
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#MLLMs #VideoUnderstanding #AIHallucinations #GenerativeAI #MachineLearning