ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
YingVideo-MV: Music-Driven Multi-Stage Video Generation

📝 Summary:
YingVideo-MV is the first framework to generate high-quality, music-driven long performance videos with synchronized camera motion. It uses audio analysis, diffusion transformers, and a camera adapter, achieving precise music-motion-camera synchronization.

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02492
• PDF: https://arxiv.org/pdf/2512.02492

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #MusicAI #GenerativeAI #DiffusionModels #ComputerVision
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation

📝 Summary:
BlockVid introduces a block diffusion framework for high-quality, coherent minute-long video generation. It overcomes error accumulation via a semantic-aware sparse KV cache, Block Forcing training, and dedicated noise scheduling. BlockVid outperforms existing methods and proposes LV-Bench, a new benchmark for evaluating minute-long video generation.

🔹 Publication Date: Published on Nov 28

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.22973
• PDF: https://arxiv.org/pdf/2511.22973
• Project Page: https://ziplab.co/BlockVid/
• Github: https://github.com/alibaba-damo-academy/Inferix/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #GenerativeAI #DeepLearning #ComputerVision
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

📝 Summary:
MultiShotMaster is a framework for controllable multi-shot video generation. It extends a single-shot model with novel RoPE variants for flexible shot arrangement, narrative order, and spatiotemporal reference injection. The framework also uses an automated data annotation pipeline to address data scarcity.
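
As a rough illustration of the positional-encoding idea, here is a minimal RoPE sketch in which each shot receives a large position offset so its tokens stay clearly separated inside one long sequence. The paper's actual RoPE variants are more elaborate (spatiotemporal and reference-aware), so the offsets and names below are assumptions.

```python
# Minimal sketch of RoPE with shot-aware position indices (illustrative only).
import torch

def rope(x, positions, base=10000.0):
    # x: (seq_len, dim) with even dim; positions: (seq_len,) integer positions
    dim = x.shape[-1]
    freqs = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))  # (dim/2,)
    angles = positions.float()[:, None] * freqs[None, :]             # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Give each shot its own position offset (hypothetical value) so tokens from
# different shots are well separated while remaining in a single sequence.
frames_per_shot, shot_gap = 16, 1000
shot_ids = torch.arange(3).repeat_interleave(frames_per_shot)        # 3 shots
positions = torch.arange(3 * frames_per_shot) + shot_ids * shot_gap
encoded = rope(torch.randn(3 * frames_per_shot, 64), positions)      # (48, 64)
```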

🔹 Publication Date: Published on Dec 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03041
• PDF: https://arxiv.org/pdf/2512.03041
• Project Page: https://qinghew.github.io/MultiShotMaster/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #GenerativeAI #DeepLearning #AI #ComputerVision
PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation

📝 Summary:
Pyramid Sparse Attention (PSA) introduces multi-level pooled key-value representations to overcome information loss in traditional sparse attention. It dynamically retains critical information, improving efficiency and performance for video understanding and generation tasks.
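
A rough sketch of the multi-level pooled key-value idea: keep full-resolution keys and values for a local window and attend to average-pooled summaries of the distant context at several granularities, so coarse information is retained rather than discarded. This is an illustration of the general mechanism under assumed pooling choices, not the released PSA kernels.

```python
# Minimal sketch of multi-level pooled key-value attention (not the official PSA code).
import torch
import torch.nn.functional as F

def pyramid_pooled_attention(q, k, v, local=256, pool_sizes=(4, 16)):
    # q, k, v: (batch, heads, seq_len, dim); keep full-resolution KV for the last
    # `local` positions and pooled summaries of the remaining context.
    k_local, v_local = k[:, :, -local:], v[:, :, -local:]
    k_ctx, v_ctx = k[:, :, :-local], v[:, :, :-local]
    ks, vs = [k_local], [v_local]
    if k_ctx.shape[2] > 0:
        for p in pool_sizes:
            # average-pool distant keys/values along the sequence axis
            ks.append(F.avg_pool1d(k_ctx.flatten(0, 1).transpose(1, 2), p, p)
                        .transpose(1, 2).unflatten(0, k.shape[:2]))
            vs.append(F.avg_pool1d(v_ctx.flatten(0, 1).transpose(1, 2), p, p)
                        .transpose(1, 2).unflatten(0, v.shape[:2]))
    k_all, v_all = torch.cat(ks, dim=2), torch.cat(vs, dim=2)
    return F.scaled_dot_product_attention(q, k_all, v_all)

q = k = v = torch.randn(1, 8, 1024, 64)
out = pyramid_pooled_attention(q, k, v)   # (1, 8, 1024, 64)
```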

🔹 Publication Date: Published on Dec 3

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04025
• PDF: https://arxiv.org/pdf/2512.04025
• Project Page: https://ziplab.co/PSA/
• Github: https://github.com/ziplab/Pyramid-Sparse-Attention

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#SparseAttention #VideoUnderstanding #VideoGeneration #DeepLearning #ComputerVision
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation

📝 Summary:
Reward Forcing improves streaming video generation by using EMA-Sink to update context tokens, preventing static initial frames. It also introduces Rewarded Distribution Matching Distillation to prioritize dynamic content, enhancing motion quality and achieving state-of-the-art performance.
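
A toy sketch of an EMA-updated attention sink, assuming the sink tokens are exponentially averaged with summaries of newly generated frames so the anchor context drifts with the video instead of staying frozen on the very first frames. The class and update rule are illustrative assumptions, not the paper's code.

```python
# Toy sketch of an EMA-updated attention sink for streaming generation
# (names and the update rule are assumptions, not the paper's code).
import torch

class EMASink:
    """Keep a small set of 'sink' context tokens that are exponentially averaged
    with summaries of newly generated frames."""
    def __init__(self, num_tokens=16, dim=1024, decay=0.95):
        self.decay = decay
        self.tokens = torch.zeros(num_tokens, dim)

    def update(self, frame_tokens):
        # frame_tokens: (num_tokens, dim) summary of the latest generated frame
        self.tokens = self.decay * self.tokens + (1.0 - self.decay) * frame_tokens
        return self.tokens

sink = EMASink()
for _ in range(5):                         # pretend we stream five new frames
    ctx = sink.update(torch.randn(16, 1024))
```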

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04678
• PDF: https://arxiv.org/pdf/2512.04678
• Project Page: https://reward-forcing.github.io/

🔹 Models citing this paper:
https://huggingface.co/JaydenLu666/Reward-Forcing-T2V-1.3B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #GenerativeAI #DeepLearning #ComputerVision #AIResearch
TV2TV: A Unified Framework for Interleaved Language and Video Generation

📝 Summary:
TV2TV is a unified framework for interleaved language and video generation, using a Mixture-of-Transformers. It learns to 'think in words' before 'acting in pixels,' enhancing visual quality, controllability, and prompt alignment. The model shows strong performance on both video game and natural video domains.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05103
• PDF: https://arxiv.org/pdf/2512.05103

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #GenerativeAI #MultimodalAI #Transformers #AI
Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

📝 Summary:
Stable Video Infinity (SVI) generates infinite-length videos with high consistency and controllable stories. It introduces Error-Recycling Fine-Tuning, teaching the Diffusion Transformer to correct its self-generated errors and address the training-test discrepancy.

🔹 Publication Date: Published on Oct 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.09212
• PDF: https://arxiv.org/pdf/2510.09212
• Project Page: https://stable-video-infinity.github.io/homepage/
• Github: https://github.com/vita-epfl/Stable-Video-Infinity

🔹 Models citing this paper:
https://huggingface.co/vita-video-gen/svi-model

Datasets citing this paper:
https://huggingface.co/datasets/vita-video-gen/svi-benchmark

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #DeepLearning #ComputerVision
BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

📝 Summary:
This paper presents a video diffusion framework that decouples scene dynamics from camera pose. This enables precise 4D control over time and viewpoint for high-quality video generation, outperforming prior models in controllability.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05076
• PDF: https://arxiv.org/pdf/2512.05076

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #GenerativeAI #ComputerVision #AICameraControl
EgoLCD: Egocentric Video Generation with Long Context Diffusion

📝 Summary:
EgoLCD addresses content drift in long egocentric video generation by combining long-term sparse memory and attention-based short-term memory with narrative prompting. It achieves state-of-the-art perceptual quality and temporal consistency, mitigating generative forgetting.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04515
• PDF: https://arxiv.org/pdf/2512.04515
• Project Page: https://aigeeksgroup.github.io/EgoLCD/
• Github: https://github.com/AIGeeksGroup/EgoLCD

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #VideoGeneration #DiffusionModels #ComputerVision #EgocentricVision
Generative Action Tell-Tales: Assessing Human Motion in Synthesized Videos

📝 Summary:
A new metric evaluates human action in generated videos by using a learned latent space of real-world actions, fusing skeletal geometry and appearance features. It significantly improves temporal and visual correctness assessment, outperforming existing methods and correlating better with human perception.
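
A toy sketch of the scoring recipe, assuming skeletal and appearance features are fused into a learned latent space and a clip is scored by its similarity to embeddings of real actions; every module name and dimension here is a hypothetical stand-in.

```python
# Toy sketch: fuse skeletal and appearance features into a latent space and score
# a generated clip by similarity to real-action embeddings (all names hypothetical).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionScorer(nn.Module):
    def __init__(self, skel_dim=64, app_dim=256, latent=128):
        super().__init__()
        self.skel_enc = nn.Linear(skel_dim, latent)
        self.app_enc = nn.Linear(app_dim, latent)
        self.fuse = nn.Linear(2 * latent, latent)

    def embed(self, skel, app):
        z = torch.cat([self.skel_enc(skel), self.app_enc(app)], dim=-1)
        return F.normalize(self.fuse(z), dim=-1)

    def score(self, skel, app, real_bank):
        # higher = the generated motion lies closer to the manifold of real actions
        z = self.embed(skel, app)
        return (z @ real_bank.T).max(dim=-1).values   # cosine sim to nearest real clip

scorer = ActionScorer()
real_bank = F.normalize(torch.randn(100, 128), dim=-1)   # embeddings of real clips
s = scorer.score(torch.randn(4, 64), torch.randn(4, 256), real_bank)   # (4,)
```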

🔹 Publication Date: Published on Dec 1

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01803
• PDF: https://arxiv.org/pdf/2512.01803
• Project Page: https://xthomasbu.github.io/video-gen-evals/

Datasets citing this paper:
https://huggingface.co/datasets/dghadiya/TAG-Bench-Video

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #HumanMotion #ComputerVision #AIMetrics #DeepLearning
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

📝 Summary:
Deep Forcing is a training-free method that enhances real-time video diffusion for high-quality, long-duration generation. It uses Deep Sink for stable context and Participative Compression for efficient KV cache pruning, achieving over 12x extrapolation and improved consistency.
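
A minimal sketch of sink-preserving KV cache pruning, assuming token importance is approximated by recent attention mass; the thresholds and heuristics are illustrative assumptions, not the Participative Compression algorithm itself.

```python
# Minimal sketch of sink-preserving KV cache pruning (heuristics are assumptions).
import torch

def prune_kv_cache(k, v, importance, n_sink=4, n_recent=256, n_keep=128):
    # k, v: (seq_len, dim); importance: (seq_len,) e.g. average attention mass
    # each cached token received recently.
    seq_len = k.shape[0]
    if seq_len <= n_sink + n_recent + n_keep:
        return k, v
    sink = torch.arange(n_sink)                          # always keep the first tokens
    recent = torch.arange(seq_len - n_recent, seq_len)   # always keep the newest tokens
    middle = torch.arange(n_sink, seq_len - n_recent)
    topk = middle[importance[middle].topk(n_keep).indices]  # most useful middle tokens
    keep = torch.cat([sink, topk.sort().values, recent])
    return k[keep], v[keep]

k = v = torch.randn(1000, 64)
k2, v2 = prune_kv_cache(k, v, torch.rand(1000))          # -> (388, 64) each
```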

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05081
• PDF: https://arxiv.org/pdf/2512.05081
• Project Page: https://cvlab-kaist.github.io/DeepForcing/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #TrainingFreeAI #DeepLearning #ComputerVision
Light-X: Generative 4D Video Rendering with Camera and Illumination Control

📝 Summary:
Light-X is a video generation framework for controllable rendering from monocular videos with joint viewpoint and illumination control. It disentangles geometry and lighting using synthetic data for robust training, outperforming prior methods in both aspects.

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05115
• PDF: https://arxiv.org/pdf/2512.05115
• Project Page: https://lightx-ai.github.io/
• Github: https://github.com/TQTQliu/Light-X

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #ComputerVision #AI #NeuralRendering #GenerativeAI
ProPhy: Progressive Physical Alignment for Dynamic World Simulation

📝 Summary:
ProPhy is a two-stage framework that enhances video generation by explicitly incorporating physics-aware conditioning and anisotropic generation. It uses a Mixture-of-Physics-Experts mechanism to extract fine-grained physical priors, improving physical consistency and realism in dynamic world simulation.
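
For intuition, a generic top-k mixture-of-experts gate showing how per-sample routing over a small set of "physics experts" can work; this is a standard MoE sketch, not the Mixture-of-Physics-Experts design from the paper.

```python
# Generic top-k mixture-of-experts gate (illustrative; not the paper's design).
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    def __init__(self, dim=256, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, dim)
        scores = self.gate(x).softmax(dim=-1)    # routing probabilities per expert
        topv, topi = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # weighted sum of selected experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topv[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = MixtureOfExperts()
y = moe(torch.randn(8, 256))                     # (8, 256)
```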

🔹 Publication Date: Published on Dec 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05564
• PDF: https://arxiv.org/pdf/2512.05564

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #PhysicsAI #DynamicSimulation #DeepLearning #ComputerVision
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

📝 Summary:
UnityVideo is a unified framework enhancing video generation by integrating multiple modalities and training paradigms. It uses dynamic noising and a modality switcher for comprehensive world understanding. This improves video quality, consistency, and zero-shot generalization to new data.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07831
• PDF: https://arxiv.org/pdf/2512.07831
• Project Page: https://jackailab.github.io/Projects/UnityVideo/
• Github: https://github.com/dvlab-research/UnityVideo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #MultimodalAI #GenerativeAI #DeepLearning #AIResearch
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

📝 Summary:
MIND-V generates long-horizon, physically plausible robotic manipulation videos. This hierarchical framework uses semantic reasoning and an RL-based physical alignment strategy to synthesize robust, coherent actions, addressing data scarcity.

🔹 Publication Date: Published on Dec 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.06628
• PDF: https://arxiv.org/pdf/2512.06628
• Github: https://github.com/Richard-Zhang-AI/MIND-V

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #VideoGeneration #ReinforcementLearning #AI #MachineLearning
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

📝 Summary:
OneStory generates coherent multi-shot videos by modeling global cross-shot context. It uses a Frame Selection module and an Adaptive Conditioner for next-shot generation, leveraging pretrained models and a new dataset. This achieves state-of-the-art narrative coherence for long-form video storytelling.

🔹 Publication Date: Published on Dec 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.07802
• PDF: https://arxiv.org/pdf/2512.07802
• Project Page: https://zhaochongan.github.io/projects/OneStory/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DeepLearning #ComputerVision #GenerativeAI
VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

📝 Summary:
VideoSSM proposes a hybrid state-space memory model for long video generation. It unifies autoregressive diffusion with global state-space memory and local context to achieve state-of-the-art temporal consistency and motion stability. This enables scalable, interactive minute-scale video synthesis.
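
A toy sketch of a hybrid memory, assuming a diagonal state-space recurrence carries global context while attention covers a short local window of recent frames; structure and names are assumptions, not the VideoSSM architecture.

```python
# Toy sketch of a hybrid memory: a diagonal state-space recurrence summarizing all
# past frames plus attention over a short local window (structure is assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMemory(nn.Module):
    def __init__(self, dim=256, window=8):
        super().__init__()
        self.window = window
        self.decay = nn.Parameter(torch.full((dim,), 0.9))   # diagonal state transition
        self.inp = nn.Linear(dim, dim)
        self.mix = nn.Linear(2 * dim, dim)

    def forward(self, frames):                    # frames: (seq, dim)
        state = torch.zeros(frames.shape[-1])
        outs = []
        for t, x in enumerate(frames):
            state = self.decay * state + self.inp(x)           # global recurrent memory
            local = frames[max(0, t - self.window + 1):t + 1]  # recent frames only
            attn = F.softmax(local @ x / x.shape[-1] ** 0.5, dim=0)
            local_ctx = attn @ local                           # local attention context
            outs.append(self.mix(torch.cat([state, local_ctx], dim=-1)))
        return torch.stack(outs)

mem = HybridMemory()
y = mem(torch.randn(32, 256))                     # (32, 256)
```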

🔹 Publication Date: Published on Dec 4

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04519
• PDF: https://arxiv.org/pdf/2512.04519

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #GenerativeAI #DiffusionModels #StateSpaceModels #DeepLearning
GimbalDiffusion: Gravity-Aware Camera Control for Video Generation

📝 Summary:
GimbalDiffusion offers precise text-to-video camera control by using absolute, gravity-aligned coordinates. This framework defines interpretable camera trajectories, enhancing robustness and diverse motion beyond relative methods.
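
For intuition, a small sketch of describing a camera pose in absolute, gravity-aligned terms (pitch and roll against the world up-vector rather than relative to the previous frame). This is generic geometry under an assumed OpenCV-style camera convention, not the paper's parameterization.

```python
# Generic sketch: pitch and roll of a camera relative to gravity (world up = +z),
# assuming an OpenCV-style camera frame (x right, y down, z forward) and a
# world-to-camera rotation R whose rows are the camera axes in world coordinates.
import torch

def gravity_aligned_angles(R, up=torch.tensor([0.0, 0.0, 1.0])):
    right, down, fwd = R[0], R[1], R[2]
    pitch = torch.asin(torch.clamp(fwd @ up, -1.0, 1.0))   # positive when looking up
    roll = torch.atan2(right @ up, -down @ up)              # tilt of the horizon
    return pitch, roll

# A level camera looking along world +y: right = +x, down = -z, forward = +y.
R_level = torch.tensor([[1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0],
                        [0.0, 1.0, 0.0]])
pitch, roll = gravity_aligned_angles(R_level)   # both ~0
```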

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.09112
• PDF: https://arxiv.org/pdf/2512.09112
• Project Page: https://lvsn.github.io/GimbalDiffusion/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #ComputerVision #DeepLearning
Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

📝 Summary:
SAM2VideoX improves realistic video motion by distilling structure-preserving priors from a tracking model into a bidirectional diffusion model. It uses novel feature fusion and local alignment, achieving significant performance gains over prior methods.

🔹 Publication Date: Published on Dec 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11792
• PDF: https://arxiv.org/pdf/2512.11792
• Project Page: https://sam2videox.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #ComputerVision #DeepLearning #MotionTracking
EgoX: Egocentric Video Generation from a Single Exocentric Video

📝 Summary:
EgoX generates egocentric videos from single exocentric inputs. It uses video diffusion models with LoRA adaptation, unified conditioning, and geometry-guided self-attention for coherent and realistic results.
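
A minimal sketch of the LoRA technique named in the summary: a frozen base linear layer plus a trainable low-rank update. Rank and scaling values here are illustrative defaults, not the paper's settings.

```python
# Minimal LoRA sketch: frozen base linear layer plus a trainable low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
y = layer(torch.randn(4, 512))   # (4, 512); only A and B receive gradients
```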

🔹 Publication Date: Published on Dec 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08269
• PDF: https://arxiv.org/pdf/2512.08269
• Project Page: https://keh0t0.github.io/EgoX/
• Github: https://github.com/KEH0T0/EgoX

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#EgocentricVideo #VideoGeneration #DiffusionModels #ComputerVision #DeepLearning