ML Research Hub
32.3K subscribers
6.73K photos
472 videos
24 files
7.34K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

📝 Summary:
PoseDreamer uses diffusion models to generate large-scale, photorealistic synthetic 3D human mesh datasets with improved image quality. Models trained on this data achieve comparable or superior performance to those using real or traditional synthetic datasets, offering a scalable solution.

🔹 Publication Date: Published on Mar 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.28763
• PDF: https://arxiv.org/pdf/2603.28763
• Project Page: https://prosperolo.github.io/posedreamer

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #SyntheticData #3DGeneration #ComputerVision #AIResearch
1
This media is not supported in your browser
VIEW IN TELEGRAM
VOID: Video Object and Interaction Deletion

📝 Summary:
VOID is a video object removal framework designed for complex scenarios involving significant object interactions. It uses vision-language and video diffusion models, leveraging causal reasoning to generate physically plausible counterfactual scenes. VOID better preserves consistent scene dynamic...

🔹 Publication Date: Published on Apr 2

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.02296
• PDF: https://arxiv.org/pdf/2604.02296
• Project Page: https://void-model.github.io/
• Github: https://github.com/Netflix/void-model

🔹 Models citing this paper:
https://huggingface.co/netflix/void-model

Spaces citing this paper:
https://huggingface.co/spaces/sam-motamed/VOID

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoEditing #DiffusionModels #ComputerVision #GenerativeAI #DeepLearning
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

📝 Summary:
RefineAnything is a multimodal diffusion model for region-specific image refinement. It fixes local detail collapse while strictly preserving backgrounds using a Focus-and-Refine strategy and boundary-aware loss. This provides a practical solution for high-precision local editing.

🔹 Publication Date: Published on Apr 8

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.06870
• PDF: https://arxiv.org/pdf/2604.06870
• Project Page: https://limuloo.github.io/RefineAnything/
• Github: https://github.com/limuloo/RefineAnything

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #ImageEditing #ComputerVision #DeepLearning #GenerativeAI
Media is too big
VIEW IN TELEGRAM
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

📝 Summary:
Matrix-Game 3.0 is a memory-augmented diffusion model achieving real-time 720p interactive video generation with long-term temporal consistency. It uses an advanced data engine, a self-correction training framework with memory, and efficient inference strategies. This enables practical, industria...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08995
• PDF: https://arxiv.org/pdf/2604.08995
• Project Page: https://matrix-game-v3.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #VideoGeneration #RealTimeAI #GenerativeAI #MachineLearning
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

📝 Summary:
CT-1 is a Vision-Language-Camera model that improves camera-controllable video generation. It uses a Diffusion Transformer and Wavelet Regularization Loss to accurately estimate camera trajectories, enabling precise video synthesis. This achieves 25.7% better accuracy than prior methods.

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09201
• PDF: https://arxiv.org/pdf/2604.09201
• Project Page: https://gulucaptain.github.io/Camera-Transformer-1/
• Github: https://github.com/gulucaptain/Camera-Transformer-1

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #VideoGeneration #ComputerVision #DiffusionModels #VisionLanguageModels
MixFlow: Mixed Source Distributions Improve Rectified Flows

📝 Summary:
Rectified flows and diffusion models are improved through κ-FC formulation that conditions the source distribution and MixFlow training strategy that reduces generative path curvatures and enhances sa...

🔹 Publication Date: Published on Apr 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.09181
• PDF: https://arxiv.org/pdf/2604.09181
• Github: https://github.com/NazirNayal8/MixFlow

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#RectifiedFlows #DiffusionModels #GenerativeAI #MachineLearning #AIResearch
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

📝 Summary:
Uni-ViGU introduces a unified framework for video generation and understanding, uniquely building upon a video generator as its foundation. It uses unified flow matching and a bidirectional training mechanism to achieve competitive performance in both generation and understanding tasks.

🔹 Publication Date: Published on Apr 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.08121
• PDF: https://arxiv.org/pdf/2604.08121
• Project Page: https://fr0zencrane.github.io/uni-vigu-page/
• Github: https://fr0zencrane.github.io/uni-vigu-page/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #VideoUnderstanding #DiffusionModels #AIResearch #DeepLearning
Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution

📝 Summary:
Domain-specific autoencoders significantly enhance medical image super-resolution. Replacing generic VAEs improves fidelity, showing autoencoder choice is key, not the diffusion architecture. Autoencoder performance predicts overall SR quality.

🔹 Publication Date: Published on Apr 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.12152
• PDF: https://arxiv.org/pdf/2604.12152
• Github: https://github.com/sebasmos/latent-sr

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MedicalImaging #SuperResolution #DiffusionModels #DeepLearning #Autoencoders
This media is not supported in your browser
VIEW IN TELEGRAM
Repurposing 3D Generative Model for Autoregressive Layout Generation

📝 Summary:
LaviGen is a 3D layout generation framework that repurposes 3D generative models. It uses an adapted 3D diffusion model for autoregressive generation, explicitly modeling geometric relations and physical constraints. This achieves superior, more plausible 3D layouts 65% faster than previous methods.

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.16299
• PDF: https://arxiv.org/pdf/2604.16299
• Project Page: https://fenghora.github.io/LaviGen-Page/
• Github: https://github.com/fenghora/LaviGen

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#3DGeneration #DiffusionModels #GenerativeAI #ComputerGraphics #DeepLearning
Media is too big
VIEW IN TELEGRAM
Hierarchical Codec Diffusion for Video-to-Speech Generation

📝 Summary:
HiCoDiT generates speech from videos by leveraging the hierarchical structure of discrete speech tokens, achieving better audio-visual alignment through coarse-to-fine conditioning with dual-scale nor...

🔹 Publication Date: Published on Apr 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.15923
• PDF: https://arxiv.org/pdf/2604.15923

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoToSpeech #DiffusionModels #GenerativeAI #SpeechSynthesis #DeepLearning
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

📝 Summary:
UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, solving training instability issues. It optimizes using final samples as actions and reconstructed trajectories. This achieves state-of-the-art performance in text-to-image generation and OCR tasks.

🔹 Publication Date: Published on Apr 20

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2604.18518
• PDF: https://arxiv.org/pdf/2604.18518
• Project Page: https://yovecent.github.io/UDM-GRPO.github.io/
• Github: https://github.com/Yovecent/UDM-GRPO

🔹 Models citing this paper:
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-GenEval
https://huggingface.co/Yovecents/URSA-1.7B-IBQ512-UDMGRPO-PickScore

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #ReinforcementLearning #GenerativeAI #TextToImage #DeepLearning
1