Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers

🌼 SOTA Textured 3D-Guided VTON 🌼

👉 #ALIBABA unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video virtual try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be released 💙

👉 Review: https://t.ly/0tjdC
👉 Paper: https://lnkd.in/dFseYSXz
👉 Project: https://lnkd.in/djtqzrzs
👉 Repo: TBA

#AI #3DReconstruction #DiffusionModels #VirtualTryOn #ComputerVision #DeepLearning #VideoSynthesis

https://t.iss.one/DataScienceT 🔗
LongCat-Video Technical Report

📝 Summary:
LongCat-Video is a 13.6B-parameter Diffusion Transformer that excels at efficient, high-quality long video generation. It uses a unified architecture across tasks such as Text-to-Video, and adopts coarse-to-fine generation for efficiency. The model is a significant step toward developing world models.

🔹 Publication Date: Published on Oct 25

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.22200
• PDF: https://arxiv.org/pdf/2510.22200
• Github: https://github.com/meituan-longcat/LongCat-Video

🔹 Models citing this paper:
https://huggingface.co/meituan-longcat/LongCat-Video

🔹 Spaces citing this paper:
https://huggingface.co/spaces/multimodalart/LongCat-Video
https://huggingface.co/spaces/rahul7star/LongCat-Video
https://huggingface.co/spaces/armaishere/meituan-longcat-LongCat-Video

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #Transformers #AI #TextToVideo
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

📝 Summary:
Skyfall-GS synthesizes large-scale, explorable 3D urban scenes by combining satellite imagery for geometry and diffusion models for realistic textures. This framework offers improved cross-view consistent geometry and photorealistic appearances without needing costly 3D annotations.

🔹 Publication Date: Published on Oct 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.15869
• PDF: https://arxiv.org/pdf/2510.15869
• Project Page: https://skyfall-gs.jayinnn.dev/
• Github: https://github.com/jayin92/skyfall-gs

🔹 Models citing this paper:
https://huggingface.co/jayinnn/Skyfall-GS-ply

🔹 Datasets citing this paper:
https://huggingface.co/datasets/jayinnn/Skyfall-GS-eval
https://huggingface.co/datasets/jayinnn/Skyfall-GS-datasets

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#3DReconstruction #ComputerVision #SatelliteImagery #DiffusionModels #UrbanModeling
DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion

📝 Summary:
DyPE enhances diffusion transformers for ultra-high-resolution image generation by dynamically adjusting positional encodings. This training-free method allows pre-trained models to synthesize images far beyond their training resolution, achieving state-of-the-art fidelity without extra sampling ...
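
A minimal sketch of the underlying idea, not the paper's exact schedule: rescale rotary position frequencies as a function of the denoising step, so early (noisy) steps extrapolate strongly for global layout while late steps relax toward the training-resolution encoding. All names and constants below are illustrative.

```python
import torch

def rope_angles(dim: int, positions: torch.Tensor, scale: float) -> torch.Tensor:
    """Standard RoPE angles, with positions divided by an extrapolation scale."""
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions / scale, inv_freq)  # (seq_len, dim // 2)

def dynamic_scale(t: float, train_res: int, target_res: int) -> float:
    """Illustrative schedule: t runs from 1 (pure noise) down to 0 (clean image)."""
    max_scale = target_res / train_res
    return 1.0 + (max_scale - 1.0) * t

# Recompute the angles at every denoising step before the attention layers.
positions = torch.arange(4096).float()  # token positions at the higher target resolution
angles = rope_angles(64, positions, dynamic_scale(t=0.8, train_res=1024, target_res=2048))
```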

🔹 Publication Date: Published on Oct 23

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.20766
• PDF: https://arxiv.org/pdf/2510.20766
• Project Page: https://noamissachar.github.io/DyPE/
• Github: https://github.com/guyyariv/DyPE

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #ImageGeneration #HighResolution #DeepLearning #ComputerVision
Diffusion Language Models are Super Data Learners

📝 Summary:
Diffusion Language Models (DLMs) consistently outperform autoregressive models, especially in low-data settings. This is due to any-order modeling, iterative bidirectional denoising, and Monte Carlo augmentation. DLMs maintain their advantages at scale, achieving strong performance even by repeating limited data.
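
The "any-order" ingredient is easy to picture as a training step: each sample draws a random corruption level, masked positions are predicted with full bidirectional context, and repeated passes over the same data re-mask it differently (a built-in Monte Carlo augmentation). A minimal sketch with illustrative names, omitting the loss weighting real DLM objectives use:

```python
import torch
import torch.nn.functional as F

def dlm_training_step(model, tokens, mask_id, vocab_size):
    """tokens: (batch, seq_len) integer ids; model: bidirectional Transformer (no causal mask)."""
    b, n = tokens.shape
    ratio = torch.rand(b, 1)                         # fresh corruption level per sample
    mask = torch.rand(b, n) < ratio                  # positions to hide this pass
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                        # (b, n, vocab_size)
    return F.cross_entropy(                          # reconstruct only the masked positions
        logits[mask].view(-1, vocab_size),
        tokens[mask].view(-1),
    )
```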

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03276
• PDF: https://arxiv.org/pdf/2511.03276
• Project Page: https://github.com/JinjieNi/dlms-are-super-data-learners
• Github: https://github.com/JinjieNi/OpenMoE2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #LanguageModels #MachineLearning #LowDataLearning #AI
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

📝 Summary:
UniAVGen uses dual Diffusion Transformers and Asymmetric Cross-Modal Interaction for unified audio-video generation. This framework ensures precise spatiotemporal synchronization and semantic consistency. It outperforms existing methods in sync and consistency with far fewer training samples.
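
To make the cross-modal coupling concrete, here is a minimal dual-stream sketch in which the video branch and the audio branch each attend to the other through a separately gated cross-attention. The independent gates are only a loose illustration of "asymmetric" interaction; all names are assumptions rather than the paper's modules.

```python
import torch
import torch.nn as nn

class DualStreamBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.v_from_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a_from_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate_v = nn.Parameter(torch.zeros(1))  # per-direction gates weight the coupling unequally
        self.gate_a = nn.Parameter(torch.zeros(1))

    def forward(self, video_tokens, audio_tokens):
        v_ctx, _ = self.v_from_a(video_tokens, audio_tokens, audio_tokens)  # video queries audio
        a_ctx, _ = self.a_from_v(audio_tokens, video_tokens, video_tokens)  # audio queries video
        return video_tokens + self.gate_v * v_ctx, audio_tokens + self.gate_a * a_ctx
```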

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03334
• PDF: https://arxiv.org/pdf/2511.03334
• Project Page: https://mcg-nju.github.io/UniAVGen/
• Github: https://mcg-nju.github.io/UniAVGen/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GenerativeAI #AudioVideoGeneration #DiffusionModels #CrossModalAI #DeepLearning
EVODiff: Entropy-aware Variance Optimized Diffusion Inference

📝 Summary:
EVODiff optimizes diffusion model inference using an entropy-aware variance method. It leverages information theory to reduce uncertainty and minimize errors. This approach significantly outperforms gradient-based solvers, enhancing efficiency and reconstruction quality.

🔹 Publication Date: Published on Sep 30

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.26096
• PDF: https://arxiv.org/pdf/2509.26096
• Project Page: https://neurips.cc/virtual/2025/poster/115792
• Github: https://github.com/ShiguiLi/EVODiff

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #DeepLearning #MachineLearning #Optimization #InformationTheory
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models

📝 Summary:
Diffusion-SDPO improves text-to-image quality by fixing a flaw in standard DPO where the preferred output's error can increase. It uses a safeguarded update that adaptively scales the loser gradient so that the preferred output's error never increases. This leads to consistent quality gains across benchmarks.
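
One way to picture the safeguard is gradient surgery: if the loser-branch gradient conflicts with the direction that reduces the winner's error, project it so the combined step cannot (to first order) make the preferred output worse. The projection rule below is an illustrative substitute, not the paper's exact formula.

```python
import torch

def safeguarded_combine(g_winner: torch.Tensor, g_loser: torch.Tensor) -> torch.Tensor:
    """g_winner: gradient of the preferred sample's error; g_loser: gradient of the loser term."""
    dot = torch.dot(g_winner.flatten(), g_loser.flatten())
    if dot >= 0:
        return g_winner + g_loser                    # no conflict: keep the full DPO-style update
    # Drop the component of the loser gradient that opposes the winner direction, so a
    # descent step along the combined gradient cannot increase the winner's error.
    g_loser_safe = g_loser - (dot / (g_winner.norm() ** 2 + 1e-12)) * g_winner
    return g_winner + g_loser_safe
```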

🔹 Publication Date: Published on Nov 5

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.03317
• PDF: https://arxiv.org/pdf/2511.03317
• Github: https://github.com/AIDC-AI/Diffusion-SDPO

🔹 Models citing this paper:
https://huggingface.co/AIDC-AI/Diffusion-SDPO

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #DPO #TextToImage #GenerativeAI #AI
Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions

📝 Summary:
This paper introduces FIBO, a text-to-image model trained on long structured captions to enhance prompt alignment and controllability. It proposes DimFusion for efficient processing and the TaBR evaluation protocol, achieving state-of-the-art results.

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06876
• PDF: https://arxiv.org/pdf/2511.06876

🔹 Models citing this paper:
https://huggingface.co/briaai/FIBO

🔹 Spaces citing this paper:
https://huggingface.co/spaces/galdavidi/FIBO-Mashup
https://huggingface.co/spaces/briaai/FIBO
https://huggingface.co/spaces/briaai/Fibo-local

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#TextToImage #GenerativeAI #DiffusionModels #AI #MachineLearning
KLASS: KL-Guided Fast Inference in Masked Diffusion Models

📝 Summary:
KLASS accelerates masked diffusion model inference by using KL divergence to identify stable, high-confidence predictions. It unmasks multiple tokens per iteration, significantly speeding up generation and improving quality across text, image, and molecular tasks.
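
The selection rule is easy to sketch: track how much each masked position's predictive distribution changes between consecutive denoising iterations, and commit (unmask) every position that is both stable and confident in the same step. Thresholds and shapes below are illustrative.

```python
import torch
import torch.nn.functional as F

def select_tokens_to_unmask(logits_prev, logits_curr, still_masked,
                            kl_thresh: float = 0.01, conf_thresh: float = 0.9):
    """logits_*: (batch, seq, vocab); still_masked: (batch, seq) bool."""
    p_prev = F.softmax(logits_prev, dim=-1)
    kl = (p_prev * (F.log_softmax(logits_prev, dim=-1)
                    - F.log_softmax(logits_curr, dim=-1))).sum(-1)   # KL(prev || curr) per position
    conf, pred = F.softmax(logits_curr, dim=-1).max(-1)              # current confidence and argmax id
    commit = (kl < kl_thresh) & (conf > conf_thresh) & still_masked  # stable AND confident AND masked
    return pred, commit   # write `pred` wherever `commit` is True; keep the rest masked
```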

🔹 Publication Date: Published on Nov 7

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.05664
• PDF: https://arxiv.org/pdf/2511.05664
• Github: https://github.com/shkim0116/KLASS

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #GenerativeAI #MachineLearning #AIResearch #ModelAcceleration
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

📝 Summary:
FlashVSR introduces the first real-time, one-step streaming diffusion framework for video super-resolution. It addresses high latency and heavy computation through innovations such as distillation, sparse attention, and a tiny decoder. FlashVSR achieves state-of-the-art performance with up to a 12x speedup.

🔹 Publication Date: Published on Oct 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12747
• PDF: https://arxiv.org/pdf/2510.12747
• Project Page: https://zhuang2002.github.io/FlashVSR/
• Github: https://github.com/OpenImagingLab/FlashVSR

🔹 Models citing this paper:
https://huggingface.co/JunhaoZhuang/FlashVSR
https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#FlashVSR #VideoSuperResolution #RealTimeAI #DiffusionModels #ComputerVision
TiDAR: Think in Diffusion, Talk in Autoregression

📝 Summary:
TiDAR is a hybrid diffusion-autoregressive model achieving high throughput and AR-level quality. It drafts tokens with diffusion and samples autoregressively in a single pass, outperforming existing methods and delivering 4.71x to 5.91x faster generation.
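
A simplified, speculative-decoding-style illustration of the drafting loop: a diffusion branch proposes a block of tokens, an autoregressive branch scores all of them in parallel, and the longest agreeing prefix is accepted. TiDAR fuses both roles into a single forward pass, which this sketch does not capture; `diffusion_draft` and `ar_score` are hypothetical callables.

```python
import torch

@torch.no_grad()
def hybrid_decode_step(diffusion_draft, ar_score, context, block_size: int = 8):
    """context: (seq,) token ids generated so far."""
    draft = diffusion_draft(context, block_size)         # (block_size,) tokens proposed in parallel
    ar_pred = ar_score(context, draft)                   # (block_size,) AR argmax at each draft position
    agree = (draft == ar_pred).long()
    n_accept = int(agree.cumprod(dim=0).sum().item())    # longest prefix both branches agree on
    accepted = draft[:n_accept]
    if n_accept < block_size:                            # at the first mismatch, take the AR token instead
        accepted = torch.cat([accepted, ar_pred[n_accept:n_accept + 1]])
    return torch.cat([context, accepted])
```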

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08923
• PDF: https://arxiv.org/pdf/2511.08923

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AI #MachineLearning #DiffusionModels #AutoregressiveModels #GenerativeAI
Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising

📝 Summary:
Time-to-Move (TTM) is a training-free framework for precise motion- and appearance-controlled video generation using I2V diffusion models. It employs crude reference animations as motion cues and introduces dual-clock denoising for flexible alignment, outperforming training-based methods.
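
The dual-clock idea can be sketched as two re-noising schedules in one step: latents in regions covered by the crude motion reference are re-noised only lightly around that reference (so motion is obeyed), while the remaining regions follow the normal, stronger noise clock (so appearance stays free). All callables and schedules below are illustrative placeholders.

```python
import torch

def dual_clock_step(latent, reference_latent, mask, t_strong, t_weak, add_noise, denoise):
    """mask == 1 where the crude animation dictates motion, 0 elsewhere; t_strong > t_weak."""
    free = add_noise(latent, t_strong)            # unconstrained regions: full-noise clock
    tied = add_noise(reference_latent, t_weak)    # motion-cue regions: lighter clock, anchored to the reference
    mixed = mask * tied + (1.0 - mask) * free     # composite latent with two effective noise levels
    return denoise(mixed, t_strong)               # one denoiser step on the composite
```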

🔹 Publication Date: Published on Nov 9

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.08633
• PDF: https://arxiv.org/pdf/2511.08633
• Project Page: https://time-to-move.github.io/
• Github: https://github.com/time-to-move/TTM

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #DiffusionModels #GenerativeAI #MotionControl #ComputerVision
Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

📝 Summary:
ASAG is a novel diffusion guidance method that uses optimal transport and the Sinkhorn algorithm to adversarially disrupt attention scores. It weakens misleading attention alignments by injecting an adversarial cost, improving sample quality, controllability, and fidelity without model retraining.
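
The guidance pattern resembles other attention-perturbation methods: build a "degraded" prediction whose attention maps have been re-balanced by Sinkhorn normalization (with an injected adversarial cost), then extrapolate away from it. The Sinkhorn routine below is standard; how the adversarial cost is constructed and injected is not shown and would follow the paper.

```python
import torch

def sinkhorn(log_scores: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    """Alternate row/column normalization in log space (Sinkhorn-Knopp) on attention scores."""
    log_p = log_scores
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # normalize rows
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # normalize columns
    return log_p.exp()

def guided_prediction(eps_plain, eps_perturbed, guidance_scale: float = 1.5):
    # Push the sample away from what the perturbed-attention branch would produce.
    return eps_perturbed + guidance_scale * (eps_plain - eps_perturbed)
```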

🔹 Publication Date: Published on Nov 10

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.07499
• PDF: https://arxiv.org/pdf/2511.07499

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #AdversarialAI #OptimalTransport #GenerativeAI #DeepLearning
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models

📝 Summary:
LUA performs efficient super-resolution directly in diffusion models' latent space. This lightweight module enables faster, high-quality image synthesis by upscaling before VAE decoding, cutting time versus pixel-space methods, and generalizing across VAEs.
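
The pipeline shape is simple to sketch: a small learned module upscales the diffusion latent, and the VAE decoder then runs once at the target resolution instead of decoding and upscaling in pixel space. The adapter architecture below is an assumption for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class LatentUpscaleAdapter(nn.Module):
    def __init__(self, latent_channels: int = 4, factor: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, latent_channels * factor * factor, 3, padding=1),
            nn.PixelShuffle(factor),               # cheap sub-pixel upsampling, done in latent space
        )

    def forward(self, latent):                      # (b, c, h, w) -> (b, c, h * factor, w * factor)
        return self.net(latent)

# Usage sketch: image = vae.decode(adapter(latent)), replacing decode-then-upscale in pixel space.
```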

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10629
• PDF: https://arxiv.org/pdf/2511.10629

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #SuperResolution #LatentSpace #ImageGeneration #AIResearch
LiteAttention: A Temporal Sparse Attention for Diffusion Transformers

📝 Summary:
LiteAttention accelerates video generation by exploiting temporal coherence in diffusion attention. It propagates skip decisions for non-essential attention tiles across denoising steps, eliminating redundant computations. This achieves substantial speedups without quality loss.
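
A minimal sketch of the skip-propagation idea: tiles of the attention map whose mass is negligible at one denoising step are marked as skippable and stay skipped at later steps, exploiting how little the sparsity pattern drifts over time. The importance measure and threshold are illustrative.

```python
import torch

def update_skip_mask(attn_weights, tile_size: int, thresh: float = 1e-3, prev_skip=None):
    """attn_weights: (heads, q, k) with q and k divisible by tile_size.
    Returns a bool mask over (q_tiles, k_tiles); True means the tile can be skipped."""
    h, q, k = attn_weights.shape
    tiles = attn_weights.mean(0).reshape(q // tile_size, tile_size, k // tile_size, tile_size)
    tile_mass = tiles.mean(dim=(1, 3))             # average attention weight per tile
    skip = tile_mass < thresh
    if prev_skip is not None:
        skip = skip | prev_skip                    # propagate: once skipped, stay skipped at later steps
    return skip
```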

🔹 Publication Date: Published on Nov 14

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11062
• PDF: https://arxiv.org/pdf/2511.11062
• Github: https://github.com/moonmath-ai/LiteAttention

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #VideoGeneration #Transformers #SparseAttention #ComputationalEfficiency
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

📝 Summary:
A parallel multimodal diffusion framework, MMaDA-Parallel, enhances cross-modal alignment and semantic consistency in thinking-aware image synthesis by addressing error propagation issues in sequentia...

🔹 Publication Date: Published on Nov 12

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.09611
• PDF: https://arxiv.org/pdf/2511.09611
• Project Page: https://tyfeld.github.io/mmadaparellel.github.io/
• Github: https://github.com/tyfeld/MMaDA-Parallel

🔹 Models citing this paper:
https://huggingface.co/tyfeld/MMaDA-Parallel-A
https://huggingface.co/tyfeld/MMaDA-Parallel-M

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#MultimodalAI #DiffusionModels #ImageSynthesis #LLM #AIResearch
Back to Basics: Let Denoising Generative Models Denoise

📝 Summary:
Denoising diffusion models should predict clean images directly, not noise, leveraging the data manifold assumption. The paper introduces JiT, a model using simple, large-patch Transformers that achieves competitive generative results on ImageNet.
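
The parameterization change is small in code: the network regresses the clean image x0 instead of the added noise. A minimal training-loss sketch with a generic variance-preserving schedule (the schedule and weighting are illustrative, not JiT's exact recipe):

```python
import torch
import torch.nn.functional as F

def x_prediction_loss(model, x0, t, alpha_bar):
    """x0: clean images (b, c, h, w); t: (b,) integer timesteps; alpha_bar: (T,) cumulative schedule."""
    a = alpha_bar[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward diffusion
    x0_hat = model(x_t, t)                          # the network outputs an image, not noise
    return F.mse_loss(x0_hat, x0)                   # regress the clean target directly
```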

🔹 Publication Date: Published on Nov 17

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.13720
• PDF: https://arxiv.org/pdf/2511.13720
• Github: https://github.com/LTH14/JiT

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#DiffusionModels #GenerativeAI #DeepLearning #ComputerVision #AIResearch
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

📝 Summary:
CoTyle introduces code-to-style image generation, creating consistent visual styles from numerical codes. It is the first open-source academic method for this task, using a discrete style codebook and a text-to-image diffusion model for diverse, reproducible styles.
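
The conditioning path is easy to sketch: an integer style code indexes a learned codebook, and the resulting style tokens are appended to the text conditioning fed to the diffusion model, so the same code reproduces the same style. Sizes and the fusion strategy below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class StyleCodebook(nn.Module):
    def __init__(self, num_codes: int = 1024, dim: int = 768, style_tokens: int = 4):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim * style_tokens)
        self.style_tokens, self.dim = style_tokens, dim

    def forward(self, code: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        """code: (b,) integer style ids; text_embeds: (b, seq, dim) from the text encoder."""
        style = self.embed(code).view(-1, self.style_tokens, self.dim)
        return torch.cat([text_embeds, style], dim=1)   # conditioning = text tokens + style tokens
```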

🔹 Publication Date: Published on Nov 13

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.10555
• PDF: https://arxiv.org/pdf/2511.10555
• Project Page: https://Kwai-Kolors.github.io/CoTyle/
• Github: https://github.com/Kwai-Kolors/CoTyle

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Kwai-Kolors/CoTyle

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ImageGeneration #DiffusionModels #NeuralStyle #ComputerVision #DeepLearning
Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

📝 Summary:
MoS is a novel multimodal diffusion model that uses a learnable token-wise router for flexible state-based modality interactions. This achieves state-of-the-art text-to-image generation and editing with minimal parameters and computational overhead.
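
The routing mechanism can be illustrated in a few lines: each token produces softmax weights over a small set of candidate interaction states and mixes them per token. How those candidate states are formed in MoS is not shown here; names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class TokenRouter(nn.Module):
    def __init__(self, dim: int, num_states: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_states)

    def forward(self, tokens: torch.Tensor, states: torch.Tensor) -> torch.Tensor:
        """tokens: (b, n, d); states: (b, n, num_states, d) candidate interaction states."""
        weights = self.gate(tokens).softmax(dim=-1)            # (b, n, num_states), one mix per token
        return torch.einsum("bns,bnsd->bnd", weights, states)  # token-wise mixture of states
```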

🔹 Publication Date: Published on Nov 15

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.12207
• PDF: https://arxiv.org/pdf/2511.12207

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GenerativeAI #MultimodalAI #DiffusionModels #TextToImage #DeepLearning