ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

📝 Summary:
FlashVSR introduces the first real-time, one-step streaming diffusion framework for video super-resolution. It cuts latency and computation through distillation, sparse attention, and a tiny decoder, achieving state-of-the-art performance with up to a 12x speedup.
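
The sparse-attention idea can be sketched in a few lines of NumPy. This is an illustrative toy, not FlashVSR's actual implementation; the local-window pattern is just one common sparsity choice:

```python
import numpy as np

def local_window_attention(q, k, v, window=4):
    """Toy locality-constrained sparse attention: each query attends only
    to keys within `window` positions, avoiding the quadratic cost of full
    attention over long video token sequences."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    # Mask out keys outside the local window.
    scores[np.abs(idx[:, None] - idx[None, :]) > window] = -np.inf
    # Softmax over the surviving (local) scores only.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Real systems learn which blocks to keep and run fused kernels; the point here is only that masking shrinks the effective attention cost.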

🔹 Publication Date: Published on Oct 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12747
• PDF: https://arxiv.org/pdf/2510.12747
• Project Page: https://zhuang2002.github.io/FlashVSR/
• Github: https://github.com/OpenImagingLab/FlashVSR

🔹 Models citing this paper:
https://huggingface.co/JunhaoZhuang/FlashVSR
https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#FlashVSR #VideoSuperResolution #RealTimeAI #DiffusionModels #ComputerVision
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

📝 Summary:
Inferix is a next-gen inference engine for immersive world simulation, generating high-quality interactive videos. It uses semi-autoregressive block-diffusion with LLM-style KV Cache for efficient, stable generation, enabling real-time world dynamics.
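
A toy sketch of the semi-autoregressive, cache-conditioned generation loop described above. Illustrative only: the arithmetic stands in for real denoising, and the per-block summaries stand in for real cached keys/values:

```python
import numpy as np

def generate_blocks(num_blocks, block_len, dim, denoise_steps=4, seed=0):
    """Toy semi-autoregressive block generation: frames are produced one
    block at a time, and an LLM-style cache of finished blocks conditions
    the next block without reprocessing the whole history."""
    rng = np.random.default_rng(seed)
    cache = []   # stands in for cached keys/values of finished blocks
    video = []
    for _ in range(num_blocks):
        x = rng.normal(size=(block_len, dim))            # start from noise
        context = np.mean(cache, axis=0) if cache else np.zeros(dim)
        for _ in range(denoise_steps):                   # iterative refinement
            x = 0.5 * x + 0.5 * context                  # pull toward cached context
        cache.append(x.mean(axis=0))                     # cache a block summary
        video.append(x)
    return np.concatenate(video, axis=0)
```

The design point is that each new block pays only for itself plus a cache lookup, which is what makes streaming-length generation tractable.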

🔹 Publication Date: Published on Nov 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20714
• PDF: https://arxiv.org/pdf/2511.20714
• Github: https://github.com/alibaba-damo-academy/Inferix

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WorldSimulation #DiffusionModels #GenerativeAI #AIResearch #RealtimeAI
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, low-latency robotic control by estimating future robot states to bridge the prediction-execution gap. This lets VLAs perform high-precision tasks like ping-pong with significant speedup and reduced latency.
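
The future-state idea can be illustrated with a minimal sketch. The constant-velocity extrapolation here is an assumption for illustration, not VLASH's actual state estimator:

```python
def predict_future_state(pos, vel, latency):
    """Extrapolate where the robot will be by the time the action
    produced now actually executes (constant-velocity assumption)."""
    return pos + vel * latency

def act(policy, pos, vel, latency):
    # Condition the policy on the estimated *future* state rather than
    # the stale observed one, bridging the prediction-execution gap.
    return policy(predict_future_state(pos, vel, latency))
```

With synchronous inference the action would be computed for a state that no longer exists by the time it lands; conditioning on the predicted future state removes that mismatch.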

🔹 Publication Date: Published on Nov 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
RELIC: Interactive Video World Model with Long-Horizon Memory

📝 Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.

🔹 Publication Date: Published on Dec 3, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04040
• PDF: https://arxiv.org/pdf/2512.04040
• Project Page: https://relic-worldmodel.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

📝 Summary:
Live Avatar uses a 14-billion-parameter diffusion model to achieve real-time, high-fidelity, infinite-length audio-driven avatar generation. It employs Timestep-forcing Pipeline Parallelism and Rolling Sink Frame Mechanism for efficiency and consistency, reaching 20 FPS on 5 H800 GPUs.
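
One plausible reading of the Rolling Sink Frame idea, keeping a small identity-anchoring "sink" plus a sliding window of recent frames so conditioning cost stays constant, can be sketched as follows (a toy illustration, not the paper's code):

```python
def rolling_context(frames, sink=1, window=4):
    """Toy rolling sink-frame context: always keep the first `sink` frames
    as an identity anchor plus a sliding window of the most recent frames,
    so the conditioning set stays bounded for infinite-length streams."""
    return frames[:sink] + frames[max(sink, len(frames) - window):]
```

However long the stream grows, the context passed to the model never exceeds `sink + window` frames.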

🔹 Publication Date: Published on Dec 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LiveAvatar #GenerativeAI #RealtimeAI #DiffusionModels #AvatarGeneration
Real-Time Object Detection Meets DINOv3

📝 Summary:
DEIMv2 extends DEIM with DINOv3 features, achieving superior real-time object detection across GPU, edge, and mobile. It pairs a Spatial Tuning Adapter with pruned HGNetv2 backbones to cover a range of model sizes, setting a new state of the art with strong performance-cost trade-offs.

🔹 Publication Date: Published on Sep 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.20787
• PDF: https://arxiv.org/pdf/2509.20787
• Project Page: https://intellindust-ai-lab.github.io/projects/DEIMv2/
• Github: https://github.com/Intellindust-AI-Lab/DEIMv2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ObjectDetection #RealTimeAI #ComputerVision #MachineLearning #EdgeAI
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive is a diffusion framework for real-time portrait animation, overcoming latency issues in live streaming. It uses multi-stage training, implicit signals for motion control, and appearance distillation for efficiency. This achieves state-of-the-art performance with a 7-22x speedup over prior methods.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#PortraitAnimation #LiveStreaming #DiffusionModels #RealtimeAI #ComputerVision
Sharp Monocular View Synthesis in Less Than a Second

📝 Summary:
SHARP synthesizes photorealistic 3D views from a single image using a 3D Gaussian representation. It achieves state-of-the-art quality with rapid processing, taking less than a second, and supports metric camera movements.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10685
• PDF: https://arxiv.org/pdf/2512.10685
• Project Page: https://apple.github.io/ml-sharp/
• Github: https://github.com/apple/ml-sharp

🔹 Models citing this paper:
https://huggingface.co/apple/Sharp

Spaces citing this paper:
https://huggingface.co/spaces/ronedgecomb/ml-sharp

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ViewSynthesis #3DVision #ComputerVision #RealtimeAI #GaussianSplats
TimeBill: Time-Budgeted Inference for Large Language Models

📝 Summary:
TimeBill is a framework for LLMs in time-critical systems. It predicts execution time and adaptively adjusts KV cache eviction to balance inference efficiency and response performance within given time budgets, improving task completion rates.
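
The budget-aware eviction idea can be sketched as follows. The linear cost model and the importance scores are assumptions for illustration, not TimeBill's actual execution-time predictor:

```python
def tokens_affordable(budget_s, per_token_s):
    """How many cache entries the remaining time budget can afford,
    assuming (for illustration) a fixed per-entry decoding cost."""
    return max(0, int(budget_s // per_token_s))

def evict_to_budget(kv_cache, scores, budget_s, per_token_s):
    """Toy budget-aware KV eviction: rank entries by importance and keep
    only as many as the deadline allows, preserving original order."""
    keep = min(len(kv_cache), tokens_affordable(budget_s, per_token_s))
    ranked = sorted(range(len(kv_cache)), key=lambda i: -scores[i])
    kept = sorted(ranked[:keep])
    return [kv_cache[i] for i in kept]
```

A tighter budget forces a smaller cache (faster but lossier decoding); a looser one keeps more context, which is the efficiency-quality balance the summary describes.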

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21859
• PDF: https://arxiv.org/pdf/2512.21859

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #AI #RealTimeAI #InferenceOptimization #DeepLearning
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.

🔹 Publication Date: Published on Dec 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

📝 Summary:
YOLO-Master proposes an Efficient Sparse Mixture-of-Experts (ES-MoE) block for real-time object detection. It adaptively allocates computational resources based on scene complexity using a dynamic routing network, overcoming the limits of static computation. This improves both accuracy and speed.
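
Top-k expert routing, the core mechanism of any sparse MoE block, can be sketched in NumPy (an illustrative toy, not YOLO-Master's ES-MoE):

```python
import numpy as np

def sparse_moe(x, experts, gate_w, top_k=2):
    """Toy sparse Mixture-of-Experts layer: a gating network scores every
    expert, only the top_k actually run, and their outputs are blended by
    renormalized gate weights -- so compute adapts to the input."""
    logits = x @ gate_w                       # one score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the chosen experts
    gw = np.exp(logits[top] - logits[top].max())
    gw /= gw.sum()                            # softmax over chosen experts only
    return sum(w * experts[i](x) for w, i in zip(gw, top))
```

Only `top_k` of the experts execute per input, which is how a dynamic router can spend more compute on complex scenes and less on simple ones.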

🔹 Publication Date: Published on Dec 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23273
• PDF: https://arxiv.org/pdf/2512.23273

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ObjectDetection #YOLO #MixtureOfExperts #Transformers #RealTimeAI
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

📝 Summary:
OmniFlatten is a GPT-based model for real-time, natural full-duplex spoken dialogue. It uses a multi-stage post-training method to adapt a text LLM for speech and text generation without altering its architecture, enabling low-latency conversations.

🔹 Publication Date: Published on Oct 23, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.17799
• PDF: https://arxiv.org/pdf/2410.17799

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GPT #VoiceAI #LLM #RealTimeAI #NLP
Monolith: Real Time Recommendation System With Collisionless Embedding Table

📝 Summary:
Monolith is a real-time recommendation system designed for online training. It features a collisionless embedding table with memory optimizations and a fault-tolerant architecture, enabling real-time learning by overcoming limitations of general DL frameworks.
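
The collisionless-table idea can be sketched as a dict-backed embedding, in contrast to the usual fixed-size hashed table where distinct IDs collide and overwrite each other's training (a toy illustration, not Monolith's implementation):

```python
import numpy as np

class CollisionlessEmbedding:
    """Toy collisionless embedding table: every feature ID gets its own
    row, created on first access, instead of hashing IDs modulo a fixed
    table size where distinct IDs can map to the same row."""
    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.table = {}            # id -> vector; no modulo, no collisions
    def lookup(self, feature_id):
        if feature_id not in self.table:
            self.table[feature_id] = self.rng.normal(size=self.dim)
        return self.table[feature_id]
```

A production system additionally needs the memory optimizations the summary mentions (expiring stale IDs, frequency filtering); the sketch only shows why a growable keyed table avoids the collision problem.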

🔹 Publication Date: Published on Sep 16, 2022

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2209.07663
• PDF: https://arxiv.org/pdf/2209.07663
• Github: https://github.com/bytedance/monolith

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#RecommendationSystems #DeepLearning #MachineLearning #RealTimeAI #DataScience
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning

📝 Summary:
Chroma 1.0 is the first open-source real-time end-to-end spoken dialogue model with personalized voice cloning. It achieves low-latency interaction and high-fidelity voice synthesis, improving speaker similarity by 10.96% over a human baseline.

🔹 Publication Date: Published on Jan 16, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11141
• PDF: https://arxiv.org/pdf/2601.11141
• Project Page: https://www.flashlabs.ai/flashai-voice-agents
• Github: https://github.com/FlashLabs-AI-Corp/FlashLabs-Chroma

🔹 Models citing this paper:
https://huggingface.co/FlashLabs/Chroma-4B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ConversationalAI #VoiceCloning #RealTimeAI #OpenSourceAI #TTS
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

📝 Summary:
Live Avatar enables real-time, high-fidelity, infinite-length avatar generation using a 14B-parameter diffusion model. It employs Timestep-forcing Pipeline Parallelism and the Rolling Sink Frame Mechanism to overcome limitations, achieving 20 FPS on 5 GPUs. This is the first practical system at this scale.

🔹 Publication Date: Published on Dec 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677
• Project Page: https://liveavatar.github.io/
• Github: https://github.com/Alibaba-Quark/LiveAvatar

🔹 Models citing this paper:
https://huggingface.co/Quark-Vision/Live-Avatar

Spaces citing this paper:
https://huggingface.co/spaces/ahm98alex/liveavatar-test
https://huggingface.co/spaces/sdavignon/liveavatar

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LiveAvatar #AvatarGeneration #RealtimeAI #DiffusionModels #GenerativeAI
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive enables real-time, expressive portrait animation for live streaming. It uses hybrid implicit signals, appearance distillation, and autoregressive streaming generation to achieve low-latency, stable results with up to 22x speedup.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

🔹 Models citing this paper:
https://huggingface.co/huaichang/PersonaLive

Spaces citing this paper:
https://huggingface.co/spaces/seawolf2357/personalive

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#PortraitAnimation #LiveStreaming #RealtimeAI #ComputerVision #GenerativeAI
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

📝 Summary:
Fast-FoundationStereo achieves real-time zero-shot stereo matching, bridging the gap between slow robust models and fast specialized ones. It employs distillation, architecture search, and pruning, running over 10x faster with similar accuracy to prior foundation models. This sets a new state of the art.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11130
• PDF: https://arxiv.org/pdf/2512.11130
• Project Page: https://nvlabs.github.io/Fast-FoundationStereo/
• Github: https://github.com/NVlabs/Fast-FoundationStereo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#StereoMatching #ComputerVision #RealTimeAI #ZeroShotLearning #DeepLearning
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

📝 Summary:
OmniForcing transforms slow bidirectional audio-visual diffusion models into fast, real-time streaming generators. It tackles training instability and synchronization by using asymmetric alignment, a global prefix, and an audio sink token. This enables high-fidelity, synchronized generation at 25 FPS.

🔹 Publication Date: Published on Mar 12, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11647
• PDF: https://arxiv.org/pdf/2603.11647
• Project Page: https://omniforcing.com/
• Github: https://github.com/OmniForcing/OmniForcing

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GenerativeAI #AudioVisual #RealtimeAI #DiffusionModels #DeepLearning
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📝 Summary:
Video Streaming Thinking (VST) is a novel paradigm for real-time video understanding, enabling AI to think while watching during streaming playback. It optimizes VideoLLMs for responsive, low-latency interaction, showing significantly faster responses and strong performance on various benchmarks.
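
The watch-and-think interleaving can be sketched as follows (a toy illustration: a real VST-style system emits reasoning tokens from a VideoLLM, not tuples):

```python
def stream_think(frames, tokens_per_frame=2):
    """Toy watch-and-think loop: instead of waiting for the whole video,
    the model spends a small, bounded reasoning budget after each frame
    it ingests, so an answer is nearly ready the moment playback ends."""
    thoughts = []
    for t, _frame in enumerate(frames):
        # Ingest the frame, then think for a fixed token budget before
        # the next frame arrives.
        for j in range(tokens_per_frame):
            thoughts.append((t, j))
    return thoughts
```

Because reasoning is amortized across playback, response latency at the end of the stream stays roughly constant rather than growing with video length.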

🔹 Publication Date: Published on Mar 12, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12262
• PDF: https://arxiv.org/pdf/2603.12262
• Project Page: https://1ranguan.github.io/VST/
• Github: https://github.com/1ranGuan/VST

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoLLMs #RealTimeAI #VideoUnderstanding #AIResearch #MachineLearning