ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution

📝 Summary:
FlashVSR introduces the first real-time, one-step streaming diffusion framework for video super-resolution. It cuts latency and computation through distillation, sparse attention, and a tiny decoder, achieving state-of-the-art performance with up to a 12x speedup.
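
The sparse-attention idea can be sketched in a few lines of NumPy. This is an illustrative toy, not FlashVSR's actual implementation; the local-window pattern is just one common sparsity choice:

```python
import numpy as np

def local_window_attention(q, k, v, window=4):
    """Toy locality-constrained sparse attention: each query attends only
    to keys within `window` positions, avoiding the quadratic cost of full
    attention over long video token sequences."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    # Mask out keys outside the local window.
    scores[np.abs(idx[:, None] - idx[None, :]) > window] = -np.inf
    # Softmax over the surviving (local) scores only.
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Real systems learn which blocks to keep and run fused kernels; the point here is only that masking shrinks the effective attention cost.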

🔹 Publication Date: Published on Oct 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.12747
• PDF: https://arxiv.org/pdf/2510.12747
• Project Page: https://zhuang2002.github.io/FlashVSR/
• Github: https://github.com/OpenImagingLab/FlashVSR

🔹 Models citing this paper:
https://huggingface.co/JunhaoZhuang/FlashVSR
https://huggingface.co/JunhaoZhuang/FlashVSR-v1.1

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#FlashVSR #VideoSuperResolution #RealTimeAI #DiffusionModels #ComputerVision
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

📝 Summary:
Inferix is a next-gen inference engine for immersive world simulation, generating high-quality interactive videos. It uses semi-autoregressive block-diffusion with LLM-style KV Cache for efficient, stable generation, enabling real-time world dynamics.
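
A toy sketch of the semi-autoregressive, cache-conditioned generation loop described above. Illustrative only: the arithmetic stands in for real denoising, and the per-block summaries stand in for real cached keys/values:

```python
import numpy as np

def generate_blocks(num_blocks, block_len, dim, denoise_steps=4, seed=0):
    """Toy semi-autoregressive block generation: frames are produced one
    block at a time, and an LLM-style cache of finished blocks conditions
    the next block without reprocessing the whole history."""
    rng = np.random.default_rng(seed)
    cache = []   # stands in for cached keys/values of finished blocks
    video = []
    for _ in range(num_blocks):
        x = rng.normal(size=(block_len, dim))            # start from noise
        context = np.mean(cache, axis=0) if cache else np.zeros(dim)
        for _ in range(denoise_steps):                   # iterative refinement
            x = 0.5 * x + 0.5 * context                  # pull toward cached context
        cache.append(x.mean(axis=0))                     # cache a block summary
        video.append(x)
    return np.concatenate(video, axis=0)
```

The design point is that each new block pays only for itself plus a cache lookup, which is what makes streaming-length generation tractable.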

🔹 Publication Date: Published on Nov 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20714
• PDF: https://arxiv.org/pdf/2511.20714
• Github: https://github.com/alibaba-damo-academy/Inferix

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WorldSimulation #DiffusionModels #GenerativeAI #AIResearch #RealtimeAI
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

📝 Summary:
VLASH is an asynchronous inference framework for VLAs. It achieves fast, accurate, low-latency robotic control by estimating future robot states to bridge the prediction-execution gap. This lets VLAs perform high-precision tasks like ping-pong with significant speedup and reduced latency.
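
The future-state idea can be illustrated with a minimal sketch. The constant-velocity extrapolation here is an assumption for illustration, not VLASH's actual state estimator:

```python
def predict_future_state(pos, vel, latency):
    """Extrapolate where the robot will be by the time the action
    produced now actually executes (constant-velocity assumption)."""
    return pos + vel * latency

def act(policy, pos, vel, latency):
    # Condition the policy on the estimated *future* state rather than
    # the stale observed one, bridging the prediction-execution gap.
    return policy(predict_future_state(pos, vel, latency))
```

With synchronous inference the action would be computed for a state that no longer exists by the time it lands; conditioning on the predicted future state removes that mismatch.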

🔹 Publication Date: Published on Nov 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
RELIC: Interactive Video World Model with Long-Horizon Memory

📝 Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.

🔹 Publication Date: Published on Dec 3, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04040
• PDF: https://arxiv.org/pdf/2512.04040
• Project Page: https://relic-worldmodel.github.io/

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

📝 Summary:
Live Avatar uses a 14-billion-parameter diffusion model to achieve real-time, high-fidelity, infinite-length audio-driven avatar generation. It employs Timestep-forcing Pipeline Parallelism and Rolling Sink Frame Mechanism for efficiency and consistency, reaching 20 FPS on 5 H800 GPUs.
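
One plausible reading of the Rolling Sink Frame idea, keeping a small identity-anchoring "sink" plus a sliding window of recent frames so conditioning cost stays constant, can be sketched as follows (a toy illustration, not the paper's code):

```python
def rolling_context(frames, sink=1, window=4):
    """Toy rolling sink-frame context: always keep the first `sink` frames
    as an identity anchor plus a sliding window of the most recent frames,
    so the conditioning set stays bounded for infinite-length streams."""
    return frames[:sink] + frames[max(sink, len(frames) - window):]
```

However long the stream grows, the context passed to the model never exceeds `sink + window` frames.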

🔹 Publication Date: Published on Dec 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LiveAvatar #GenerativeAI #RealtimeAI #DiffusionModels #AvatarGeneration
Real-Time Object Detection Meets DINOv3

📝 Summary:
DEIMv2 extends DEIM with DINOv3 features, achieving superior real-time object detection across GPU, edge, and mobile. It pairs a Spatial Tuning Adapter with pruned HGNetv2 backbones to cover a range of model sizes, setting a new state of the art with strong performance-cost trade-offs.

🔹 Publication Date: Published on Sep 25, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.20787
• PDF: https://arxiv.org/pdf/2509.20787
• Project Page: https://intellindust-ai-lab.github.io/projects/DEIMv2/
• Github: https://github.com/Intellindust-AI-Lab/DEIMv2

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ObjectDetection #RealTimeAI #ComputerVision #MachineLearning #EdgeAI
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive is a diffusion framework for real-time portrait animation, overcoming latency issues in live streaming. It uses multi-stage training, implicit signals for motion control, and appearance distillation for efficiency. This achieves state-of-the-art performance with a 7-22x speedup over prior methods.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#PortraitAnimation #LiveStreaming #DiffusionModels #RealtimeAI #ComputerVision
Sharp Monocular View Synthesis in Less Than a Second

📝 Summary:
SHARP synthesizes photorealistic 3D views from a single image using a 3D Gaussian representation. It achieves state-of-the-art quality with rapid processing, taking less than a second, and supports metric camera movements.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10685
• PDF: https://arxiv.org/pdf/2512.10685
• Project Page: https://apple.github.io/ml-sharp/
• Github: https://github.com/apple/ml-sharp

🔹 Models citing this paper:
https://huggingface.co/apple/Sharp

Spaces citing this paper:
https://huggingface.co/spaces/ronedgecomb/ml-sharp

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ViewSynthesis #3DVision #ComputerVision #RealtimeAI #GaussianSplats
TimeBill: Time-Budgeted Inference for Large Language Models

📝 Summary:
TimeBill is a framework for LLMs in time-critical systems. It predicts execution time and adaptively adjusts KV cache eviction to balance inference efficiency and response performance within given time budgets, improving task completion rates.
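
The budget-aware eviction idea can be sketched as follows. The linear cost model and the importance scores are assumptions for illustration, not TimeBill's actual execution-time predictor:

```python
def tokens_affordable(budget_s, per_token_s):
    """How many cache entries the remaining time budget can afford,
    assuming (for illustration) a fixed per-entry decoding cost."""
    return max(0, int(budget_s // per_token_s))

def evict_to_budget(kv_cache, scores, budget_s, per_token_s):
    """Toy budget-aware KV eviction: rank entries by importance and keep
    only as many as the deadline allows, preserving original order."""
    keep = min(len(kv_cache), tokens_affordable(budget_s, per_token_s))
    ranked = sorted(range(len(kv_cache)), key=lambda i: -scores[i])
    kept = sorted(ranked[:keep])
    return [kv_cache[i] for i in kept]
```

A tighter budget forces a smaller cache (faster but lossier decoding); a looser one keeps more context, which is the efficiency-quality balance the summary describes.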

🔹 Publication Date: Published on Dec 26, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21859
• PDF: https://arxiv.org/pdf/2512.21859

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #AI #RealTimeAI #InferenceOptimization #DeepLearning
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.

🔹 Publication Date: Published on Dec 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

📝 Summary:
YOLO-Master proposes an Efficient Sparse Mixture-of-Experts (ES-MoE) block for real-time object detection. It adaptively allocates computational resources based on scene complexity using a dynamic routing network, overcoming the limits of static computation. This improves both accuracy and speed.
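
Top-k expert routing, the core mechanism of any sparse MoE block, can be sketched in NumPy (an illustrative toy, not YOLO-Master's ES-MoE):

```python
import numpy as np

def sparse_moe(x, experts, gate_w, top_k=2):
    """Toy sparse Mixture-of-Experts layer: a gating network scores every
    expert, only the top_k actually run, and their outputs are blended by
    renormalized gate weights -- so compute adapts to the input."""
    logits = x @ gate_w                       # one score per expert
    top = np.argsort(logits)[-top_k:]         # indices of the chosen experts
    gw = np.exp(logits[top] - logits[top].max())
    gw /= gw.sum()                            # softmax over chosen experts only
    return sum(w * experts[i](x) for w, i in zip(gw, top))
```

Only `top_k` of the experts execute per input, which is how a dynamic router can spend more compute on complex scenes and less on simple ones.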

🔹 Publication Date: Published on Dec 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23273
• PDF: https://arxiv.org/pdf/2512.23273

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ObjectDetection #YOLO #MixtureOfExperts #Transformers #RealTimeAI
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation

📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.

🔹 Publication Date: Published on Jan 2, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

📝 Summary:
OmniFlatten is a GPT-based model for real-time, natural full-duplex spoken dialogue. It uses a multi-stage post-training method to adapt a text LLM for speech and text generation without altering its architecture, enabling low-latency conversations.

🔹 Publication Date: Published on Oct 23, 2024

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.17799
• PDF: https://arxiv.org/pdf/2410.17799

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GPT #VoiceAI #LLM #RealTimeAI #NLP
Monolith: Real Time Recommendation System With Collisionless Embedding Table

📝 Summary:
Monolith is a real-time recommendation system designed for online training. It features a collisionless embedding table with memory optimizations and a fault-tolerant architecture, enabling real-time learning by overcoming limitations of general DL frameworks.
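
The collisionless-table idea can be sketched as a dict-backed embedding, in contrast to the usual fixed-size hashed table where distinct IDs collide and overwrite each other's training (a toy illustration, not Monolith's implementation):

```python
import numpy as np

class CollisionlessEmbedding:
    """Toy collisionless embedding table: every feature ID gets its own
    row, created on first access, instead of hashing IDs modulo a fixed
    table size where distinct IDs can map to the same row."""
    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.table = {}            # id -> vector; no modulo, no collisions
    def lookup(self, feature_id):
        if feature_id not in self.table:
            self.table[feature_id] = self.rng.normal(size=self.dim)
        return self.table[feature_id]
```

A production system additionally needs the memory optimizations the summary mentions (expiring stale IDs, frequency filtering); the sketch only shows why a growable keyed table avoids the collision problem.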

🔹 Publication Date: Published on Sep 16, 2022

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2209.07663
• PDF: https://arxiv.org/pdf/2209.07663
• Github: https://github.com/bytedance/monolith

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#RecommendationSystems #DeepLearning #MachineLearning #RealTimeAI #DataScience
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning

📝 Summary:
Chroma 1.0 is the first open-source real-time end-to-end spoken dialogue model with personalized voice cloning. It achieves low-latency interaction and high-fidelity voice synthesis, improving speaker similarity by 10.96% over a human baseline.

🔹 Publication Date: Published on Jan 16, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.11141
• PDF: https://arxiv.org/pdf/2601.11141
• Project Page: https://www.flashlabs.ai/flashai-voice-agents
• Github: https://github.com/FlashLabs-AI-Corp/FlashLabs-Chroma

🔹 Models citing this paper:
https://huggingface.co/FlashLabs/Chroma-4B

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#ConversationalAI #VoiceCloning #RealTimeAI #OpenSourceAI #TTS
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

📝 Summary:
Live Avatar enables real-time, high-fidelity, infinite-length avatar generation using a 14B-parameter diffusion model. It employs Timestep-forcing Pipeline Parallelism and the Rolling Sink Frame Mechanism to overcome limitations, achieving 20 FPS on 5 GPUs. This is the first practical system at this scale.

🔹 Publication Date: Published on Dec 4, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677
• Project Page: https://liveavatar.github.io/
• Github: https://github.com/Alibaba-Quark/LiveAvatar

🔹 Models citing this paper:
https://huggingface.co/Quark-Vision/Live-Avatar

Spaces citing this paper:
https://huggingface.co/spaces/ahm98alex/liveavatar-test
https://huggingface.co/spaces/sdavignon/liveavatar

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LiveAvatar #AvatarGeneration #RealtimeAI #DiffusionModels #GenerativeAI
PersonaLive! Expressive Portrait Image Animation for Live Streaming

📝 Summary:
PersonaLive enables real-time, expressive portrait animation for live streaming. It uses hybrid implicit signals, appearance distillation, and autoregressive streaming generation to achieve low-latency, stable results with up to 22x speedup.

🔹 Publication Date: Published on Dec 12, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive

🔹 Models citing this paper:
https://huggingface.co/huaichang/PersonaLive

Spaces citing this paper:
https://huggingface.co/spaces/seawolf2357/personalive

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#PortraitAnimation #LiveStreaming #RealtimeAI #ComputerVision #GenerativeAI
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching

📝 Summary:
Fast-FoundationStereo achieves real-time zero-shot stereo matching, bridging the gap between slow robust models and fast specialized ones. It employs distillation, architecture search, and pruning, running over 10x faster with similar accuracy to prior foundation models. This sets a new state of the art.

🔹 Publication Date: Published on Dec 11, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11130
• PDF: https://arxiv.org/pdf/2512.11130
• Project Page: https://nvlabs.github.io/Fast-FoundationStereo/
• Github: https://github.com/NVlabs/Fast-FoundationStereo

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#StereoMatching #ComputerVision #RealTimeAI #ZeroShotLearning #DeepLearning
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

📝 Summary:
OmniForcing transforms slow bidirectional audio-visual diffusion models into fast, real-time streaming generators. It tackles training instability and synchronization by using asymmetric alignment, a global prefix, and an audio sink token. This enables high-fidelity, synchronized generation at 25 FPS.

🔹 Publication Date: Published on Mar 12, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.11647
• PDF: https://arxiv.org/pdf/2603.11647
• Project Page: https://omniforcing.com/
• Github: https://github.com/OmniForcing/OmniForcing

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#GenerativeAI #AudioVisual #RealtimeAI #DiffusionModels #DeepLearning
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📝 Summary:
Video Streaming Thinking (VST) is a novel paradigm for real-time video understanding, enabling AI to think while watching during streaming playback. It optimizes VideoLLMs for responsive, low-latency interaction, showing significantly faster responses and strong performance on various benchmarks.
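
The watch-and-think interleaving can be sketched as follows (a toy illustration: a real VST-style system emits reasoning tokens from a VideoLLM, not tuples):

```python
def stream_think(frames, tokens_per_frame=2):
    """Toy watch-and-think loop: instead of waiting for the whole video,
    the model spends a small, bounded reasoning budget after each frame
    it ingests, so an answer is nearly ready the moment playback ends."""
    thoughts = []
    for t, _frame in enumerate(frames):
        # Ingest the frame, then think for a fixed token budget before
        # the next frame arrives.
        for j in range(tokens_per_frame):
            thoughts.append((t, j))
    return thoughts
```

Because reasoning is amortized across playback, response latency at the end of the stream stays roughly constant rather than growing with video length.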

🔹 Publication Date: Published on Mar 12, 2026

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2603.12262
• PDF: https://arxiv.org/pdf/2603.12262
• Project Page: https://1ranguan.github.io/VST/
• Github: https://github.com/1ranGuan/VST

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoLLMs #RealTimeAI #VideoUnderstanding #AIResearch #MachineLearning