✨Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
📝 Summary:
Inferix is a next-gen inference engine for immersive world simulation, generating high-quality interactive videos. It uses semi-autoregressive block-diffusion with LLM-style KV Cache for efficient, stable generation, enabling real-time world dynamics.
🔹 Publication Date: Published on Nov 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.20714
• PDF: https://arxiv.org/pdf/2511.20714
• Github: https://github.com/alibaba-damo-academy/Inferix
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#WorldSimulation #DiffusionModels #GenerativeAI #AIResearch #RealtimeAI
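Below is a minimal Python sketch of the block-wise decoding pattern the summary describes: generate the video in blocks, denoise each block iteratively, and condition on a cache of previously finished blocks instead of recomputing them. The denoiser, shapes, and cache summary are invented placeholders, not Inferix's actual components.

```python
# Toy sketch of semi-autoregressive block-diffusion decoding with a KV-style
# cache. Everything here is a stand-in for illustration only.
import torch

torch.manual_seed(0)
BLOCK_LEN, DIM, DENOISE_STEPS, NUM_BLOCKS = 4, 32, 8, 3

# Stand-in "denoiser": predicts a cleaner block given the noisy block and a
# summary of all previously generated (cached) blocks.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(2 * DIM, 64), torch.nn.ReLU(), torch.nn.Linear(64, DIM)
)

kv_cache = []        # one entry per finished block (here: its mean feature)
video_blocks = []

with torch.no_grad():
    for b in range(NUM_BLOCKS):
        # Context comes from the cache, so earlier blocks are never recomputed.
        if kv_cache:
            context = torch.stack(kv_cache).mean(dim=0)
        else:
            context = torch.zeros(DIM)
        context = context.expand(BLOCK_LEN, DIM)

        # Start each block from noise and denoise iteratively (diffusion part).
        x = torch.randn(BLOCK_LEN, DIM)
        for _ in range(DENOISE_STEPS):
            x = denoiser(torch.cat([x, context], dim=-1))

        video_blocks.append(x)
        kv_cache.append(x.mean(dim=0))   # cache this block for future blocks

print(f"generated {len(video_blocks)} blocks of shape {tuple(video_blocks[0].shape)}")
```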
✨VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
📝 Summary:
VLASH is an asynchronous inference framework for vision-language-action (VLA) models. It achieves fast, accurate, and low-latency robotic control by estimating future robot states, bridging the prediction-execution gap. This enables VLAs to perform high-precision tasks such as ping-pong with significant speedup and reduced latency.
🔹 Publication Date: Published on Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.01031
• PDF: https://arxiv.org/pdf/2512.01031
• Github: https://github.com/mit-han-lab/vlash
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Robotics #VisionLanguageModels #RealTimeAI #AIResearch #MachineLearning
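A toy sketch of the future-state-aware idea: instead of conditioning the policy on the state observed when inference starts, extrapolate that state forward by the expected inference latency so the action matches the state at execution time. The dynamics, policy, and latency numbers below are made up for illustration.

```python
# Toy future-state-aware control loop; not VLASH's actual policy or dynamics.
import numpy as np

INFERENCE_LATENCY = 0.05   # seconds the policy takes to produce an action
DT = 0.05                  # control period

def policy(state):
    # Placeholder "VLA" policy: drive the state toward zero.
    return -0.5 * state

def step_dynamics(state, velocity, action, dt):
    velocity = velocity + action * dt
    return state + velocity * dt, velocity

state, velocity = np.array([1.0, -0.5]), np.array([0.0, 0.0])

for t in range(20):
    # Key idea: predict where the robot will be once inference finishes.
    predicted_state = state + velocity * INFERENCE_LATENCY
    action = policy(predicted_state)       # act for the future, not the past
    # Meanwhile the robot kept moving during inference; now apply the action.
    state, velocity = step_dynamics(state, velocity, action, DT)

print("final state:", state)
```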
✨RELIC: Interactive Video World Model with Long-Horizon Memory
📝 Summary:
RELIC is a unified framework enabling real-time, memory-aware exploration of scenes with user control. It integrates long-horizon memory and spatial consistency using video-diffusion distillation, achieving 16 FPS generation with robust 3D coherence.
🔹 Publication Date: Published on Dec 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04040
• PDF: https://arxiv.org/pdf/2512.04040
• Project Page: https://relic-worldmodel.github.io/
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#WorldModels #VideoDiffusion #DeepLearning #RealTimeAI #ComputerVision
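The sketch below illustrates a generic memory-aware interactive rollout of the kind the summary describes: each step consumes a user control signal, retrieves the closest-pose entry from a bounded long-horizon memory, and conditions frame generation on it. The generator and retrieval rule are stand-ins, not RELIC's model.

```python
# Toy memory-aware interactive rollout loop; all components are placeholders.
import torch

torch.manual_seed(0)
DIM, MEMORY_SIZE, STEPS = 64, 32, 100

generator = torch.nn.GRUCell(input_size=2 * DIM + 2, hidden_size=DIM)

memory = []                      # list of (pose, frame_feature) tuples
frame = torch.zeros(DIM)
pose = torch.zeros(2)

with torch.no_grad():
    for t in range(STEPS):
        user_control = torch.randn(2) * 0.1      # e.g. camera movement
        pose = pose + user_control

        # Retrieve the memory entry whose pose is closest to the current pose,
        # which is what keeps revisited locations spatially consistent.
        if memory:
            dists = torch.stack([(p - pose).norm() for p, _ in memory])
            recalled = memory[int(dists.argmin())][1]
        else:
            recalled = torch.zeros(DIM)

        inp = torch.cat([frame, recalled, user_control])
        frame = generator(inp.unsqueeze(0), frame.unsqueeze(0)).squeeze(0)

        memory.append((pose.clone(), frame.clone()))
        memory = memory[-MEMORY_SIZE:]           # bounded long-horizon memory

print("rollout finished, memory entries:", len(memory))
```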
✨Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
📝 Summary:
Live Avatar uses a 14-billion-parameter diffusion model to achieve real-time, high-fidelity, infinite-length audio-driven avatar generation. It employs Timestep-forcing Pipeline Parallelism and a Rolling Sink Frame Mechanism for efficiency and consistency, reaching 20 FPS on 5 H800 GPUs.
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04677
• PDF: https://arxiv.org/pdf/2512.04677
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LiveAvatar #GenerativeAI #RealtimeAI #DiffusionModels #AvatarGeneration
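As a rough illustration of timestep-level pipeline parallelism, the toy schedule below splits each frame chunk's denoising steps across several stages (e.g., GPUs): the first chunk still pays the full pipeline latency, but in steady state a new chunk finishes every stage interval. Stage counts and timings are invented numbers, not the paper's measurements.

```python
# Toy pipeline schedule: stage counts and per-stage times are made up.
NUM_STAGES = 5            # e.g. GPUs, each owning a slice of the timesteps
STAGE_TIME = 0.01         # seconds per stage (per chunk)
NUM_CHUNKS = 8

finish = [[0.0] * NUM_STAGES for _ in range(NUM_CHUNKS)]
for c in range(NUM_CHUNKS):
    for s in range(NUM_STAGES):
        prev_chunk_done = finish[c - 1][s] if c > 0 else 0.0   # stage is busy
        prev_stage_done = finish[c][s - 1] if s > 0 else 0.0   # data not ready
        finish[c][s] = max(prev_chunk_done, prev_stage_done) + STAGE_TIME

latency_first = finish[0][-1]
steady_gap = finish[-1][-1] - finish[-2][-1]
print(f"first chunk latency: {latency_first:.3f}s")
print(f"steady-state output interval: {steady_gap:.3f}s "
      f"(~{1.0 / steady_gap:.0f} chunks/s)")
```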
✨Real-Time Object Detection Meets DINOv3
📝 Summary:
DEIMv2 extends DEIM with DINOv3 features, achieving superior real-time object detection across GPU, edge, and mobile platforms. It uses a Spatial Tuning Adapter and pruned HGNetv2 backbones to cover a range of model sizes, setting a new state of the art with strong performance-cost trade-offs.
🔹 Publication Date: Published on Sep 25
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.20787
• PDF: https://arxiv.org/pdf/2509.20787
• Project Page: https://intellindust-ai-lab.github.io/projects/DEIMv2/
• Github: https://github.com/Intellindust-AI-Lab/DEIMv2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ObjectDetection #RealTimeAI #ComputerVision #MachineLearning #EdgeAI
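As a hedged illustration of the general recipe (adapting frozen foundation-model patch features for a detection head), here is a toy adapter module in PyTorch; it is an invented stand-in, not the paper's Spatial Tuning Adapter.

```python
# Invented adapter sketch with guessed shapes; not DEIMv2's actual module.
import torch
import torch.nn as nn

class SpatialAdapterSketch(nn.Module):
    """Project frozen ViT patch features and re-introduce spatial detail."""
    def __init__(self, vit_dim=768, out_dim=256):
        super().__init__()
        self.proj = nn.Conv2d(vit_dim, out_dim, kernel_size=1)
        self.refine = nn.Sequential(            # cheap depthwise spatial tuning
            nn.Conv2d(out_dim, out_dim, 3, padding=1, groups=out_dim),
            nn.GELU(),
            nn.Conv2d(out_dim, out_dim, 1),
        )

    def forward(self, patch_tokens, hw):
        # patch_tokens: (B, N, C) from a frozen DINO-style backbone
        b, n, c = patch_tokens.shape
        h, w = hw
        feat = patch_tokens.transpose(1, 2).reshape(b, c, h, w)
        feat = self.proj(feat)
        return feat + self.refine(feat)         # residual spatial refinement

tokens = torch.randn(2, 14 * 14, 768)           # fake backbone output
adapter = SpatialAdapterSketch()
print(adapter(tokens, (14, 14)).shape)          # torch.Size([2, 256, 14, 14])
```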
✨PersonaLive! Expressive Portrait Image Animation for Live Streaming
📝 Summary:
PersonaLive is a diffusion framework for real-time portrait animation, overcoming latency issues in live streaming. It uses multi-stage training, implicit signals for motion control, and appearance distillation for efficiency. This achieves state-of-the-art performance with up to 7-22x speedup ov...
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11253
• PDF: https://arxiv.org/pdf/2512.11253
• Github: https://github.com/GVCLab/PersonaLive
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#PortraitAnimation #LiveStreaming #DiffusionModels #RealtimeAI #ComputerVision
✨Sharp Monocular View Synthesis in Less Than a Second
📝 Summary:
SHARP synthesizes photorealistic 3D views from a single image using a 3D Gaussian representation. It achieves state-of-the-art quality in under a second of processing and supports metric camera movements.
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10685
• PDF: https://arxiv.org/pdf/2512.10685
• Project Page: https://apple.github.io/ml-sharp/
• Github: https://github.com/apple/ml-sharp
🔹 Models citing this paper:
• https://huggingface.co/apple/Sharp
✨ Spaces citing this paper:
• https://huggingface.co/spaces/ronedgecomb/ml-sharp
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ViewSynthesis #3DVision #ComputerVision #RealtimeAI #GaussianSplats
✨TimeBill: Time-Budgeted Inference for Large Language Models
📝 Summary:
TimeBill is a framework for LLMs in time-critical systems. It predicts execution time and adaptively adjusts KV cache eviction to balance inference efficiency and response performance within given time budgets, improving task completion rates.
🔹 Publication Date: Published on Dec 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21859
• PDF: https://arxiv.org/pdf/2512.21859
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AI #RealTimeAI #InferenceOptimization #DeepLearning
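A toy sketch of budget-aware decoding along the lines of the summary: project how long the remaining tokens will take given the current KV-cache length, and evict part of the cache when the projection would blow the budget. The cost model and eviction rule are invented for illustration.

```python
# Toy budget-aware decode loop; cost model and eviction rule are made up.
COST_PER_CACHED_TOKEN = 1e-4   # seconds of attention cost per cached entry
BASE_COST = 2e-3               # fixed per-token cost
TIME_BUDGET = 1.0              # seconds for the whole response
MAX_NEW_TOKENS = 100

kv_len, elapsed = 512, 0.0     # prompt already in the cache

for step in range(MAX_NEW_TOKENS):
    remaining = MAX_NEW_TOKENS - step
    # Predicted time for the rest of the generation at the current cache size.
    projected = remaining * (BASE_COST + COST_PER_CACHED_TOKEN * kv_len)

    if elapsed + projected > TIME_BUDGET:
        # Evict the oldest quarter of the cache to speed up later steps
        # (a real system would pick entries by importance, not age).
        kv_len = max(32, int(kv_len * 0.75))

    step_time = BASE_COST + COST_PER_CACHED_TOKEN * kv_len
    elapsed += step_time
    kv_len += 1                # the newly generated token is cached too

print(f"finished in {elapsed:.3f}s (budget {TIME_BUDGET}s), cache={kv_len}")
```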
✨LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
📝 Summary:
LiveTalk enables real-time multimodal interactive video generation from text, image, and audio by improving on-policy diffusion distillation. It reduces inference latency by 20x while maintaining quality, allowing seamless human-AI interaction.
🔹 Publication Date: Published on Dec 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23576
• PDF: https://arxiv.org/pdf/2512.23576
• Github: https://github.com/GAIR-NLP/LiveTalk
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoGeneration #AI #DiffusionModels #RealTimeAI #MultimodalAI
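A minimal, heavily simplified sketch of on-policy distillation: the student generates samples with its own fast sampler, and the frozen teacher is queried on those student samples to provide the regression target. Both networks and the loss below are tiny stand-ins, not LiveTalk's actual training objective.

```python
# Toy on-policy distillation loop; networks and loss are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
DIM = 16

teacher = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))
student = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))
for p in teacher.parameters():
    p.requires_grad_(False)                      # teacher is frozen

opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for it in range(200):
    noise = torch.randn(8, DIM)
    # On-policy: the training inputs are the student's own one-step outputs,
    # not samples drawn from the teacher's expensive many-step sampler.
    student_sample = student(noise)
    with torch.no_grad():
        target = teacher(student_sample)         # teacher refines the sample
    loss = nn.functional.mse_loss(student_sample, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final distillation loss:", float(loss))
```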
✨YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection
📝 Summary:
YOLO-Master proposes an Efficient Sparse Mixture-of-Experts (ES-MoE) block for real-time object detection. It adaptively allocates computational resources based on scene complexity using a dynamic routing network, overcoming static computation limits. This improves accuracy and speed, especially on...
🔹 Publication Date: Published on Dec 29
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.23273
• PDF: https://arxiv.org/pdf/2512.23273
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ObjectDetection #YOLO #MixtureOfExperts #Transformers #RealTimeAI
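For readers unfamiliar with the underlying mechanism, here is a generic sparse Mixture-of-Experts block with top-1 routing; expert count, routing rule, and shapes are illustrative and not the paper's ES-MoE design.

```python
# Generic sparse MoE block with top-1 routing; not YOLO-Master's ES-MoE.
import torch
import torch.nn as nn

class SparseMoESketch(nn.Module):
    def __init__(self, dim=64, num_experts=4, hidden=128):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, dim)
        logits = self.router(x)
        weights = logits.softmax(dim=-1)
        top_w, top_idx = weights.max(dim=-1)     # top-1: one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                  # only tokens routed to e
            if mask.any():                       # skip idle experts entirely
                out[mask] = top_w[mask, None] * expert(x[mask])
        return out

tokens = torch.randn(100, 64)                    # e.g. flattened feature map
print(SparseMoESketch()(tokens).shape)           # torch.Size([100, 64])
```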
✨Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
📝 Summary:
Avatar Forcing creates real-time interactive talking head avatars. It uses diffusion forcing for low-latency reactions to user input and a label-free preference optimization for expressive, preferred motion, achieving 6.8x speedup.
🔹 Publication Date: Published on Jan 2
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2601.00664
• PDF: https://arxiv.org/pdf/2601.00664
• Project Page: https://taekyungki.github.io/AvatarForcing/
• Github: https://github.com/TaekyungKi/AvatarForcing
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AvatarGeneration #RealTimeAI #GenerativeAI #ComputerVision #AIResearch
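A toy sketch of the diffusion-forcing idea: every frame in the window carries its own noise level, with committed past frames kept clean and future frames noisier, which is what allows causal, low-latency reactions. The noise schedule and denoiser below are placeholders, not Avatar Forcing's model.

```python
# Toy per-frame noise assignment in the style of diffusion forcing.
import torch

torch.manual_seed(0)
FRAMES, DIM = 8, 32

denoiser = torch.nn.Sequential(
    torch.nn.Linear(DIM + 1, 64), torch.nn.ReLU(), torch.nn.Linear(64, DIM)
)

clean = torch.randn(FRAMES, DIM)                 # stand-in "ground-truth" frames

# Per-frame noise levels: 0 for already-committed past frames, growing toward
# the future (this per-frame independence is the core of diffusion forcing).
noise_levels = torch.linspace(0.0, 1.0, FRAMES).unsqueeze(1)
noisy = clean * (1 - noise_levels) + torch.randn_like(clean) * noise_levels

# The denoiser sees each frame together with its own noise level.
pred = denoiser(torch.cat([noisy, noise_levels], dim=1))
loss = torch.nn.functional.mse_loss(pred, clean)
print("per-frame noise levels:", noise_levels.squeeze(1).tolist())
print("toy reconstruction loss:", float(loss))
```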
✨OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
📝 Summary:
OmniFlatten is a GPT-based model for real-time, natural full-duplex spoken dialogue. It uses a multi-stage post-training method to adapt a text LLM for speech and text generation without altering its architecture, enabling low-latency conversations.
🔹 Publication Date: Published on Oct 23, 2024
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2410.17799
• PDF: https://arxiv.org/pdf/2410.17799
• Github: https://github.com/karpathy/nanogpt
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#GPT #VoiceAI #LLM #RealTimeAI #NLP
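A hedged sketch of the "flattening" idea: interleave the listening (input speech) and speaking (output) token streams chunk-by-chunk into one flat sequence that an unmodified decoder-only LLM can model. The chunk size and token naming below are guesses, not the paper's exact scheme.

```python
# Toy stream flattening; chunk size and token labels are illustrative.
def flatten_streams(listen_tokens, speak_tokens, chunk=4):
    """Merge two token streams chunk-by-chunk into one flat sequence."""
    flat, i, j = [], 0, 0
    while i < len(listen_tokens) or j < len(speak_tokens):
        flat.extend(listen_tokens[i:i + chunk])   # a chunk of what we hear...
        flat.extend(speak_tokens[j:j + chunk])    # ...then a chunk of what we say
        i += chunk
        j += chunk
    return flat

listen = [f"L{k}" for k in range(10)]             # e.g. user speech codec tokens
speak = [f"S{k}" for k in range(6)]               # e.g. assistant speech tokens
print(flatten_streams(listen, speak))
# ['L0', 'L1', 'L2', 'L3', 'S0', 'S1', 'S2', 'S3', 'L4', ...]
```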