✨Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
📝 Summary:
Streaming VideoLLMs suffer high latency from repeated ViT encoding and LLM pre-filling. STC is a hierarchical token-compression framework that caches visual features across frames and prunes redundant tokens. It cuts ViT encoding latency by up to 24.5% and LLM pre-filling latency by up to 45.3% while retaining 99% accuracy.
🔹 Publication Date: Published on Nov 30, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00891
• PDF: https://arxiv.org/pdf/2512.00891
• Github: https://github.com/lern-to-write/STC
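Below is a minimal Python sketch of the two ideas in the summary: reusing cached ViT features for patches that barely change between frames, and pruning low-saliency visual tokens before LLM pre-filling. The similarity threshold, saliency heuristic, and function names are illustrative assumptions, not the paper's exact method.
```python
# Sketch only: cache ViT features for near-unchanged patches, prune redundant tokens.
import torch


def encode_frame_with_cache(vit, frame_patches, cache, sim_thresh=0.95):
    """Re-encode only patches that changed; reuse cached features otherwise."""
    feats = []
    for i, patch in enumerate(frame_patches):            # patch: (C, H, W) tensor
        prev = cache.get(i)
        if prev is not None and torch.cosine_similarity(
            patch.flatten(), prev["patch"].flatten(), dim=0
        ) > sim_thresh:
            feats.append(prev["feat"])                    # cache hit: skip ViT
        else:
            feat = vit(patch.unsqueeze(0)).squeeze(0)     # cache miss: encode
            cache[i] = {"patch": patch, "feat": feat}
            feats.append(feat)
    return torch.stack(feats)                             # (num_patches, dim)


def prune_tokens(tokens, keep_ratio=0.5):
    """Keep the highest-norm visual tokens so LLM pre-fill sees fewer tokens."""
    k = max(1, int(tokens.shape[0] * keep_ratio))
    scores = tokens.norm(dim=-1)                          # crude saliency proxy
    idx = scores.topk(k).indices.sort().values            # preserve temporal order
    return tokens[idx]
```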
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoLLM #LLM #DeepLearning #AI #PerformanceOptimization
✨SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
📝 Summary:
SCALE improves LLM math reasoning by allocating test-time compute selectively, based on sub-problem difficulty. It removes the bottleneck of uniform allocation, boosting accuracy by up to 13.75% and cutting cost by 33-53% compared to uniform scaling.
🔹 Publication Date: Published on Nov 29, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00466
• PDF: https://arxiv.org/pdf/2512.00466
• Github: https://github.com/XiaoYang66/DualThinking
✨ Datasets citing this paper:
• https://huggingface.co/datasets/YangXiao-nlp/DualThinking
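A minimal sketch of the difficulty-aware allocation idea: split a fixed sampling budget across sub-problems in proportion to estimated difficulty, then majority-vote each one. `estimate_difficulty` and `solve_once` are hypothetical stand-ins for a difficulty scorer and a single LLM reasoning call; this is not the authors' implementation.
```python
# Sketch only: spend more samples on hard sub-problems instead of a uniform budget.
from collections import Counter


def allocate_budget(difficulties, total_budget):
    """Split a sampling budget across sub-problems proportionally to difficulty."""
    total = sum(difficulties) or 1.0
    return [max(1, round(total_budget * d / total)) for d in difficulties]


def solve_with_scale(sub_problems, total_budget, estimate_difficulty, solve_once):
    """Sample each sub-problem according to its share, then majority-vote."""
    difficulties = [estimate_difficulty(p) for p in sub_problems]
    budgets = allocate_budget(difficulties, total_budget)
    answers = []
    for problem, n_samples in zip(sub_problems, budgets):
        votes = Counter(solve_once(problem) for _ in range(n_samples))
        answers.append(votes.most_common(1)[0][0])        # majority vote per sub-problem
    return answers
```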
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #AI #MachineLearning #PerformanceOptimization #MathReasoning
✨LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
📝 Summary:
LMCache is an efficient open-source KV cache layer that offloads and transfers LLM KV caches out of GPU memory. It enables cache reuse across different queries and inference engines, tackling ever-growing cache sizes and improving throughput by up to 15x.
🔹 Publication Date: Published on Oct 8, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.09665
• PDF: https://arxiv.org/pdf/2510.09665
• Github: https://github.com/LMCache/LMCache
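A toy sketch of the offload-and-reuse pattern the summary describes: store per-prefix KV tensors on the CPU and reload them when another query shares the same prompt prefix. The class and method names are illustrative and not LMCache's actual API.
```python
# Sketch only: a toy KV store keyed by a prompt-prefix hash, offloaded to CPU memory.
import hashlib
import torch


class KVCacheStore:
    """Offload per-prefix KV tensors to CPU memory and reload them on reuse."""

    def __init__(self):
        self._store = {}                                   # prefix hash -> KV on CPU

    @staticmethod
    def _key(prompt_prefix: str) -> str:
        return hashlib.sha256(prompt_prefix.encode()).hexdigest()

    def offload(self, prompt_prefix: str, kv_tensors):
        # kv_tensors: list of (key, value) tensors per layer, currently on GPU
        self._store[self._key(prompt_prefix)] = [
            (k.to("cpu"), v.to("cpu")) for k, v in kv_tensors
        ]

    def fetch(self, prompt_prefix: str, device="cuda"):
        cached = self._store.get(self._key(prompt_prefix))
        if cached is None:
            return None                                    # cache miss: prefill normally
        return [(k.to(device), v.to(device)) for k, v in cached]
```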
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #KVCache #GPU #AIInference #PerformanceOptimization