ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

📝 Summary:
Streaming VideoLLMs face high latency from ViT encoding and LLM pre-filling. STC, a hierarchical token compression framework, tackles this by caching visual features across frames and pruning redundant tokens, reducing ViT encoding latency by up to 24.5% and LLM pre-filling latency by up to 45.3% while retaining 99% accuracy.
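
🔹 Illustrative Sketch (not from the paper): a minimal Python sketch of the two-level idea, assuming a `vit_encode` function, a cosine-similarity reuse criterion, and a norm-based saliency score. All names and thresholds below are hypothetical stand-ins, not STC's actual implementation.

```python
import torch

def compress_stream(frames, vit_encode, keep_ratio=0.5, sim_thresh=0.95):
    """Hedged sketch of hierarchical token compression for streaming video:
    (1) reuse cached ViT features for patches that barely changed since the
        previous frame, (2) prune low-importance tokens before LLM pre-filling.
    Names and thresholds are illustrative, not the paper's exact method."""
    cached = None                      # previous frame's features [num_patches, dim]
    for frame in frames:               # frame: [num_patches, patch_dim]
        feats = vit_encode(frame)      # [num_patches, dim]
        if cached is not None:
            # Level 1: reuse cached features where the patch changed little.
            sim = torch.nn.functional.cosine_similarity(feats, cached, dim=-1)
            reuse = sim > sim_thresh
            feats = torch.where(reuse.unsqueeze(-1), cached, feats)
        cached = feats

        # Level 2: keep only the most salient tokens for LLM pre-filling.
        importance = feats.norm(dim=-1)              # crude saliency proxy
        k = max(1, int(keep_ratio * feats.shape[0]))
        top_idx = importance.topk(k).indices
        yield feats[top_idx]                          # pruned visual tokens
```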

🔹 Publication Date: Published on Nov 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00891
• PDF: https://arxiv.org/pdf/2512.00891
• GitHub: https://github.com/lern-to-write/STC

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#VideoLLM #LLM #DeepLearning #AI #PerformanceOptimization
SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling

📝 Summary:
SCALE improves LLM math reasoning by allocating test-time compute selectively, according to the difficulty of each sub-problem. By removing the bottleneck of uniform allocation, it boosts accuracy by up to 13.75% and cuts costs by 33-53% compared with uniform scaling.
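
🔹 Illustrative Sketch (not from the paper): a minimal Python sketch of difficulty-aware test-time allocation, assuming `solve_once` and `estimate_difficulty` helpers. These names and the proportional budgeting rule are hypothetical, not SCALE's actual algorithm.

```python
from collections import Counter

def scale_solve(subproblems, solve_once, estimate_difficulty,
                total_budget=32, min_samples=1):
    """Hedged sketch of selective test-time scaling: spend more samples on
    sub-problems estimated to be hard, fewer on easy ones, then majority-vote
    each sub-answer. Helper names are illustrative."""
    difficulties = [estimate_difficulty(sp) for sp in subproblems]  # e.g. in [0, 1]
    total = sum(difficulties) or 1.0

    answers = []
    for sp, d in zip(subproblems, difficulties):
        # Budget per sub-problem is proportional to its estimated difficulty.
        n = max(min_samples, round(total_budget * d / total))
        votes = Counter(solve_once(sp, context=answers) for _ in range(n))
        answers.append(votes.most_common(1)[0][0])       # majority-vote answer
    return answers
```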

🔹 Publication Date: Published on Nov 29, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00466
• PDF: https://arxiv.org/pdf/2512.00466
• GitHub: https://github.com/XiaoYang66/DualThinking

Datasets citing this paper:
https://huggingface.co/datasets/YangXiao-nlp/DualThinking

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #AI #MachineLearning #PerformanceOptimization #MathReasoning
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

📝 Summary:
LMCache is an efficient open-source layer for offloading and transferring LLM KV caches out of GPU memory. It enables cache reuse across different queries and inference engines, addressing the problem of growing cache sizes, and improves throughput by up to 15x.
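
🔹 Illustrative Sketch (not the library's API): a minimal Python sketch of the core idea, keying offloaded key/value tensors by a prefix hash so a later query sharing that prefix can skip pre-filling. Class and method names are hypothetical, not LMCache's actual interface.

```python
import hashlib

class KVCacheStore:
    """Hedged sketch of KV-cache offloading and reuse: move key/value tensors
    for a token prefix off the GPU, and reload them when another query shares
    that prefix. Illustrative only; not LMCache's actual API."""

    def __init__(self):
        self._store = {}                              # prefix hash -> CPU tensors

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(",".join(map(str, token_ids)).encode()).hexdigest()

    def offload(self, token_ids, keys, values):
        # Store on CPU (or disk / a remote store in a real system) to free GPU memory.
        self._store[self._key(token_ids)] = (keys.cpu(), values.cpu())

    def fetch(self, token_ids, device="cuda"):
        entry = self._store.get(self._key(token_ids))
        if entry is None:
            return None                               # miss: prefill from scratch
        k, v = entry
        return k.to(device), v.to(device)             # hit: skip prefix prefill
```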

🔹 Publication Date: Published on Oct 8, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.09665
• PDF: https://arxiv.org/pdf/2510.09665
• GitHub: https://github.com/LMCache/LMCache

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#LLM #KVCache #GPU #AIInference #PerformanceOptimization