✨ Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
📝 Summary:
Streaming VideoLLMs suffer high latency from two stages: ViT encoding of incoming frames and LLM pre-filling over the resulting visual tokens. STC, a hierarchical token-compression framework, attacks both by caching frame features across overlapping streaming windows and pruning redundant visual tokens before pre-filling. It cuts ViT encoding latency by up to 24.5 percent and LLM pre-filling latency by up to 45.3 percent while retaining 99 percent of the original accuracy.
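To make the two ideas concrete, here is a minimal sketch of feature caching and token pruning in the streaming setting. All names, the sliding-window setup, and the L2-norm saliency score are illustrative assumptions for this post, not the actual STC algorithm from the paper.

```python
import torch

class FeatureCache:
    """Caches per-frame ViT features so frames shared between
    consecutive streaming windows are encoded only once."""

    def __init__(self):
        self._cache: dict[int, torch.Tensor] = {}

    def encode_window(self, frame_ids, frames, vit):
        feats = []
        for fid, frame in zip(frame_ids, frames):
            if fid not in self._cache:
                # Only frames new to the stream pay the ViT encoding cost.
                self._cache[fid] = vit(frame.unsqueeze(0)).squeeze(0)
            feats.append(self._cache[fid])
        return torch.stack(feats)  # (num_frames, num_tokens, dim)

def prune_tokens(features: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Drop low-saliency visual tokens before LLM pre-filling.
    Saliency here is just the token's L2 norm -- a stand-in heuristic,
    not the paper's scoring rule."""
    flat = features.flatten(0, 1)                # (frames * tokens, dim)
    scores = flat.norm(dim=-1)
    k = max(1, int(keep_ratio * flat.size(0)))
    keep = scores.topk(k).indices.sort().values  # keep temporal order
    return flat[keep]

# Hypothetical usage: a 4-frame sliding window over a dummy stream.
vit = lambda x: torch.randn(x.size(0), 196, 768)  # stand-in ViT encoder
cache = FeatureCache()
stream = torch.randn(10, 3, 224, 224)             # 10 dummy frames
for t in range(4, 11):
    ids = list(range(t - 4, t))
    feats = cache.encode_window(ids, stream[ids], vit)
    visual_tokens = prune_tokens(feats, keep_ratio=0.5)
```

Under this setup, each step of the loop re-encodes only the one newly arrived frame, and pruning halves the token count handed to the LLM, which is where the two reported latency savings would come from.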
🔹 Publication Date: Nov 30
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.00891
• PDF: https://arxiv.org/pdf/2512.00891
• GitHub: https://github.com/lern-to-write/STC
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VideoLLM #LLM #DeepLearning #AI #PerformanceOptimization