Data Science | Machine Learning with Python for Researchers

The Data Science and Python channel is for researchers and advanced programmers.

Kimi Linear: An Expressive, Efficient Attention Architecture

📝 Summary:
Kimi Linear is a new hybrid linear attention architecture that outperforms full attention in both quality and efficiency across diverse scenarios. It combines Kimi Delta Attention with Multi-Head Latent Attention, reducing KV cache usage by up to 75% and boosting decoding throughput by up to 6x.

🔹 Publication Date: Published on Oct 30, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.26692
• PDF: https://arxiv.org/pdf/2510.26692
• GitHub: https://github.com/MoonshotAI/Kimi-Linear

🔹 Models citing this paper:
https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base
https://huggingface.co/aiqtech/Kimi-Linear-48B-A3B-Instruct

🔹 Spaces citing this paper:
https://huggingface.co/spaces/Speedofmastery/orynxml-agents
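
As a rough illustration of the linear-attention side, here is a minimal sketch of a gated delta-rule recurrence, the family of update that Kimi Delta Attention belongs to. The shapes, gating granularity, and function names are assumptions for illustration, not Moonshot AI's implementation:

```python
import torch

def gated_delta_step(S, q_t, k_t, v_t, alpha_t, beta_t):
    """One recurrent linear-attention step over a fixed-size state matrix.

    S       : (d_k, d_v) running associative memory
    q_t,k_t : (d_k,)     query / key for the current token
    v_t     : (d_v,)     value for the current token
    alpha_t : (d_k,)     per-channel forget gate in (0, 1) -- assumed fine-grained
    beta_t  : float      write strength in (0, 1)
    """
    # Delta rule: overwrite what the memory currently predicts for k_t with v_t,
    # rather than purely accumulating outer products.
    v_pred = S.t() @ k_t                                  # (d_v,) current prediction
    S = alpha_t[:, None] * S + beta_t * torch.outer(k_t, v_t - v_pred)
    o_t = S.t() @ q_t                                     # (d_v,) read with the query
    return S, o_t
```

Layers like this carry a fixed-size state instead of a growing KV cache; in the hybrid stack, only the interleaved Multi-Head Latent Attention layers keep one, which is plausibly where the up-to-75% cache reduction comes from.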

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#AttentionMechanisms #LLM #AIResearch #DeepLearning #ModelEfficiency

Virtual Width Networks

📝 Summary:
Virtual Width Networks (VWN) enhance model efficiency by expanding representational width without a matching increase in computational cost. VWN accelerates optimization and improves loss reduction, exhibiting a log-linear scaling relation between virtual width and loss.

🔹 Publication Date: Published on Nov 14, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11238
• PDF: https://arxiv.org/pdf/2511.11238
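
The abstract suggests a simple mental model: keep a wider residual stream while each block still computes at the original hidden size. Below is a minimal sketch of that reading; the projection placement and class name are my assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class VirtualWidthBlock(nn.Module):
    """Wrap an unchanged block so it reads/writes a wider residual stream."""

    def __init__(self, hidden: int, expansion: int, inner: nn.Module):
        super().__init__()
        wide = hidden * expansion                        # the "virtual" width
        self.down = nn.Linear(wide, hidden, bias=False)  # wide stream -> block width
        self.up = nn.Linear(hidden, wide, bias=False)    # block output -> wide stream
        self.inner = inner                               # existing block, original width

    def forward(self, x_wide: torch.Tensor) -> torch.Tensor:
        # The expensive inner block still runs at the original hidden size,
        # so compute stays nearly constant as the virtual width grows.
        return x_wide + self.up(self.inner(self.down(x_wide)))
```

Here `expansion` controls the virtual width: because `down` and `up` are cheap linear maps and `inner` is unchanged, FLOPs grow only marginally while the representational width grows by the expansion factor.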

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#NeuralNetworks #DeepLearning #ModelEfficiency #MachineLearning #AI

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

📝 Summary:
OmniZip is a training-free framework that addresses the computational bottleneck in omnimodal LLMs by dynamically compressing audio-visual tokens. It uses audio retention scores to guide video token pruning, achieving a 3.42x inference speedup and a 1.4x memory reduction without performance loss.

🔹 Publication Date: Published on Nov 18, 2025

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.14582
• PDF: https://arxiv.org/pdf/2511.14582
• GitHub: https://github.com/KD-TAO/OmniZip
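
A minimal, training-free sketch of the audio-guided idea as described in the summary: spend a fixed visual-token budget where the audio says something is happening. The scoring and windowing details here are illustrative assumptions; see the repo above for the authors' actual algorithm:

```python
import torch

def prune_video_tokens(video_tokens, audio_scores, keep_ratio=0.3):
    """Audio-guided pruning sketch.

    video_tokens : (T, N, D) visual tokens, N per time window
    audio_scores : (T,)      audio retention score per window (e.g. attention mass)
    keep_ratio   : average fraction of visual tokens to keep overall
    """
    T, N, _ = video_tokens.shape
    # Give audio-salient windows a larger share of the global token budget.
    budget = audio_scores / audio_scores.sum() * keep_ratio * T * N
    budget = budget.long().clamp(1, N)
    kept = []
    for t in range(T):
        # Cheap within-window saliency proxy (assumption): token L2 norm.
        saliency = video_tokens[t].norm(dim=-1)          # (N,)
        idx = saliency.topk(int(budget[t])).indices
        kept.append(video_tokens[t, idx])
    return torch.cat(kept, dim=0)  # compressed token sequence for the LLM
```

Because the compressed tokens feed the downstream omnimodal LLM unchanged, a scheme like this needs no retraining, which is what "training-free" means here.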

==================================

For more data science resources:
https://t.iss.one/DataScienceT

#OmnimodalLLM #TokenCompression #LLMs #AI #ModelEfficiency