✨UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
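Below is a minimal Python sketch of the general recipe behind joint quantization and low-rank compression: keep a small low-rank component in full precision and quantize only the residual. The rank, bit-width, and round-to-nearest quantizer are illustrative assumptions, not UniQL's exact algorithm.
```python
import torch

def lowrank_plus_quant(W: torch.Tensor, rank: int = 32, bits: int = 4):
    # Keep a small low-rank component in full precision via truncated SVD.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # Quantize only the residual, whose dynamic range is smaller.
    R = W - L
    qmax = 2 ** (bits - 1) - 1
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax)

    # Reconstruct: dequantized residual + full-precision low-rank part.
    return R_q * scale + L

W = torch.randn(256, 256)
W_hat = lowrank_plus_quant(W)
print(f"relative error: {(W - W_hat).norm() / W.norm():.4f}")
```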
🔹 Publication Date: Published on Dec 3, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning
✨Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i}
📝 Summary:
Fairy2i converts pre-trained real-valued LLMs to a complex-valued form, enabling efficient low-bit quantization while reusing existing checkpoints. It achieves near-full-precision performance for LLaMA-2 7B at 2 bits, significantly outperforming real-valued binary methods.
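A hedged sketch of the core codebook idea, quantizing complex-valued weights onto {±1, ±i}; the real-to-complex column pairing and the single shared scale below are illustrative assumptions, not the paper's exact procedure.
```python
import torch

def quantize_pm1_pmi(W_real: torch.Tensor):
    # Illustrative real->complex step: pair consecutive columns as the
    # real and imaginary parts of one complex weight.
    Wc = torch.complex(W_real[:, 0::2], W_real[:, 1::2])

    # Snap each entry to the nearest of {1, -1, 1j, -1j}: keep the
    # dominant axis (real vs. imaginary) and its sign.
    real_dom = Wc.real.abs() >= Wc.imag.abs()
    codes = torch.where(
        real_dom,
        torch.complex(torch.sign(Wc.real), torch.zeros_like(Wc.real)),
        torch.complex(torch.zeros_like(Wc.imag), torch.sign(Wc.imag)),
    )
    scale = Wc.abs().mean()      # one shared magnitude for the tensor
    return codes, scale          # W ≈ scale * codes

codes, scale = quantize_pm1_pmi(torch.randn(8, 16))
print(codes[0, :4], scale)
```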
🔹 Publication Date: Published on Dec 2, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02901
• PDF: https://arxiv.org/pdf/2512.02901
• Github: https://github.com/PKULab1806/Fairy2i-W2
🔹 Models citing this paper:
• https://huggingface.co/PKU-DS-LAB/Fairy2i-W2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AIResearch
✨BitNet Distillation
📝 Summary:
BitNet Distillation fine-tunes LLMs to 1.58-bit precision using SubLN, attention distillation, and continual pre-training. It achieves comparable performance to full-precision models, offering 10x memory savings and 2.65x faster inference.
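For intuition, here is a sketch of the absmean ternary ("1.58-bit", weights in {-1, 0, +1}) quantizer used across the BitNet line of work, plus one common form of attention-distillation loss. SubLN and continual pre-training are omitted, and the paper's exact losses may differ.
```python
import torch
import torch.nn.functional as F

def ternary_quantize(W: torch.Tensor, eps: float = 1e-5):
    # absmean scaling, then round-to-nearest into {-1, 0, +1}.
    scale = W.abs().mean().clamp(min=eps)
    W_q = torch.clamp(torch.round(W / scale), -1, 1)
    return W_q, scale            # W ≈ scale * W_q

def attn_distill_loss(student_attn, teacher_attn):
    # Match the quantized student's attention maps to the
    # full-precision teacher's (one common distillation form).
    return F.mse_loss(student_attn, teacher_attn)

W_q, s = ternary_quantize(torch.randn(4, 4))
print(W_q)                       # entries in {-1, 0, 1}
```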
🔹 Publication Date: Published on Oct 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13998
• PDF: https://arxiv.org/pdf/2510.13998
• Github: https://github.com/microsoft/BitNet
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AI
✨QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals
📝 Summary:
QuantLRM improves Large Reasoning Model quantization by using weight-update magnitudes from fine-tuning to estimate channel importance. It protects the channels with both the smallest and largest updates, consistently outperforms traditional methods, and applies even to non-fine-tuned models.
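A hedged sketch of the fine-tuning-signal idea: rank output channels by the magnitude of their weight updates and protect both extremes from aggressive quantization. The per-channel mean-absolute score and the 5% fraction are assumptions for illustration.
```python
import torch

def protected_channels(W_base: torch.Tensor, W_ft: torch.Tensor,
                       frac: float = 0.05):
    # Per-output-channel magnitude of the fine-tuning weight update.
    delta = (W_ft - W_base).abs().mean(dim=1)
    k = max(1, int(frac * delta.numel()))
    smallest = torch.topk(delta, k, largest=False).indices
    largest = torch.topk(delta, k, largest=True).indices
    # Keep both extremes in higher precision; quantize the rest.
    return torch.cat([smallest, largest])

W_base = torch.randn(64, 64)
W_ft = W_base + 0.01 * torch.randn(64, 64)
print(protected_channels(W_base, W_ft))
```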
🔹 Publication Date: Published on Jan 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02581
• PDF: https://arxiv.org/pdf/2602.02581
• Github: https://github.com/psunlpgroup/QuantLRM
🔹 Models citing this paper:
• https://huggingface.co/nanzhang/QuantLRM-R1-Qwen-32B-3-bit
• https://huggingface.co/nanzhang/QuantLRM-R1-Llama-70B-3-bit
• https://huggingface.co/nanzhang/QuantLRM-R1-Qwen3-8B-3-bit
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Quantization #LargeLanguageModels #DeepLearning #AI #ModelCompression
✨COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
📝 Summary:
COMPOT is a training-free Transformer compression framework. It uses sparse dictionary learning with orthogonal dictionaries and closed-form updates, outperforming traditional low-rank methods, and achieves a superior quality-compression trade-off by adaptively allocating compression across layers.
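The closed-form step the title alludes to is the classic orthogonal Procrustes solution: with the sparse coefficients S fixed, the orthogonal dictionary Q minimizing ||W - QS||_F is UV^T from the SVD of WS^T. A minimal sketch (the alternating sparse-coding step and the layer-wise allocation are omitted):
```python
import torch

def procrustes_dictionary(W: torch.Tensor, S: torch.Tensor):
    # Closed-form orthogonal Procrustes update: Q = U V^T from SVD(W S^T).
    U, _, Vh = torch.linalg.svd(W @ S.T)
    return U @ Vh

W = torch.randn(128, 128)
S = torch.randn(128, 128) * (torch.rand(128, 128) < 0.1)  # sparse codes
Q = procrustes_dictionary(W, S)
print(torch.allclose(Q @ Q.T, torch.eye(128), atol=1e-4))  # orthogonal
```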
🔹 Publication Date: Published on Feb 16, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15200
• PDF: https://arxiv.org/pdf/2602.15200
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Transformers #ModelCompression #DeepLearning #AIResearch #Optimization
✨DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
📝 Summary:
DUET-VLM proposes a dual-stage compression framework for Vision-Language Models. It first reduces visual tokens from the vision encoder, then progressively drops less informative tokens in the language backbone, guided by text. This maintains high accuracy while significantly reducing computational cost.
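An illustrative sketch of text-guided visual-token pruning in this spirit: score each visual token by its similarity to the text tokens and keep the top-k. The dot-product scoring and single pruning step below are assumptions; the paper uses a two-stage design.
```python
import torch

def prune_visual_tokens(vis: torch.Tensor, txt: torch.Tensor, keep: int):
    # vis: (N_vis, d) visual tokens; txt: (N_txt, d) text tokens.
    scores = (vis @ txt.T).max(dim=1).values  # best text match per token
    idx = torch.topk(scores, keep).indices
    return vis[idx]

vis = torch.randn(576, 64)   # e.g. ViT patch tokens
txt = torch.randn(16, 64)
print(prune_visual_tokens(vis, txt, keep=144).shape)  # (144, 64)
```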
🔹 Publication Date: Published on Feb 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18846
• PDF: https://arxiv.org/pdf/2602.18846
• Github: https://github.com/AMD-AGI/DUET-VLM
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLM #ModelCompression #AI #DeepLearning #Efficiency