✨UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
📝 Summary:
UniQL unifies quantization and low-rank compression to deploy LLMs on mobile devices. It reduces memory by 4x-5.7x and improves token throughput by 2.7x-3.4x, maintaining accuracy across various model types.
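Below is a minimal Python sketch of the general recipe behind joint quantization and low-rank compression: keep a small low-rank component in full precision and quantize only the residual. The rank, bit-width, and round-to-nearest quantizer are illustrative assumptions, not UniQL's exact algorithm.
```python
import torch

def lowrank_plus_quant(W: torch.Tensor, rank: int = 32, bits: int = 4):
    # Keep a small low-rank component in full precision via truncated SVD.
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

    # Quantize only the residual, whose dynamic range is smaller.
    R = W - L
    qmax = 2 ** (bits - 1) - 1
    scale = R.abs().max() / qmax
    R_q = torch.clamp(torch.round(R / scale), -qmax - 1, qmax)

    # Reconstruct: dequantized residual + full-precision low-rank part.
    return R_q * scale + L

W = torch.randn(256, 256)
W_hat = lowrank_plus_quant(W)
print(f"relative error: {(W - W_hat).norm() / W.norm():.4f}")
```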
🔹 Publication Date: Published on Dec 3, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.03383
• PDF: https://arxiv.org/pdf/2512.03383
• Project Page: https://hychiang.info/projects/uniql/
• Github: https://github.com/enyac-group/UniQL
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #EdgeAI #Quantization #ModelCompression #DeepLearning
✨Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i}
📝 Summary:
Fairy2i converts pre-trained real-valued LLMs to a complex-valued form, enabling efficient low-bit quantization while reusing existing checkpoints. It achieves near-full-precision performance for LLaMA-2 7B at 2 bits, significantly outperforming real-valued binary methods.
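A hedged sketch of the core codebook idea, quantizing complex-valued weights onto {±1, ±i}; the real-to-complex column pairing and the single shared scale below are illustrative assumptions, not the paper's exact procedure.
```python
import torch

def quantize_pm1_pmi(W_real: torch.Tensor):
    # Illustrative real->complex step: pair consecutive columns as the
    # real and imaginary parts of one complex weight.
    Wc = torch.complex(W_real[:, 0::2], W_real[:, 1::2])

    # Snap each entry to the nearest of {1, -1, 1j, -1j}: keep the
    # dominant axis (real vs. imaginary) and its sign.
    real_dom = Wc.real.abs() >= Wc.imag.abs()
    codes = torch.where(
        real_dom,
        torch.complex(torch.sign(Wc.real), torch.zeros_like(Wc.real)),
        torch.complex(torch.zeros_like(Wc.imag), torch.sign(Wc.imag)),
    )
    scale = Wc.abs().mean()      # one shared magnitude for the tensor
    return codes, scale          # W ≈ scale * codes

codes, scale = quantize_pm1_pmi(torch.randn(8, 16))
print(codes[0, :4], scale)
```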
🔹 Publication Date: Published on Dec 2, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02901
• PDF: https://arxiv.org/pdf/2512.02901
• Github: https://github.com/PKULab1806/Fairy2i-W2
🔹 Models citing this paper:
• https://huggingface.co/PKU-DS-LAB/Fairy2i-W2
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AIResearch
✨BitNet Distillation
📝 Summary:
BitNet Distillation fine-tunes LLMs to 1.58-bit precision using SubLN, attention distillation, and continual pre-training. It achieves comparable performance to full-precision models, offering 10x memory savings and 2.65x faster inference.
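For intuition, here is a sketch of the absmean ternary ("1.58-bit", weights in {-1, 0, +1}) quantizer used across the BitNet line of work, plus one common form of attention-distillation loss. SubLN and continual pre-training are omitted, and the paper's exact losses may differ.
```python
import torch
import torch.nn.functional as F

def ternary_quantize(W: torch.Tensor, eps: float = 1e-5):
    # absmean scaling, then round-to-nearest into {-1, 0, +1}.
    scale = W.abs().mean().clamp(min=eps)
    W_q = torch.clamp(torch.round(W / scale), -1, 1)
    return W_q, scale            # W ≈ scale * W_q

def attn_distill_loss(student_attn, teacher_attn):
    # Match the quantized student's attention maps to the
    # full-precision teacher's (one common distillation form).
    return F.mse_loss(student_attn, teacher_attn)

W_q, s = ternary_quantize(torch.randn(4, 4))
print(W_q)                       # entries in {-1, 0, 1}
```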
🔹 Publication Date: Published on Oct 15, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2510.13998
• PDF: https://arxiv.org/pdf/2510.13998
• Github: https://github.com/microsoft/BitNet
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #Quantization #ModelCompression #DeepLearning #AI
✨QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals
📝 Summary:
QuantLRM improves Large Reasoning Model quantization by using weight-update magnitudes from fine-tuning to estimate channel importance. It protects the channels with both the smallest and largest updates, consistently outperforms traditional methods, and applies even to non-fine-tuned models.
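A hedged sketch of the fine-tuning-signal idea: rank output channels by the magnitude of their weight updates and protect both extremes from aggressive quantization. The per-channel mean-absolute score and the 5% fraction are assumptions for illustration.
```python
import torch

def protected_channels(W_base: torch.Tensor, W_ft: torch.Tensor,
                       frac: float = 0.05):
    # Per-output-channel magnitude of the fine-tuning weight update.
    delta = (W_ft - W_base).abs().mean(dim=1)
    k = max(1, int(frac * delta.numel()))
    smallest = torch.topk(delta, k, largest=False).indices
    largest = torch.topk(delta, k, largest=True).indices
    # Keep both extremes in higher precision; quantize the rest.
    return torch.cat([smallest, largest])

W_base = torch.randn(64, 64)
W_ft = W_base + 0.01 * torch.randn(64, 64)
print(protected_channels(W_base, W_ft))
```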
🔹 Publication Date: Published on Jan 31
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.02581
• PDF: https://arxiv.org/pdf/2602.02581
• Github: https://github.com/psunlpgroup/QuantLRM
🔹 Models citing this paper:
• https://huggingface.co/nanzhang/QuantLRM-R1-Qwen-32B-3-bit
• https://huggingface.co/nanzhang/QuantLRM-R1-Llama-70B-3-bit
• https://huggingface.co/nanzhang/QuantLRM-R1-Qwen3-8B-3-bit
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Quantization #LargeLanguageModels #DeepLearning #AI #ModelCompression
✨COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
📝 Summary:
COMPOT is a training-free Transformer compression framework. It uses sparse dictionary learning with orthogonal dictionaries and closed-form updates, outperforming traditional low-rank methods, and achieves a superior quality-compression trade-off by adaptively allocating compression across layers.
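The closed-form step the title alludes to is the classic orthogonal Procrustes solution: with the sparse coefficients S fixed, the orthogonal dictionary Q minimizing ||W - QS||_F is UV^T from the SVD of WS^T. A minimal sketch (the alternating sparse-coding step and the layer-wise allocation are omitted):
```python
import torch

def procrustes_dictionary(W: torch.Tensor, S: torch.Tensor):
    # Closed-form orthogonal Procrustes update: Q = U V^T from SVD(W S^T).
    U, _, Vh = torch.linalg.svd(W @ S.T)
    return U @ Vh

W = torch.randn(128, 128)
S = torch.randn(128, 128) * (torch.rand(128, 128) < 0.1)  # sparse codes
Q = procrustes_dictionary(W, S)
print(torch.allclose(Q @ Q.T, torch.eye(128), atol=1e-4))  # orthogonal
```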
🔹 Publication Date: Published on Feb 16, 2026
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.15200
• PDF: https://arxiv.org/pdf/2602.15200
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#Transformers #ModelCompression #DeepLearning #AIResearch #Optimization
✨DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
📝 Summary:
DUET-VLM proposes a dual-stage compression framework for Vision-Language Models. It first reduces visual tokens from the vision encoder, then progressively drops less informative tokens in the language backbone, guided by text. This maintains high accuracy while significantly reducing computational cost.
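An illustrative sketch of text-guided visual-token pruning in this spirit: score each visual token by its similarity to the text tokens and keep the top-k. The dot-product scoring and single pruning step below are assumptions; the paper uses a two-stage design.
```python
import torch

def prune_visual_tokens(vis: torch.Tensor, txt: torch.Tensor, keep: int):
    # vis: (N_vis, d) visual tokens; txt: (N_txt, d) text tokens.
    scores = (vis @ txt.T).max(dim=1).values  # best text match per token
    idx = torch.topk(scores, keep).indices
    return vis[idx]

vis = torch.randn(576, 64)   # e.g. ViT patch tokens
txt = torch.randn(16, 64)
print(prune_visual_tokens(vis, txt, keep=144).shape)  # (144, 64)
```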
🔹 Publication Date: Published on Feb 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.18846
• PDF: https://arxiv.org/pdf/2602.18846
• Github: https://github.com/AMD-AGI/DUET-VLM
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VLM #ModelCompression #AI #DeepLearning #Efficiency