✨SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs
📝 Summary:
The SQ-format is a unified sparse-quantized data format for LLM post-training quantization. It improves the accuracy-efficiency balance by combining sparse and low-precision matrix multiplications, enabling better performance and throughput, especially for outlier activations, supporting next...
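Below is a minimal NumPy sketch of the general idea the summary describes: keep the few outlier values in a sparse high-precision matrix, quantize the dense remainder to int8, and sum the two matmuls. The outlier threshold and int8 scheme are illustrative assumptions; the actual SQ-format layout is defined in the paper.
```python
# Sketch of the sparse + low-precision matmul decomposition described above.
# The outlier threshold and int8 scheme are illustrative assumptions, not the
# paper's actual SQ-format specification.
import numpy as np

def split_sparse_quant(W, outlier_sigma=3.0):
    """Split W into a sparse fp32 outlier part and a dense int8 part."""
    mask = np.abs(W) > outlier_sigma * W.std()
    W_sparse = np.where(mask, W, 0.0)                # few large outliers, kept in fp32
    W_dense = np.where(mask, 0.0, W)                 # the rest, quantized below
    scale = max(np.abs(W_dense).max() / 127.0, 1e-8)
    W_q = np.round(W_dense / scale).astype(np.int8)
    return W_sparse, W_q, scale

def sq_matmul(x, W_sparse, W_q, scale):
    """y = x @ (W_sparse + dequant(W_q)): one sparse and one low-precision matmul."""
    return x @ W_sparse + (x @ W_q.astype(np.float32)) * scale

W = np.random.randn(256, 256).astype(np.float32)
x = np.random.randn(4, 256).astype(np.float32)
print("max abs error:", np.abs(x @ W - sq_matmul(x, *split_sparse_quant(W))).max())
```
On real hardware the sparse part would run through a sparse kernel and the dense part through an int8 kernel, which is where the throughput gain comes from.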
🔹 Publication Date: Published on Dec 5
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.05409
• PDF: https://arxiv.org/pdf/2512.05409
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #Quantization #SparseML #HardwareAcceleration #AIResearch
✨MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
📝 Summary:
MemLoRA and MemLoRA-V enable efficient on-device memory-augmented AI by equipping small language and vision-language models with specialized, distilled memory adapters. This allows accurate local memory operations and native visual understanding, outperforming larger baselines in text and visual ...
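For readers unfamiliar with the adapter mechanism the summary refers to, here is a minimal PyTorch sketch of a LoRA-style adapter on a frozen linear layer. It shows the adapter idea only; MemLoRA's distillation recipe and memory operations are described in the paper.
```python
# Minimal PyTorch sketch of a LoRA-style adapter on a frozen linear layer.
# Illustrates the adapter mechanism only; MemLoRA's distillation and memory
# operations are defined in the paper, not here.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)      # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```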
🔹 Publication Date: Published on Dec 4
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.04763
• PDF: https://arxiv.org/pdf/2512.04763
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#OnDeviceAI #LLMs #VLMs #AIAdapters #MemoryAugmentedAI
🤖🧠 How to Run and Fine-Tune Kimi K2 Thinking Locally with Unsloth
🗓️ 11 Dec 2025
📚 AI News & Trends
The demand for efficient and powerful large language models (LLMs) continues to rise as developers and researchers seek new ways to optimize reasoning, coding, and conversational AI performance. One of the most impressive open-source AI systems available today is Kimi K2 Thinking, created by Moonshot AI. Through collaboration with Unsloth, users can now fine-tune and ...
#KimiK2Thinking #Unsloth #LLMs #LargeLanguageModels #AI #FineTuning
✨Thinking with Images via Self-Calling Agent
📝 Summary:
sCoT is a novel visual reasoning paradigm that reformulates interleaved multimodal CoT as a language-only CoT with self-calling subagents. It improves reasoning performance and efficiency by avoiding explicit multimodal interleaving and using group-relative policy optimization.
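A rough control-flow sketch of the self-calling idea the summary describes: the main trace stays text-only, and visual queries are delegated to a subagent that sees the image. The function names and the CALL_SUBAGENT/FINAL protocol are illustrative placeholders, not the paper's implementation.
```python
# Control-flow sketch of the "self-calling subagent" idea described above.
# `main_llm`, `vision_subagent`, and the CALL_SUBAGENT/FINAL protocol are
# illustrative placeholders, not the paper's actual implementation.
def solve_with_self_calls(question, image, main_llm, vision_subagent, max_calls=5):
    trace = [f"Question: {question}"]
    for _ in range(max_calls):
        step = main_llm(trace)                  # language-only reasoning step
        if step.startswith("CALL_SUBAGENT:"):   # e.g. "CALL_SUBAGENT: what is in the top-left region?"
            sub_query = step.removeprefix("CALL_SUBAGENT:").strip()
            answer = vision_subagent(image, sub_query)   # only the subagent sees the image
            trace.append(f"Subagent answer: {answer}")   # main trace stays text-only
        elif step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        else:
            trace.append(step)
    return main_llm(trace + ["Give the final answer now."])
```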
🔹 Publication Date: Published on Dec 9
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.08511
• PDF: https://arxiv.org/pdf/2512.08511
• Github: https://github.com/YWenxi/think-with-images-through-self-calling
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#VisualReasoning #MultimodalAI #LLMs #AIagents #AIResearch
✨Sliding Window Attention Adaptation
📝 Summary:
Sliding Window Attention Adaptation (SWAA) allows pretrained LLMs to use efficient sliding window attention for long contexts without retraining. SWAA combines five adaptation methods, with specific synergistic combinations effectively recovering the original long-context performance.
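As background for the summary: sliding window attention lets each token attend only to the most recent W positions, so attention cost grows as O(n*W) instead of O(n^2). A minimal mask sketch is below; SWAA's five adaptation methods are in the paper and not reproduced here.
```python
# Minimal sketch of a sliding-window attention mask: query position i may
# attend only to key positions j with i - window < j <= i. How a pretrained
# full-attention model is adapted to this mask is what SWAA addresses.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i
    in_window = (i - j) < window
    return causal & in_window                # True where attention is allowed

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.int())   # row 5 attends to positions 3, 4, 5 only
```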
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.10411
• PDF: https://arxiv.org/pdf/2512.10411
🔹 Models citing this paper:
• https://huggingface.co/yuyijiong/Qwen3-SWA-adaptation
🔹 Datasets citing this paper:
• https://huggingface.co/datasets/yuyijiong/LongMemEval_24k
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SlidingWindowAttention #LongContextAI #NLP #AIResearch
✨Causal Judge Evaluation: Calibrated Surrogate Metrics for LLM Systems
📝 Summary:
Causal Judge Evaluation (CJE) improves LLM-as-judge evaluation by fixing statistical issues such as uncalibrated scores and poor confidence intervals. It achieves 99% ranking accuracy at 14x lower cost by calibrating a cheaper judge with 5% oracle labels.
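A minimal sketch of the calibration step the summary highlights, assuming an isotonic-regression calibrator fit on a small oracle-labeled slice and synthetic data; CJE's actual estimators and confidence intervals are specified in the paper.
```python
# Sketch of "calibrate a cheap judge on a small oracle slice". Isotonic
# regression and the synthetic data are illustrative assumptions, not CJE's
# actual estimator.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
cheap_scores = rng.uniform(0, 1, 2000)   # cheap judge's raw scores
oracle_quality = np.clip(cheap_scores**2 + rng.normal(0, 0.05, 2000), 0, 1)  # miscalibrated relationship

# Pretend we can afford oracle labels for only ~5% of items.
idx = rng.choice(2000, size=100, replace=False)
calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(cheap_scores[idx], oracle_quality[idx])

calibrated = calibrator.predict(cheap_scores)   # calibrated surrogate scores
print("raw MAE:       ", np.abs(cheap_scores - oracle_quality).mean().round(3))
print("calibrated MAE:", np.abs(calibrated - oracle_quality).mean().round(3))
```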
🔹 Publication Date: Published on Dec 11
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.11150
• PDF: https://arxiv.org/pdf/2512.11150
• Project Page: https://www.cimolabs.com/cje
• Github: https://github.com/cimo-labs/cje
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #AIEvaluation #MachineLearning #DataScience #NLP
✨VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs
📝 Summary:
Voyager is a novel, training-free method that iteratively generates diverse synthetic datasets from LLMs. It uses determinantal point processes to optimize diversity, significantly outperforming baselines with a 1.5-3x improvement.
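The diversity mechanism mentioned above, determinantal point processes, rewards subsets whose similarity-kernel determinant is large, so near-duplicate samples are penalized. Below is a greedy-selection sketch of that idea; the greedy heuristic and random stand-in embeddings are illustrative, not the paper's exact procedure.
```python
# DPP-style diverse subset selection (sketch): greedily add the candidate that
# most increases the determinant of the kernel submatrix; similar items shrink
# the determinant, so diversity is rewarded. Illustrative only.
import numpy as np

def greedy_dpp_select(embeddings: np.ndarray, k: int) -> list[int]:
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    L = X @ X.T + 1e-6 * np.eye(len(X))      # similarity kernel
    selected: list[int] = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            sub = L[np.ix_(selected + [i], selected + [i])]
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

candidates = np.random.randn(200, 64)        # stand-in for generation embeddings
print(greedy_dpp_select(candidates, k=10))
```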
🔹 Publication Date: Published on Dec 12
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.12072
• PDF: https://arxiv.org/pdf/2512.12072
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLMs #SyntheticData #DataScience #MachineLearning #AI
✨FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
📝 Summary:
FiNERweb is a new pipeline that scales multilingual Named Entity Recognition dataset creation to 91 languages using LLMs. It produces 225k high-quality passages, enabling models to achieve comparable or improved zero-shot performance with 19x less data.
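A sketch of the kind of LLM-annotation step a pipeline like this relies on: prompt a model for entity spans as JSON, then keep only spans that actually occur in the passage. The prompt, label set, and `call_llm` placeholder are assumptions; FiNERweb's real prompts, filtering, and quality checks are described in the paper.
```python
# Sketch of an LLM-based NER annotation step. `call_llm`, the prompt, and the
# label set are illustrative placeholders, not FiNERweb's actual pipeline.
import json

PROMPT = """Extract the named entities from the passage below.
Return a JSON list like [{{"text": "...", "type": "PER|ORG|LOC|MISC"}}].
Passage: {passage}"""

def annotate(passage: str, call_llm) -> list[dict]:
    raw = call_llm(PROMPT.format(passage=passage))
    try:
        entities = json.loads(raw)
    except json.JSONDecodeError:
        return []                                   # drop unparseable generations
    # keep only spans that actually occur in the passage
    return [e for e in entities if isinstance(e, dict) and e.get("text") in passage]
```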
🔹 Publication Date: Published on Dec 15
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.13884
• PDF: https://arxiv.org/pdf/2512.13884
• Github: https://github.com/whoisjones/FiNERweb
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#NER #NLP #LLMs #MultilingualAI #Datasets
✨JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
📝 Summary:
JustRL uses a minimal single-stage RL approach with fixed hyperparameters to achieve state-of-the-art performance on 1.5B reasoning models. It uses less compute and shows stable training, suggesting that complex RL methods for LLMs may be unnecessary and can even hinder exploration.
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16649
• PDF: https://arxiv.org/pdf/2512.16649
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#ReinforcementLearning #LLMs #DeepLearning #AIResearch #ModelScaling
✨Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
📝 Summary:
This paper benchmarks SpeechLLMs against cascaded systems for speech-to-text translation. It finds that cascaded systems are more reliable overall, while SpeechLLMs match them only in select cases, and that integrating an LLM is essential for high-quality speech translation.
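For context, the two architectures the benchmark compares reduce to the shapes sketched below; `asr_model`, `mt_model`, and `speech_llm` are placeholders for whichever components a given system uses.
```python
# Sketch of the two architectures compared in the paper: a cascaded system
# (ASR followed by text MT) versus a direct SpeechLLM. All components are
# placeholders, not specific systems from the benchmark.
def cascaded_translate(audio, asr_model, mt_model, target_lang: str) -> str:
    transcript = asr_model(audio)                 # speech -> source-language text
    return mt_model(transcript, target_lang)      # text -> target-language text

def direct_translate(audio, speech_llm, target_lang: str) -> str:
    # one model consumes audio and produces the translation directly
    return speech_llm(audio, instruction=f"Translate the speech into {target_lang}.")
```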
🔹 Publication Date: Published on Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16378
• PDF: https://arxiv.org/pdf/2512.16378
• Github: https://github.com/sarapapi/hearing2translate
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#SpeechTranslation #LLMs #NLP #AIResearch #DeepLearning