✨LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
📝 Summary:
LUT-LLM is an FPGA accelerator for LLM inference that uses on-chip memory to shift computation from arithmetic units to memory-based table lookups. This memory-centric approach achieves 1.66x lower latency than an AMD MI210 GPU and 1.72x higher energy efficiency than an NVIDIA A100 GPU when serving a 1.7B-parameter LLM.
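The core idea is that a matrix-vector product over low-bit weights can be served from precomputed tables instead of multiply-accumulate units. Below is a minimal NumPy sketch of that lookup trick, not the paper's kernel: binary {-1,+1} weights and a group size of 4 are illustrative assumptions, and all names here are hypothetical.

```python
import numpy as np

G = 4  # activations per lookup group (illustrative assumption)

def build_tables(x):
    """For each group of G activations, precompute dot products with all
    2**G sign patterns, so a weight group becomes a single table lookup."""
    n_groups = len(x) // G
    patterns = np.array(
        [[1 if (p >> i) & 1 else -1 for i in range(G)] for p in range(2**G)],
        dtype=x.dtype,
    )  # shape (16, G): every possible {-1,+1} weight group
    groups = x[: n_groups * G].reshape(n_groups, G)
    return groups @ patterns.T  # shape (n_groups, 16): one table per group

def lut_gemv(w_idx, tables):
    """w_idx[r, g] is the 4-bit pattern index of row r's g-th weight group.
    The matvec reduces to gathering table entries and summing them."""
    n_groups = tables.shape[0]
    return tables[np.arange(n_groups), w_idx].sum(axis=1)

# Tiny usage check against a dense matvec.
rng = np.random.default_rng(0)
x = rng.standard_normal(8).astype(np.float32)   # 2 groups of 4 activations
w_idx = rng.integers(0, 16, size=(3, 2))        # 3 output rows, 2 groups each
y = lut_gemv(w_idx, build_tables(x))

# Reference: expand indices back to {-1,+1} weights and multiply.
signs = np.array([[1 if (p >> i) & 1 else -1 for i in range(G)]
                  for p in range(16)], dtype=np.float32)
W = signs[w_idx].reshape(3, -1)
assert np.allclose(y, W @ x, atol=1e-5)
```

On an FPGA, the per-group tables can live in on-chip memory, which is what shifts the cost from arithmetic to lookups.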
🔹 Publication Date: Published on Nov 9, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.06174
• PDF: https://arxiv.org/pdf/2511.06174
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#LLM #FPGA #AI #DeepLearning #AIHardware
✨AutoNeural: Co-Designing Vision-Language Models for NPU Inference
📝 Summary:
AutoNeural is an NPU-native vision-language model (VLM) co-designed for efficient edge inference. It pairs a MobileNetV5-style vision backbone, chosen for stable integer quantization, with a hybrid SSM-Transformer language backbone. This design reduces quantization error and latency, improving real-time performance on edge devices.
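To see why a quantization-friendly backbone matters, here is a minimal sketch of symmetric per-tensor int8 quantization on synthetic data; this is not AutoNeural's quantizer, and the injected outlier is an illustrative stand-in for the activation spikes common in Transformer layers.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization (illustrative, not the paper's)."""
    scale = np.abs(x).max() / 127.0  # one outlier inflates this scale
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
bounded = rng.standard_normal(4096).astype(np.float32)  # well-behaved activations
outlier = bounded.copy()
outlier[0] = 80.0  # a single activation spike

for name, x in [("bounded", bounded), ("outlier", outlier)]:
    q, s = quantize_int8(x)
    err = np.abs(dequantize(q, s) - x).mean()
    print(f"{name}: scale={s:.4f} mean abs error={err:.5f}")
```

A single outlier stretches the quantization scale and degrades precision for every other value, which is the failure mode a quantization-stable backbone is meant to avoid.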
🔹 Publication Date: Published on Dec 2, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.02924
• PDF: https://arxiv.org/pdf/2512.02924
🔹 Models citing this paper:
• https://huggingface.co/NexaAI/AutoNeural
==================================
For more data science resources:
✓ https://t.iss.one/DataScienceT
#AutoNeural #VisionLanguageModels #EdgeAI #AIHardware #EfficientAI