ML Research Hub
32.6K subscribers
3.72K photos
180 videos
23 files
3.99K links
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
940+ FPS Multi-Person Pose Estimation

RTMW (Real-Time Multi-person Whole-body pose estimation) is a series of high-performance models for 2D/3D whole-body pose estimation, reaching over 940 FPS on #GPU. Code & models are available.

Review: https://t.ly/XkBmg

Paper: arxiv.org/pdf/2407.08634

Repo: github.com/open-mmlab/mmpose/tree/main/projects/rtmpose

https://t.iss.one/DataScienceT ๐Ÿ†
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters

7 Apr 2025 · Zonghang Li, Tao Li, Wenjiao Feng, Mohsen Guizani, Hongfang Yu

The emergence of DeepSeek R1 and QwQ 32B has broken through performance barriers for running frontier large language models (#LLMs) on home devices. While consumer hardware is getting stronger and model quantization is improving, existing end-side solutions still demand #GPU clusters, large RAM/VRAM, and high bandwidth, far beyond what a common home cluster can handle. This paper introduces prima.cpp, a distributed inference system that runs 70B-scale models on everyday home devices using a mix of CPU/GPU, low RAM/VRAM, Wi-Fi, and cross-platform support. It uses mmap to manage model weights and introduces piped-ring parallelism with prefetching to hide disk loading. By modeling heterogeneity in computation, communication, disk, memory (and its management behavior), and OS, it optimally assigns model layers to each device's #CPU and GPU, further reducing token latency. An elegant algorithm named Halda is proposed to solve this NP-hard assignment problem. We evaluate prima.cpp on a common four-node home cluster. It outperforms llama.cpp, #exo, and #dllama on 30B+ models while keeping memory pressure below 6%. This brings frontier 30B-70B models, such as #Llama 3, #DeepSeek R1, #Qwen 2.5, and #QwQ, to home assistants, making advanced AI truly accessible to individuals. The code is open source and available at https://github.com/Lizonghang/prima.cpp.
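The abstract notes that prima.cpp uses mmap to manage model weights, which is how memory pressure can stay low even for 70B-scale checkpoints: pages are faulted in from disk only when a layer is actually touched. The following is a minimal, self-contained sketch of that idea in Python (not prima.cpp's actual implementation, which is C/C++); the file and function names here are illustrative only.

```python
import mmap
import os
import struct
import tempfile

def map_weights(path):
    """Map a weights file read-only; the OS pages data in lazily on access."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    mm = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    os.close(fd)  # CPython's mmap duplicates the fd, so this is safe
    return mm

# Demo: a fake "checkpoint" of 1024 little-endian float32 values.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(struct.pack("<1024f", *range(1024)))
tmp.close()

weights = map_weights(tmp.name)

# Touching only one "layer" slice faults in only those pages, so resident
# memory stays far below the full checkpoint size.
layer0 = struct.unpack("<4f", weights[:16])
print(layer0)  # (0.0, 1.0, 2.0, 3.0)
```

The same mechanism also lets the kernel evict cold weight pages under memory pressure and reload them from disk later, which is what prima.cpp's piped-ring prefetching is designed to overlap with computation.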


Paper: https://arxiv.org/pdf/2504.08791v1.pdf

Code: https://github.com/lizonghang/prima.cpp

https://t.iss.one/DataScienceT