Towards 100x Speedup: Full Stack Transformer Inference Optimization
yaofu.notion.site/Towards-10…
see also : Adversarial Attacks on LLMs
#GPU_architecture, #transformer_inference_basics, #memory_layout, #blockwise_decoding, #LLM