How to fast inference with FP8

Discussion #2, opened by CCRss

I wonder if there is an easy (or not so easy) way to run inference faster using FP8.

Neural Magic org

vLLM has native support for these FP8 checkpoints! https://docs.vllm.ai/en/latest/quantization/fp8.html
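As a minimal sketch of what that looks like in practice: vLLM loads FP8-quantized checkpoints through its usual `LLM` entry point, auto-detecting the quantization scheme from the checkpoint's config. The model name below is just an illustrative placeholder, not taken from this thread.

```python
from vllm import LLM, SamplingParams

# Load an FP8 checkpoint; vLLM detects the FP8 quantization scheme
# from the checkpoint's config automatically. The model ID here is
# a hypothetical example, substitute the checkpoint you want to serve.
llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```

Note that FP8 support requires a GPU with FP8 hardware support (e.g. NVIDIA Hopper/Ada or newer) to get the speedup; the linked vLLM docs cover the requirements and the option of quantizing an unquantized model on the fly with `quantization="fp8"`.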
