How to load this model?

#1
by Frz614 - opened

It seems that I can only load the quantized model with vLLM. I need to use "AutoFP8ForCausalLM.from_pretrained(local_model_path, quantize_config=quantize_config, local_files_only=True)" to load the quantized model because I want to modify quantize.py, but it fails with: "ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']". It looks like the "BaseQuantizeConfig" class is not accepted. Is there a way to load the model so that I can modify the model files?
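
For reference, here is a minimal sketch of what I am trying; the checkpoint path and the BaseQuantizeConfig arguments are placeholders, not my exact values:

```python
# Minimal sketch of the attempted load; the path and the BaseQuantizeConfig
# arguments below are placeholders, not the exact values used.
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

local_model_path = "path/to/fp8-quantized-checkpoint"  # already-quantized FP8 model
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

# Fails with:
# ValueError: Unknown quantization type, got fp8 - supported types are:
# ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']
model = AutoFP8ForCausalLM.from_pretrained(
    local_model_path,
    quantize_config=quantize_config,
    local_files_only=True,
)
```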

Neural Magic org

Hi @Frz614, there is no way to load already-quantized checkpoints back into AutoFP8 at the moment. vLLM is the intended place for inference.
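
If it helps, a minimal sketch of the vLLM path mentioned above (the checkpoint path is a placeholder):

```python
# Minimal sketch of running the FP8 checkpoint with vLLM; the checkpoint
# path is a placeholder. vLLM picks up the fp8 quantization settings from
# the checkpoint's config, so no extra quantization arguments are needed.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/fp8-quantized-checkpoint")
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```

AutoFP8's from_pretrained entry point is meant to take the original (unquantized) model together with a BaseQuantizeConfig, quantize it, and save it with save_quantized; an already-quantized FP8 checkpoint cannot be reloaded through it.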
