How to load this model?

#1
by Frz614 - opened

It seems that I can only load the quantized model with vLLM. I need to use "AutoFP8ForCausalLM.from_pretrained(local_model_path, quantize_config=quantize_config, local_files_only=True)" to load the quantized model because I want to modify quantize.py, but it fails with: "ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']". It looks like the "BaseQuantizeConfig" class is not accepted. Is there a way to load the model so that I can modify the model files?
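
For reference, here is a minimal sketch of what I am trying; the checkpoint path and the BaseQuantizeConfig arguments are placeholders, not my exact values:

```python
# Minimal sketch of the attempted load; the path and the BaseQuantizeConfig
# arguments below are placeholders, not the exact values used.
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

local_model_path = "path/to/fp8-quantized-checkpoint"  # already-quantized FP8 model
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="static")

# Fails with:
# ValueError: Unknown quantization type, got fp8 - supported types are:
# ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq']
model = AutoFP8ForCausalLM.from_pretrained(
    local_model_path,
    quantize_config=quantize_config,
    local_files_only=True,
)
```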

Neural Magic org

Hi @Frz614, there is no way to load already-quantized checkpoints back into AutoFP8 at the moment. vLLM is the intended place for inference.
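
If it helps, a minimal sketch of the vLLM path mentioned above (the checkpoint path is a placeholder):

```python
# Minimal sketch of running the FP8 checkpoint with vLLM; the checkpoint
# path is a placeholder. vLLM picks up the fp8 quantization settings from
# the checkpoint's config, so no extra quantization arguments are needed.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/fp8-quantized-checkpoint")
sampling_params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```

AutoFP8's from_pretrained entry point is meant to take the original (unquantized) model together with a BaseQuantizeConfig, quantize it, and save it with save_quantized; an already-quantized FP8 checkpoint cannot be reloaded through it.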
