yentinglin/Taiwan-LLM-13B-v2.0-chat-awq · awq quantization method

Jul 16

I'm curious which dataset was used for the AWQ quantization. Did you just follow the steps in the AutoAWQ example (https://github.com/casper-hansen/AutoAWQ/blob/main/examples/quantize.py), which uses the default dataset (mit-han-lab/pile-val-backup, https://huggingface.co/datasets/mit-han-lab/pile-val-backup)?

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-Instruct-v0.2'
quant_path = 'mistral-instruct-v0.2-awq'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')

The reason I ask is that I want to quantize the latest model, yentinglin/Llama-3-Taiwan-70B-Instruct, myself.

yentinglin

Owner Jul 16

Same code, but I use https://huggingface.co/datasets/yentinglin/TaiwanChat as the calibration dataset.
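
For anyone following along, here is a rough sketch of how a custom calibration set could be swapped in. It is not the owner's actual script. It relies on the `calib_data` argument of AutoAWQ's `quantize`, which accepts a list of strings. The helper names `conversations_to_texts` and `quantize_with_custom_calibration` are hypothetical, and the record layout (a "messages" list of dicts with a "content" field) is an assumption about TaiwanChat, so check the dataset card and adjust the field names before use.

```python
from typing import Iterable, List


def conversations_to_texts(records: Iterable[dict], max_samples: int = 128,
                           min_chars: int = 1) -> List[str]:
    """Flatten chat records into plain strings usable as calibration text.

    ASSUMPTION: each record has a "messages" list of {"role", "content"}
    dicts. This schema is a guess; adapt it to the real dataset columns.
    """
    texts: List[str] = []
    for record in records:
        turns = record.get("messages") or []
        # Join the non-empty turn contents into one sample per conversation.
        parts = [t.get("content", "").strip() for t in turns]
        text = "\n".join(p for p in parts if p)
        if len(text) >= min_chars:
            texts.append(text)
        if len(texts) >= max_samples:
            break
    return texts


def quantize_with_custom_calibration(model_path: str, quant_path: str,
                                     quant_config: dict,
                                     dataset_name: str = "yentinglin/TaiwanChat") -> None:
    """Hypothetical wrapper: same flow as the example script, custom calib_data."""
    # Heavy imports stay local so the helper above is usable without a GPU stack.
    from awq import AutoAWQForCausalLM
    from datasets import load_dataset
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(
        model_path, low_cpu_mem_usage=True, use_cache=False
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

    # ASSUMPTION: the dataset exposes a "train" split.
    data = load_dataset(dataset_name, split="train")
    calib_texts = conversations_to_texts(data)

    # Pass the list of strings instead of relying on the default dataset.
    model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_texts)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)
```

Calling `quantize_with_custom_calibration(model_path, quant_path, quant_config)` with the values from the script in the question would run the same load, quantize, and save steps. The only change is the calibration text.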

Nelson365487

Jul 16

Thank you for the prompt reply. I did not expect it at all :D

Nelson365487 changed discussion status to closed Jul 16