Issues running model

#3
by blevlabs - opened

Since the model_basename is not originally provided in the example code, I tried this:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/starcoderplus-GPTQ"
model_basename = "gptq_model-4bit--1g.safetensors"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

print("\n\n*** Generate:")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

But I always get the following:

FileNotFoundError: could not find model TheBloke/starcoderplus-GPTQ

When I remove the model_basename parameter, it downloads, but I get the following error with generate:

The safetensors archive passed at ~/.cache/huggingface/hub/models--TheBloke--starcoderplus-GPTQ/snapshots/aa67ff4fad65fc88f6281f3a2bcc0d648105ef96/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.


*** Generate:
TypeError: generate() takes 1 positional argument but 2 were given

I am just using the original code provided, with no other alterations. I am able to load other models from your HF repos with AutoGPTQ, but not this one specifically.

Hmm you shouldn't need model_basename for this. Maybe that's an AutoGPTQ bug.

When it is required, you leave the .safetensors extension off the end, so it would be:

model_basename = "gptq_model-4bit--1g"
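
For example, plugged into the load call from the script above (a sketch only; as noted, you may not need model_basename for this model at all):

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename="gptq_model-4bit--1g",  # note: no .safetensors extension
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)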

Thank you for the insight. Do you have an idea of why the generate() issue is occurring when I remove the model_basename? Here is my code when I do:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/starcoderplus-GPTQ"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

print("\n\n*** Generate:")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

[Attached screenshot: Screenshot_20230610_133041.png]

As discussed on Discord:

This is caused by the bug addressed in this PR: https://github.com/PanQiWei/AutoGPTQ/pull/135

The workaround is to call model.generate(inputs=inputs) instead of model.generate(inputs).
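
As a quick sketch, applied to the script above, only the generate call changes; everything else stays the same:

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs=inputs)  # keyword argument avoids the positional-argument TypeError
print(tokenizer.decode(outputs[0]))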

Fix: I just needed to pass inputs=inputs in the generate() call. TheBloke has submitted a fix, but his PR has not yet been merged on the AutoGPTQ GitHub.
Make sure the script uses the no-model_basename version I provided above.
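
Putting both points together, a sketch of the full working script (the no-model_basename version above, with only the generate call changed to use the keyword argument):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/starcoderplus-GPTQ"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load without model_basename; the quantized weights are resolved from the repo
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda:0")
outputs = model.generate(inputs=inputs)  # keyword argument works around the AutoGPTQ generate() bug
print(tokenizer.decode(outputs[0]))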

blevlabs changed discussion status to closed
