size mismatch for model.embed_tokens.weight - model refuses to load.

#10 · opened by mehbebe

Hi there, I tried to load this model using the oobabooga webui, following the README file for this model.
I followed all the instructions in the README, but I get the following error/traceback:

Traceback (most recent call last):
  File "I:\Deep-learning\chat\oogabooga\oogabooga\text-generation-webui\server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "I:\Deep-learning\chat\oogabooga\oogabooga\text-generation-webui\modules\models.py", line 95, in load_model
    output = load_func(model_name)
  File "I:\Deep-learning\chat\oogabooga\oogabooga\text-generation-webui\modules\models.py", line 289, in GPTQ_loader
    model = modules.GPTQ_loader.load_quantized(model_name)
  File "I:\Deep-learning\chat\oogabooga\oogabooga\text-generation-webui\modules\GPTQ_loader.py", line 177, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "I:\Deep-learning\chat\oogabooga\oogabooga\text-generation-webui\modules\GPTQ_loader.py", line 84, in _load_quant
    model.load_state_dict(safe_load(checkpoint), strict=False)
  File "I:\Deep-learning\chat\oogabooga\oogabooga\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32001, 6656]) from checkpoint, the shape in current model is torch.Size([32000, 6656]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32001, 6656]) from checkpoint, the shape in current model is torch.Size([32000, 6656]).

I have been able to load the 13B version of this model, but not this 30B model. If someone could please help me figure out what is causing this, and how to fix it, I'd appreciate that very much.
P.S. I noticed another person posted about a similar-sounding issue ("size mismatch"), to which the solution was setting the group size to 0. I have already set mine to 0 (and was sure to save the model settings before reloading it), and I still get this error.

The latest text-generation-webui now uses AutoGPTQ, which sets the GPTQ parameters automatically.

On some of my older models I've not yet updated the README to explain that. I have just updated this README, so please re-read the section "How to easily download and use.." to see the new instructions.

In short: all GPTQ parameters should be left at their default values. The model will then be loaded with AutoGPTQ, and the parameters will be read from the file quantize_config.json. If you set any GPTQ parameters manually, they will override the values from that file and possibly apply a bad setting.
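
For illustration, here is a minimal Python sketch of roughly what the webui's AutoGPTQ loader does when the GPTQ parameters are left at their defaults. The local path and basename below are just examples matching this repo's files; treat the exact call as an approximation, not the webui's actual source code.

# Rough sketch of loading a GPTQ model with AutoGPTQ, reading quantize_config.json
# automatically. Approximate example, not text-generation-webui's own code.
from auto_gptq import AutoGPTQForCausalLM

model_dir = "models/TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ"  # example local path

# from_quantized() picks up quantize_config.json from model_dir, so bits,
# group_size and desc_act don't need to be (and shouldn't be) passed by hand.
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename="Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order",  # safetensors file name without extension
    use_safetensors=True,
    device="cuda:0",
    use_triton=False,
)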

I have the same error; I too can load the 13B version without any trouble. Doing what you mentioned above doesn't change anything for me, sadly.

And now I get this error:

"Traceback (most recent call last): File “D:\SD\GPT\oobabooga_windows\text-generation-webui\server.py”, line 73, in load_model_wrapper shared.model, shared.tokenizer = load_model(shared.model_name, loader) File “D:\SD\GPT\oobabooga_windows\text-generation-webui\modules\models.py”, line 65, in load_model output = load_func_maploader File “D:\SD\GPT\oobabooga_windows\text-generation-webui\modules\models.py”, line 271, in AutoGPTQ_loader return modules.AutoGPTQ_loader.load_quantized(model_name) File “D:\SD\GPT\oobabooga_windows\text-generation-webui\modules\AutoGPTQ_loader.py”, line 55, in load_quantized model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py”, line 82, in from_quantized return quant_func( File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py”, line 717, in from_quantized model = AutoModelForCausalLM.from_config( File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\auto\auto_factory.py”, line 425, in from_config return model_class._from_config(config, **kwargs) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\modeling_utils.py”, line 1143, in _from_config model = cls(config, **kwargs) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py”, line 615, in init self.model = LlamaModel(config) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py”, line 446, in init self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)]) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py”, line 446, in self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)]) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py”, line 256, in init self.mlp = LlamaMLP( File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py”, line 149, in init self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False) File “D:\SD\GPT\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\linear.py”, line 96, in init self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs)) RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 238551040 bytes."

I'm trying to load this one on an RTX 4090.

@damsair

That error means you need to increase the Windows Pagefile size. Either manually increase it to around 90GB, or set it to Auto and make sure there's at least 90GB free on C: (or whichever drive the Pagefile is on).

With that done, the model should load.

This is a common issue for Windows users, especially with the larger models. Windows loads the model into RAM first before it goes to VRAM, and for some reason it requires a large amount of free Pagefile for this purpose, even for users who have plenty of free RAM.
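
If you want a quick sanity check on how much RAM plus Pagefile Windows can actually provide before loading, something like the following Python snippet works. This assumes the psutil package is installed; it is not part of the webui, just a rough diagnostic.

# Rough check of physical RAM and Pagefile (swap) headroom on Windows.
# Requires: pip install psutil
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()

print(f"Physical RAM free : {vm.available / 1e9:.1f} GB")
print(f"Pagefile total    : {sm.total / 1e9:.1f} GB")
print(f"Pagefile free     : {sm.free / 1e9:.1f} GB")
# A 30B 4-bit GPTQ checkpoint is only ~17 GB on disk, but Windows stages it
# through RAM/Pagefile while loading, hence the ~90 GB Pagefile suggestion above.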

@TheBloke

Thanks for this. I set it to Auto (Windows Pagefile size) and restarted, but it's still impossible to use this model. When I load it, here is what I get:
2023-06-17 22:32:44 INFO:Loading TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ...
2023-06-17 22:32:44 INFO:The AutoGPTQ params are: {'model_basename': 'Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None}
2023-06-17 22:32:59 WARNING:The model weights are not tied. Please use the tie_weights method before using the infer_auto_device function.
2023-06-17 22:32:59 WARNING:The safetensors archive passed at models\TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ\Wizard-Vicuna-30B-Uncensored-GPTQ-4bit.act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

And then nothing; the GUI doesn't work anymore.

How long did you wait? Those messages indicate it's still loading the model. The next thing you'd see, once it's loaded, is

Loaded the model in X seconds.
Running on local URL:  http://0.0.0.0:7860

I waited several minutes. Just after the output I copied above, a line appears saying "press any key to continue...". If I press any key, the window closes (and the GUI shows many red ERROR messages).

OK, the 'press any key to continue' message means it still needs more Pagefile. Try increasing it further.

Thank you @TheBloke, it's finally working, and I'm going to test all the 30B models on your page now :) It was the same error each time! Thanks again, I just learnt something!
