Getting "KeyError" when loading model

#8
by tsakaiba - opened

I have built from source using pip install -q git+https://github.com/huggingface/transformers.git

When trying to load the model:
model = AutoModel.from_pretrained("nvidia/NV-Embed-v1",trust_remote_code=True,token=token)

I get the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[11], line 1
----> 1 model = AutoModel.from_pretrained("nvidia/NV-Embed-v1",
      2     trust_remote_code=True,
      3     token=token)

KeyError: 'NVEmbedConfig'

Any hints?
Thank you

NVIDIA org

Thank you for reporting the issue. Can you try upgrading your transformers package? For example, upgrade the Python packages as follows:

pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers --upgrade
pip install flash-attn==2.2.0
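After upgrading, it is worth confirming that the running interpreter actually sees the new versions; in a notebook, a stale kernel after pip install --upgrade is a common reason the KeyError persists. A minimal check using only the standard library (the package names queried are the ones from the commands above):

```python
# Confirm which package versions the current interpreter sees.
# If transformers is still an old version here, restart the kernel
# and re-run before retrying from_pretrained.
import importlib.metadata

def installed_version(dist_name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None

for name in ("transformers", "torch", "flash-attn"):
    print(name, "->", installed_version(name))
```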

Same error: KeyError: 'NVEmbedConfig'. I have uninstalled and reinstalled the suggested libraries. I would like to use the model; any suggestions are appreciated.

For me, this issue appeared when passing token directly. After running
huggingface-cli login
the issue went away once I forced a re-download.

@nada5
Could you please post the exact versions under which it works?
I am on CUDA 11.8 (V11.8.89), which I cannot change.
After the update, I have

sentence-transformers==2.7.0
transformers==4.41.2
torch==2.2.0
flash-attn==2.2.0

but then an ImportError occurs when trying to load "nvidia/NV-Embed-v1":

      4 import torch.nn as nn
      6 # isort: off
      7 # We need to import the CUDA kernels after importing torch
----> 8 import flash_attn_2_cuda as flash_attn_cuda
     10 # isort: on
     13 def _get_block_size(device, head_dim, is_dropout, is_causal):
     14     # This should match the block sizes in the CUDA kernel

When I upgrade flash-attn with !pip install --upgrade flash-attn --no-build-isolation to flash-attn==2.5.9.post1, I still get the same ImportError. When I downgrade to torch==2.1.2 (which works fine with other HF models), I am back to KeyError: 'NVEmbedConfig'.

I got it to work. For me it required a newer CUDA version; it worked with cuda_12.1.r12.1.
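This is consistent with the ImportError above: flash-attn ships prebuilt CUDA kernels compiled against a specific torch/CUDA pairing, so import flash_attn_2_cuda tends to fail when the wheel and the installed torch were built for different CUDA versions. A quick, hedged diagnostic (assumes nothing beyond torch's public torch.version.cuda attribute):

```python
# Diagnostic: report the CUDA version the installed torch was built against.
# A flash-attn wheel built for a different CUDA version than this one is a
# typical cause of "import flash_attn_2_cuda" failing.
def torch_cuda_version():
    """CUDA version torch was compiled against (e.g. "12.1"), or None
    if torch is not installed or is a CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda

print("torch built for CUDA:", torch_cuda_version())
```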
