Quantization

#30
by mrgiraffe - opened

Hello - I was wondering if there's a quantized version of this model that can be used to generate embeddings? I tried loading the model on an A10G instance and the GPU couldn't handle it. Thank you very much

You can quantize it yourself. See the docs:

import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization roughly quarters the fp16 memory footprint
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')
model = AutoModel.from_pretrained(
    'intfloat/e5-mistral-7b-instruct',
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
    quantization_config=quantization_config,
)
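To turn the model's hidden states into embeddings, the e5-mistral-7b-instruct model card pools the hidden state of each sequence's last non-padding token. A minimal sketch of that pooling step on dummy tensors (the helper name `last_token_pool` follows the model card; the shapes here are illustrative, not tied to the real model):

import torch

def last_token_pool(last_hidden_states, attention_mask):
    # If every sequence ends in a real token, the batch is left-padded
    # and the last position is the one to keep.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    # Otherwise, pick each sequence's final non-padding position.
    seq_lens = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(last_hidden_states.shape[0])
    return last_hidden_states[batch_idx, seq_lens]

# Dummy batch: 2 sequences, 5 tokens, hidden size 4
hidden = torch.randn(2, 5, 4)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])
emb = last_token_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 4])

With the real model you would pass the tokenizer's attention_mask and the model output's last_hidden_state into this function, then optionally L2-normalize the result before computing similarities.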
