How to use the quantized version?

by jawad1347

Kindly write code showing how to load it in Colab with 4-bit quantization. Thanks

Sure, you can load it in 4-bit with BitsAndBytes:

    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    # In Colab, install the dependencies first:
    # !pip install -U transformers accelerate bitsandbytes

    # Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # load weights in 4-bit precision
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in bfloat16
    )
    model = AutoModel.from_pretrained(
        'Salesforce/SFR-Embedding-2_R',
        device_map='auto',
        trust_remote_code=True,
        quantization_config=quantization_config,
    )

Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?

Salesforce org

Hi @prudant,

Of course, it is compatible with other quantization methods as well.
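For example, transformers can quantize a model to GPTQ at load time via GPTQConfig. This is only a hedged sketch, not a recipe from the model card: it assumes the optimum and auto-gptq packages are installed and that AutoGPTQ's kernels support this architecture.

    from transformers import AutoModel, AutoTokenizer, GPTQConfig

    # Requires: pip install optimum auto-gptq
    tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

    model = AutoModel.from_pretrained(
        'Salesforce/SFR-Embedding-2_R',
        device_map='auto',
        trust_remote_code=True,
        quantization_config=gptq_config,  # calibrates and quantizes on load
    )

AWQ, by contrast, typically starts from a checkpoint already quantized with AutoAWQ rather than quantizing at load time.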
