How to use the quantized version?

by jawad1347

Kindly write code showing how to load it in Colab with 4-bit quantization. Thanks

Sure, you can load it in 4-bit with BitsAndBytes:

    import torch
    from transformers import AutoModel, BitsAndBytesConfig

    # In Colab, install the dependencies first:
    # !pip install -U transformers accelerate bitsandbytes

    # Ref: https://huggingface.co/blog/4bit-transformers-bitsandbytes
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # load weights in 4-bit precision
        bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
        bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
        bnb_4bit_compute_dtype=torch.bfloat16,  # run computations in bfloat16
    )
    model = AutoModel.from_pretrained(
        'Salesforce/SFR-Embedding-2_R',
        device_map='auto',
        trust_remote_code=True,
        quantization_config=quantization_config,
    )

Can it be quantized to GPTQ or AWQ, or are those formats not compatible with this architecture?

Salesforce org

Hi @prudant,

Of course, it is compatible with other quantization methods as well.
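For example, transformers can quantize a model to GPTQ at load time via GPTQConfig. This is only a hedged sketch, not a recipe from the model card: it assumes the optimum and auto-gptq packages are installed and that AutoGPTQ's kernels support this architecture.

    from transformers import AutoModel, AutoTokenizer, GPTQConfig

    # Requires: pip install optimum auto-gptq
    tokenizer = AutoTokenizer.from_pretrained('Salesforce/SFR-Embedding-2_R')
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

    model = AutoModel.from_pretrained(
        'Salesforce/SFR-Embedding-2_R',
        device_map='auto',
        trust_remote_code=True,
        quantization_config=gptq_config,  # calibrates and quantizes on load
    )

AWQ, by contrast, typically starts from a checkpoint already quantized with AutoAWQ rather than quantizing at load time.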
