TheBloke committed
Commit a693e20
1 Parent(s): 4c1ae8d

Initial GPTQ model commit

Files changed (1):
  1. README.md +5 -5
README.md CHANGED

@@ -58,7 +58,7 @@ from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 import argparse
 
 model_name_or_path = "TheBloke/samantha-1.1-llama-33B-GPTQ"
-model_basename = "gptq_model-4bit-128g"
+model_basename = "samantha-1.1-llama-33b-GPTQ-4bit--1g.act.order"
 
 use_triton = False
 
@@ -103,17 +103,17 @@ print(pipe(prompt_template)[0]['generated_text'])
 
 ## Provided files
 
-**gptq_model-4bit-128g.safetensors**
+**samantha-1.1-llama-33b-GPTQ-4bit--1g.act.order.safetensors**
 
 This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.
 
-It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
+It was created without group_size to lower VRAM requirements, and with --act-order (desc_act) to boost inference accuracy as much as possible.
 
-* `gptq_model-4bit-128g.safetensors`
+* `samantha-1.1-llama-33b-GPTQ-4bit--1g.act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
-  * Parameters: Groupsize = 128. Act Order / desc_act = False.
+  * Parameters: Groupsize = -1. Act Order / desc_act = True.
 
 <!-- footer start -->
 ## Discord
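
The renamed `model_basename` plugs into the AutoGPTQ loading snippet this commit updates in the README. A minimal sketch of how the basename is used, assuming `auto-gptq` is installed and a CUDA device is available; the actual weight download is very large, so it is gated behind an environment flag here:

```python
import os

# Values from the updated README in this commit.
model_name_or_path = "TheBloke/samantha-1.1-llama-33B-GPTQ"
model_basename = "samantha-1.1-llama-33b-GPTQ-4bit--1g.act.order"

# AutoGPTQ resolves the weights as "<model_basename>.safetensors" inside
# the repo, so the basename must match the uploaded file exactly.
weights_file = model_basename + ".safetensors"

if os.environ.get("RUN_GPTQ_DEMO"):
    # Heavy path: downloads the full quantised weights and needs a GPU.
    from auto_gptq import AutoGPTQForCausalLM

    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        use_triton=False,  # Triton mode of GPTQ-for-LLaMa is reported flaky
        device="cuda:0",
    )
```

If the basename in code and the `.safetensors` filename in the repo drift apart, as they would have without this commit, AutoGPTQ fails to find the weights, which is presumably why both the snippet and the "Provided files" section were updated together.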