TheBloke committed on
Commit
7038f7d
1 Parent(s): 0293d67

Update README.md

Files changed (1)
  1. README.md +11 -5
README.md CHANGED
@@ -1,6 +1,12 @@
 ---
 inference: false
-license: other
+license: cc-by-nc-sa-4.0
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+datasets:
+- psmathur/orca_minis_uncensored_dataset
 ---
 
 <!-- header start -->
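The added keys are standard Hugging Face Hub model-card front matter: they set the license badge, language tag, associated library, pipeline task, and linked training dataset on the model page. A minimal sketch of reading this metadata back programmatically via `huggingface_hub`, assuming the repo id is `TheBloke/orca_mini_v2_13b-GPTQ` (the repo id is not named anywhere in this diff):

```python
# Sketch only: load the model card and inspect its front matter.
# The repo id below is an assumption, not stated in the commit.
from huggingface_hub import ModelCard

card = ModelCard.load("TheBloke/orca_mini_v2_13b-GPTQ")
print(card.data.license)       # expected: cc-by-nc-sa-4.0
print(card.data.pipeline_tag)  # expected: text-generation
print(card.data.datasets)      # expected: ['psmathur/orca_minis_uncensored_dataset']
```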
@@ -89,17 +95,17 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         quantize_config=None)
 
 prompt = "Tell me about AI"
+input = ""
 prompt_template=f'''### System:
 You are an AI assistant that follows instruction extremely well. Help as much as you can.
 
 ### User:
-prompt
+{prompt}
 
 ### Input:
-input, if required
+{input}
 
 ### Response:
-
 '''
 
 print("\n\n*** Generate:")
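The substantive fix in this hunk: the old template was an f-string whose body contained the bare words `prompt` and `input, if required`, so the variables were never interpolated and every rendered prompt literally contained the word "prompt". Wrapping them as `{prompt}` and `{input}` (and defining `input = ""` for the optional field) makes the f-string substitute the actual values. A standalone sketch of the before/after behaviour:

```python
prompt = "Tell me about AI"
input = ""  # mirrors the README's variable; note it shadows Python's builtin input()

# Before the fix: "prompt" is literal text inside the f-string.
old_template = f'''### User:
prompt
'''

# After the fix: {prompt} is interpolated by the f-string.
new_template = f'''### User:
{prompt}
'''

print(old_template)  # "### User:\nprompt\n"            -- the variable is ignored
print(new_template)  # "### User:\nTell me about AI\n"  -- the value is substituted
```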
@@ -139,7 +145,7 @@ It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.
 
 * `orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
-  * [ExLlama](https://github.com/turboderp/exllama) suupports Llama 4-bit GPTQs, and will provide 2x speedup over AutoGPTQ and GPTQ-for-LLaMa.
+  * [ExLlama](https://github.com/turboderp/exllama) supports Llama 4-bit GPTQs, and will provide 2x speedup over AutoGPTQ and GPTQ-for-LLaMa.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = 128. Act Order / desc_act = False.
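The compatibility bullets in this hunk correspond to loader options rather than separate code paths: in AutoGPTQ, CUDA versus Triton mode is a single flag on the same `from_quantized` call shown earlier in the diff. A minimal sketch, assuming the repo id `TheBloke/orca_mini_v2_13b-GPTQ` (not named in this diff) and the `.safetensors` basename listed above:

```python
from auto_gptq import AutoGPTQForCausalLM

# use_triton=False selects the CUDA kernels; True selects Triton (Linux only).
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/orca_mini_v2_13b-GPTQ",  # assumed repo id
    model_basename="orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order",
    use_safetensors=True,
    device="cuda:0",
    use_triton=False,
    quantize_config=None,
)
```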
 