BramVanroy committed ff31f9e (1 parent: f75574c)

Update README.md

Files changed (1):
1. README.md +13 -10
README.md CHANGED
@@ -8,16 +8,19 @@ tags:
 
 This repository contains quantized versions of [BramVanroy/fietje-2b-instruct](https://huggingface.co/BramVanroy/fietje-2b-instruct):
 
-- `-f16` (5.6GB): best quality, but largest and slowest (recommended if you have the capacity, otherwise q8_0)
-- `-q8_0` (3.0GB): minimal quality loss, smaller
-- `-q5_k_m` (2.0GB): users have reported considerable quality loss in the chat `q5_k_m` version so you may want to avoid it
 
-Also available on ollama:
+Available quantization types and expected performance differences compared to base `f16`, higher perplexity=worse (from llama.cpp):
 
-```sh
-# defaults to f16
-ollama run bramvanroy/fietje-2b-instruct
-ollama run bramvanroy/fietje-2b-instruct:f16
-ollama run bramvanroy/fietje-2b-instruct:q8_0
-ollama run bramvanroy/fietje-2b-instruct:q5_k_m
+```
+Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
+Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
+Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
+Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
+Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
+F16 : 13.00G @ 7B
+```
+
+Also available on [ollama](https://ollama.com/bramvanroy/fietje-2b-instruct).
+
+Quants were made with release [`b2777`](https://github.com/ggerganov/llama.cpp/releases/tag/b2777) of llama.cpp.
 ```
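The quantization table added in this commit describes a size/quality trade-off: each smaller quant type saves disk and memory but shows a larger perplexity increase over `F16`. As a minimal sketch of reading the table that way, the hypothetical `pick_quant` shell function below (not part of this repository, llama.cpp, or ollama) prints the smallest listed quant type whose perplexity increase stays within a given tolerance. It only uses the llama.cpp figures quoted in the diff, which were measured on LLaMA-v1-7B, so treat the result as a rough ordering, not a measurement of fietje-2b-instruct.

```sh
#!/bin/sh
# Hypothetical helper (not part of llama.cpp or ollama): pick_quant MAX_PPL_DELTA
# prints the smallest quant type from the table below whose perplexity increase
# over F16 is <= MAX_PPL_DELTA, or F16 if none qualifies.
# The figures are llama.cpp's LLaMA-v1-7B numbers, copied from the README diff.
pick_quant() {
  awk -v max="$1" -F'[ ,:]+' '
    BEGIN { max += 0 }                  # force a numeric comparison
    $3 ~ /^\+/ {                        # only rows with a "+<delta> ppl" column
      size = $2 + 0                     # "3.07G" -> 3.07
      delta = substr($3, 2) + 0         # "+0.2496" -> 0.2496
      if (delta <= max && (best == "" || size < best_size)) {
        best = $1
        best_size = size
      }
    }
    END { if (best != "") print best; else print "F16" }
  ' <<'EOF'
Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
Q6_K : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
F16 : 13.00G @ 7B
EOF
}

pick_quant 0.01   # prints Q6_K
```

For example, `pick_quant 0.1` prints `Q4_K_M`; with a tolerance below every listed delta (such as `0.0001`) it falls back to `F16`.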