Re-upload of GGML model due to issue with base model
README.md
CHANGED
@@ -65,18 +65,18 @@ Refer to the Provided Files table below to see what files use which methods, and
 ## Provided files
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| camel-13b-roleplay.ggmlv3.q2_K.bin | q2_K | 2 | 5.
-| camel-13b-roleplay.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.
-| camel-13b-roleplay.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.
-| camel-13b-roleplay.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.
+| camel-13b-roleplay.ggmlv3.q2_K.bin | q2_K | 2 | 5.51 GB | 8.01 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
+| camel-13b-roleplay.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.93 GB | 9.43 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
+| camel-13b-roleplay.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.31 GB | 8.81 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K |
+| camel-13b-roleplay.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.66 GB | 8.16 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors |
 | camel-13b-roleplay.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. |
 | camel-13b-roleplay.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
-| camel-13b-roleplay.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.
-| camel-13b-roleplay.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.
+| camel-13b-roleplay.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.87 GB | 10.37 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K |
+| camel-13b-roleplay.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.37 GB | 9.87 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors |
 | camel-13b-roleplay.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. |
 | camel-13b-roleplay.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. |
-| camel-13b-roleplay.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.
-| camel-13b-roleplay.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.
+| camel-13b-roleplay.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.23 GB | 11.73 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K |
+| camel-13b-roleplay.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.97 GB | 11.47 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors |
 | camel-13b-roleplay.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q6_K - 6-bit quantization - for all tensors |
 | camel-13b-roleplay.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
 
@@ -122,7 +122,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 
 **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
 
-**Patreon special mentions**:
+**Patreon special mentions**: Oscar Rangel, Eugene Pentland, Talal Aujan, Cory Kujawski, Luke, Asp the Wyvern, Ai Maven, Pyrater, Alps Aficionado, senxiiz, Willem Michiel, Junyu Yang, trip7s trip, Sebastain Graf, Joseph William Delisle, Lone Striker, Jonathan Leane, Johann-Peter Hartmann, David Flickinger, Spiking Neurons AB, Kevin Schuppel, Mano Prime, Dmitriy Samsonov, Sean Connelly, Nathan LeClaire, Alain Rossmann, Fen Risland, Derek Yates, Luke Pendergrass, Nikolai Manek, Khalefa Al-Ahmad, Artur Olbinski, John Detwiler, Ajan Kanaga, Imad Khwaja, Trenton Dambrowitz, Kalila, vamX, webtim, Illia Dulskyi.
 
 Thank you to all my generous patrons and donaters!
 
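
A note for readers choosing among the re-uploaded files: in the updated Provided files table, every "Max RAM required" value is the file size plus 2.50 GB. The sketch below shows one possible way to pick a file for a given memory budget. The names `QUANTS` and `pick_quant` are hypothetical and not part of the repository, and treating the largest fitting file as the best choice is only a rough assumption, since file size is an imperfect proxy for output quality.

```python
# (quant method, file size in GB, max RAM required in GB), copied from the
# updated "Provided files" table in this commit.
QUANTS = [
    ("q2_K", 5.51, 8.01),
    ("q3_K_S", 5.66, 8.16),
    ("q3_K_M", 6.31, 8.81),
    ("q3_K_L", 6.93, 9.43),
    ("q4_0", 7.32, 9.82),
    ("q4_K_S", 7.37, 9.87),
    ("q4_K_M", 7.87, 10.37),
    ("q4_1", 8.14, 10.64),
    ("q5_0", 8.95, 11.45),
    ("q5_K_S", 8.97, 11.47),
    ("q5_K_M", 9.23, 11.73),
    ("q5_1", 9.76, 12.26),
    ("q6_K", 10.68, 13.18),
    ("q8_0", 13.83, 16.33),
]


def pick_quant(ram_budget_gb):
    """Return the filename of the largest listed file whose max RAM fits the budget.

    Returns None when even the smallest file needs more RAM than the budget.
    Assumption: a larger file is preferred whenever it fits.
    """
    fitting = [q for q in QUANTS if q[2] <= ram_budget_gb]
    if not fitting:
        return None
    method, _size, _ram = max(fitting, key=lambda q: q[1])
    return f"camel-13b-roleplay.ggmlv3.{method}.bin"


if __name__ == "__main__":
    for budget in (8.0, 10.0, 12.0, 32.0):
        print(budget, pick_quant(budget))
```

In practice the choice also depends on inference speed and on which quant methods the client in use supports, so the result is best treated as a starting point.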