---
tags:
- not-for-all-audiences
- nsfw
license: other
language:
- en
---

[EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) quantization of [Undi95's MXLewd-L2-20B](https://huggingface.co/Undi95/MXLewd-L2-20B).


## Model details

First attempt at quantizing a 20B model so that it runs on 16GB of VRAM at the highest possible quality.
Quantized at 3.18bpw with 6 head bits (`hb 6`). An 8.13bpw version is also available for those who want it: EXL2 is very fast with flash-attention, and at that bitrate its quality is almost the same as fp16.
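As a rough sanity check on the 16GB target, the size of the quantized weights is approximately parameter count × bits per weight ÷ 8. The sketch below assumes a nominal 20e9 parameters (taken from the "20B" name, not an exact count) and treats the bpw figure as an average over all weights. It ignores the KV cache, activations, the output head stored at a different precision, and CUDA overhead, so real VRAM usage will be higher. Under these assumptions the 3.18bpw weights come to roughly 8GB, while the 8.13bpw weights come to roughly 20GB.

```python
def estimate_weight_gb(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes (1e9 bytes)."""
    return n_params * bpw / 8 / 1e9


# Assumption: nominal "20B" parameter count, not the model's exact figure.
N_PARAMS = 20e9

for bpw in (3.18, 8.13):
    size = estimate_weight_gb(N_PARAMS, bpw)
    print(f"{bpw:.2f} bpw -> ~{size:.1f} GB of weights (excluding cache and overhead)")
```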

Perplexity on [wikitext](https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet) (wikitext-2-v1, test split):

| Model | Perplexity |
|---|---|
| Base | 6.4744 |
| 8bpw, h8 | 6.4471 |
| 3.18bpw, h6 | 6.5705 |
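Perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The card does not say which evaluation script, context length, or stride produced these numbers, so the `perplexity` function below is only the generic definition, not the tool used here. The sketch also computes each quant's change relative to the base model from the reported figures: about -0.42% for 8bpw h8 and +1.48% for 3.18bpw h6.

```python
import math


def perplexity(token_nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_change(value: float, baseline: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100


# Figures from the perplexity results reported in this card.
BASE = 6.4744
RESULTS = {"8bpw h8": 6.4471, "3.18bpw h6": 6.5705}

for name, ppl in RESULTS.items():
    print(f"{name}: {relative_change(ppl, BASE):+.2f}% vs. base")
```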

## Prompt Format

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:

```
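This is the Alpaca-style instruction template. A minimal sketch of filling it in programmatically follows; `build_prompt` and `ALPACA_TEMPLATE` are hypothetical names, not part of any library. The template in the card ends with a blank line after `### Response:`, and this sketch assumes a single trailing newline is enough for the model to start its answer. Adjust it if your frontend expects otherwise.

```python
# Hypothetical helper mirroring the prompt format shown above.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "\n"
    "### Instruction:\n"
    "{prompt}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the template; the model continues after '### Response:'."""
    return ALPACA_TEMPLATE.format(prompt=instruction)


print(build_prompt("Write a short poem about the sea."))
```

Because `str.format` substitutes the instruction as a value, braces inside the user's text are passed through unchanged.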