Kooten
/

PsyMedRP-v1-20B-3bpw-h8-exl2

Text Generation

Not-For-All-Audiences

nsfw

text-generation-inference

Inference Endpoints


Kooten committed on Oct 6, 2023

Commit

91f3a38

•

1 Parent(s): 3f35b83

Create README.md


README.md +58 -0

README.md ADDED

	@@ -0,0 +1,58 @@

+---
+license: cc-by-nc-4.0
+tags:
+- not-for-all-audiences
+- nsfw
+---
+## Description
+ExLlamaV2 quant of [Undi95/PsyMedRP-v1-20B](https://huggingface.co/Undi95/PsyMedRP-v1-20B).
+3 bits per weight (bpw), with the head bits set to 8.
+## VRAM
+My VRAM usage with 20B models is:
+| Bits per weight | Context | VRAM |
+|--|--|--|
+| 6bpw | 4k | 24 GB |
+| 4bpw | 4k | 18 GB |
+| 4bpw | 8k | 24 GB |
+| 3bpw | 4k | 16 GB |
+| 3bpw | 8k | 21 GB |
+I have rounded these figures up, so they are not exact. They were measured on a Windows machine.
+## Prompt template
+[Recommended reading](https://huggingface.co/lemonilia/LimaRP-Llama2-13B-v3-EXPERIMENT)
+You can follow these instruction format settings in SillyTavern. Replace `tiny` with
+your desired response length:
+![settings](https://files.catbox.moe/6lcz0u.png)
+### Message length control
+Inspired by the previously named "Roleplay" preset in SillyTavern, starting from this
+version of LimaRP it is possible to append a length modifier to the response instruction
+sequence, like this:
+```
+### Input
+User: {utterance}
+### Response: (length = medium)
+Character: {utterance}
+```
+This has an immediately noticeable effect on bot responses. The available lengths are:
+`tiny`, `short`, `medium`, `long`, `huge`, `humongous`, `extreme`, `unlimited`. **The
+recommended starting length is `medium`**. Keep in mind that the AI may ramble
+or impersonate the user with very long messages.
+The length control effect is reproducible, but responses do not follow the requested
+length precisely. Instead, each setting yields lengths within a certain range on average, as seen
+in this table with data from tests made with one reply at the beginning of the conversation:
+![lengths](https://files.catbox.moe/dy39bt.png)
+Response length control also appears to work well deep into the conversation.
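
For anyone assembling prompts outside SillyTavern, the length modifier described in the README can be sketched as a small helper. This is only an illustration, not the author's code: the function name `build_prompt` and its parameters are hypothetical, and the layout simply mirrors the template shown in the README, ending at the character name so the model writes the reply.

```python
# Hypothetical helper; names and defaults are assumptions, not part of the README.
LENGTHS = (
    "tiny", "short", "medium", "long",
    "huge", "humongous", "extreme", "unlimited",
)


def build_prompt(user_name, char_name, utterance, length="medium"):
    """Return a prompt with a length modifier on the response sequence."""
    if length not in LENGTHS:
        raise ValueError(f"unknown length {length!r}; expected one of {LENGTHS}")
    # Mirrors the README template line by line; the character's reply is
    # left open for the model to complete.
    return (
        "### Input\n"
        f"{user_name}: {utterance}\n"
        f"### Response: (length = {length})\n"
        f"{char_name}:"
    )


if __name__ == "__main__":
    print(build_prompt("User", "Character", "Hello there."))
```

Rejecting unknown values up front is a design choice made here on the assumption that a length label outside the list in the README has no defined effect.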