---
base_model:
- nothingiisreal/L3.1-8B-Celeste-V1.5
- Sao10K/Llama-3.1-8B-Stheno-v3.4
- Sao10K/L3.1-8B-Niitama-v1.1
- arcee-ai/Llama-3.1-SuperNova-Lite
- akjindal53244/Llama-3.1-Storm-8B
- arcee-ai/Llama-Spark
- grimjim/Llama-3-Instruct-abliteration-LoRA-8B
- crestf411/sunfall-peft
- v000000/L3.1-Celestial-Stone-2x8B
library_name: transformers
tags:
- merge
- llama
- mixtral
- dpo
datasets:
- jondurbin/gutenberg-dpo-v0.1
---
> [!WARNING]
> **Sampler:**<br>
> Likes a low temperature due to the MoE architecture. I use 0.3 personally.
# Llama-3.1-Celestial-Stone-2x8B-DPO (BF16)
* *DPO Trained, Mixture of Experts (14B).*
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/lyRa7z5maTqAaa43sxC2J.png)
* <b>2 experts working together per token, plus Gutenberg novel-writing finetuning.</b>
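Not from the original card, but as a quick sanity check on the "2 experts per token" claim, the routing settings can be read straight from the config without downloading weights. The repo id below is an assumption, and this assumes a standard Mixtral-style config (per the `mixtral` tag):

```python
# Sketch: inspect MoE routing settings from the config alone (no weights downloaded).
# Assumes the repo id and a Mixtral-style config; adjust if either differs.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("v000000/L3.1-Celestial-Stone-2x8B-DPO")
print("experts total:     ", getattr(config, "num_local_experts", None))    # expected: 2
print("experts per token: ", getattr(config, "num_experts_per_tok", None))  # expected: 2
```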
*List of llama.cpp repos*
# Thanks mradermacher (GGUF):
* [GGUF static](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
* [GGUF Imatrix](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-i1-GGUF)
# Thanks QuantFactory (GGUF):
* [GGUF static](https://huggingface.co/QuantFactory/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
# Thanks Triangle104 (GGUF):
* [Q8_0](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q8_0-GGUF)
* [Q6_K](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q6_K-GGUF)
* [Q5_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_M-GGUF)
* [Q5_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_S-GGUF)
* [Q4_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_M-GGUF)
* [Q4_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_S-GGUF)
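As a rough usage sketch (not part of the original card), one way to run one of the quants above is through llama-cpp-python. The repo id and filename glob here are assumptions; check the file list of whichever quant repo you pick:

```python
# Sketch only: run a GGUF quant with llama-cpp-python.
# Repo id and filename pattern are assumptions -- verify against the quant repo's files.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF",
    filename="*Q4_K_M.gguf",  # glob for the desired quant; confirm it exists in the repo
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write the opening paragraph of a sea story."},
    ],
    temperature=0.3,  # low temperature, per the sampler warning above
)
print(out["choices"][0]["message"]["content"])
```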
---------------------------------------------------------------------------------
[L3.1-Celestial-Stone-2x8B](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B) finetuned on an Nvidia A100. (See the base model card for additional details.)
Completed 0.5 epoch of the dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) with learning_rate=8e-6.
The result seems pretty good even with half an epoch and a low learning rate; the effect is smoother and less pronounced.
Outputs are more compliant and verbose, less sloppy, and less safety-aligned.
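For illustration only, here is a hedged sketch of what a half-epoch DPO run with these hyperparameters might look like using TRL's `DPOTrainer`. This is not the author's training script; argument names vary between TRL versions, and a full finetune of a 14B MoE will generally need more than a single GPU's memory (or a PEFT/LoRA setup).

```python
# Hedged sketch of a half-epoch DPO run (not the original training script).
# DPOTrainer argument names differ across TRL versions; adjust for your install.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "v000000/L3.1-Celestial-Stone-2x8B"  # the pre-DPO base merge
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="celestial-stone-dpo",
    num_train_epochs=0.5,              # half an epoch, as described above
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,        # called `tokenizer=` in older TRL releases
)
trainer.train()
```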
------------------------------------------------------------------------------
*The first expert* is an Instruct 405B distillation/RP vector merge <b>(Supernova-Lite, Niitama 1.1, Storm)</b>.
*The second expert* is an ERP/Reddit-data merge <b>(Celeste 1.5, Stheno 3.4, Storm)</b>.
-------------------------------------------------------------------------------
*The base model* is <b>Sao10K/Llama-3.1-8B-Stheno-v3.4</b> with the <b>Sunfall LoRA 0.6.1</b> applied, to make it understand SillyTavern prompts and storywriting better.
-------------------------------------------------------------------------------
*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1).
# Prompt Template:
```bash
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{output}<|eot_id|>
```
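A small inference sketch (not part of the original card; the repo id is an assumption): the tokenizer's built-in Llama 3.1 chat template should emit exactly the structure above, and the low temperature from the sampler warning is applied here.

```python
# Sketch: transformers inference using the chat template shown above.
# Repo id is an assumption; temperature 0.3 follows the sampler warning at the top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/L3.1-Celestial-Stone-2x8B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a storywriting assistant."},
    {"role": "user", "content": "Continue the scene: the lighthouse keeper hears a knock."},
]
# apply_chat_template produces the <|start_header_id|>/<|eot_id|> structure shown above
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```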
*Sometimes has false refusals, but swiping or an "uncensored" system prompt gets around them. It's unclear why this happens, since none of the base models exhibit this behavior at all; it's probably an artifact of the custom Mixtral architecture's attention. But it's still pretty good.*
*For llama.cpp/LM Studio/etc., make sure `num_experts_used = 2`.*