---
base_model:
- nothingiisreal/L3.1-8B-Celeste-V1.5
- Sao10K/Llama-3.1-8B-Stheno-v3.4
- Sao10K/L3.1-8B-Niitama-v1.1
- arcee-ai/Llama-3.1-SuperNova-Lite
- akjindal53244/Llama-3.1-Storm-8B
- arcee-ai/Llama-Spark
- grimjim/Llama-3-Instruct-abliteration-LoRA-8B
- crestf411/sunfall-peft
- v000000/L3.1-Celestial-Stone-2x8B
library_name: transformers
tags:
- merge
- llama
- mixtral
- dpo
datasets:
- jondurbin/gutenberg-dpo-v0.1
---

> [!WARNING]
> **Sampler:**<br>
> Prefers a low temperature due to the MoE architecture; I use 0.3 personally.
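
Temperature rescales the logits before sampling, so a low value like 0.3 sharpens the token distribution and damps routing noise. A minimal illustration of generic temperature scaling (not model-specific code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before softmax; lower T sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))  # relatively flat
print(softmax_with_temperature(logits, 0.3))  # top token dominates
```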

# Llama-3.1-Celestial-Stone-2x8B-DPO (BF16)

* *DPO Trained, Mixture of Experts (14B).*

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/lyRa7z5maTqAaa43sxC2J.png)

* <b>2x Experts working together per token, Gutenberg novelwriting finetuning.</b>

------------------------------------------------------------------------------

*The first expert* is an Instruct 405B distillation/RP vector merge <b>(Supernova-Lite, Niitama1.1, Storm)</b>

*The second expert* is an ERP/Reddit data merge <b>(Celeste1.5, Stheno3.4, Storm)</b>

-------------------------------------------------------------------------------

*The base model* is <b>Sao10K/L3.1-Stheno-3.4</b> with the <b>Sunfall LoRA 0.6.1</b> applied to make it understand SillyTavern prompts and storywriting better.

-------------------------------------------------------------------------------

*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1).

-------------------------------------------------------------------------------

*List of llama.cpp repos*

# Thanks mradermacher (GGUF):

* [GGUF static Q2-Q8](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
* [GGUF Imatrix Q2-Q6](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-i1-GGUF)

# Thanks QuantFactory (GGUF):

* [GGUF static Q2-Q8](https://huggingface.co/QuantFactory/L3.1-Celestial-Stone-2x8B-DPO-GGUF)

# Thanks Triangle104 (GGUF):

* [Q8_0](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q8_0-GGUF)
* [Q6_K](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q6_K-GGUF)
* [Q5_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_M-GGUF)
* [Q5_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_S-GGUF)
* [Q4_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_M-GGUF)
* [Q4_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_S-GGUF)

*Other*

* [GGUF Imatrix IQ4-Q8](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B-DPO-GGUFs-IMATRIX)

---------------------------------------------------------------------------------

[L3.1-Celestial-Stone-2x8B](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B) finetuned on an Nvidia A100. (See the base model card for additional details.)

0.5 epoch of the dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) completed with learning_rate=8e-6
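
For reference, the objective behind this kind of preference finetuning is the standard DPO loss. A minimal sketch of that formula (a generic illustration, not the actual training code; the `beta` default is a placeholder):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    margin = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Loss shrinks when the policy prefers the chosen response more than the reference does.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0))  # policy favors chosen -> low loss
print(dpo_loss(-20.0, -10.0, -15.0, -15.0))  # policy favors rejected -> high loss
```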

The result seems pretty good even with half an epoch and a low learning rate; the effect is smoother and less pronounced, but it's probably not *optimal*.

Outputs are more compliant and verbose, less sloppy, and less safety-aligned.

------------------------------------------------------------------------------

# Prompt Template:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>

```
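
The template above can be filled programmatically; a minimal sketch using plain string formatting (assuming you are not relying on the tokenizer's built-in chat template):

```python
def format_llama31_prompt(system_prompt: str, user_input: str) -> str:
    """Build a Llama 3.1 prompt up to the point where the assistant should generate."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(format_llama31_prompt("You are a storyteller.", "Write an opening line."))
```

With `transformers`, `tokenizer.apply_chat_template` is usually the safer route, since it applies the chat template bundled with the model.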

*It sometimes gives false refusals, but swiping or "uncensored" prompts work around them. I have no idea why this happens tbh, since none of the base models exhibit this behavior; it seems to be a random emergence, and neither extra abliteration nor the gating method has any impact.*
*But it's still pretty good imo.*

*For llama.cpp/LMStudio/etc., make sure `num_experts_used = 2`.*