---
base_model:
- nothingiisreal/L3.1-8B-Celeste-V1.5
- Sao10K/Llama-3.1-8B-Stheno-v3.4
- Sao10K/L3.1-8B-Niitama-v1.1
- arcee-ai/Llama-3.1-SuperNova-Lite
- akjindal53244/Llama-3.1-Storm-8B
- arcee-ai/Llama-Spark
- grimjim/Llama-3-Instruct-abliteration-LoRA-8B
- crestf411/sunfall-peft
- v000000/L3.1-Celestial-Stone-2x8B
library_name: transformers
tags:
- merge
- llama
- mixtral
- dpo
datasets:
- jondurbin/gutenberg-dpo-v0.1
---
> [!WARNING]
> **Sampler:**<br>
> Likes a low temperature due to the MoE architecture. I use 0.3 personally.
# Llama-3.1-Celestial-Stone-2x8B-DPO (BF16)
* *DPO Trained, Mixture of Experts (14B).*
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/lyRa7z5maTqAaa43sxC2J.png)
* <b>2 experts working together per token, plus Gutenberg novel-writing finetuning.</b>
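Not from the original card, but as a quick sanity check on the "2 experts per token" claim, the routing settings can be read straight from the config without downloading weights. The repo id below is an assumption, and this assumes a standard Mixtral-style config (per the `mixtral` tag):

```python
# Sketch: inspect MoE routing settings from the config alone (no weights downloaded).
# Assumes the repo id and a Mixtral-style config; adjust if either differs.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("v000000/L3.1-Celestial-Stone-2x8B-DPO")
print("experts total:     ", getattr(config, "num_local_experts", None))    # expected: 2
print("experts per token: ", getattr(config, "num_experts_per_tok", None))  # expected: 2
```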
*List of llama.cpp repos*
# Thanks mradermacher (GGUF):
* [GGUF static](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
* [GGUF Imatrix](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-i1-GGUF)
# Thanks QuantFactory (GGUF):
* [GGUF static](https://huggingface.co/QuantFactory/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
# Thanks Triangle104 (GGUF):
* [Q8_0](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q8_0-GGUF)
* [Q6_K](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q6_K-GGUF)
* [Q5_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_M-GGUF)
* [Q5_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_S-GGUF)
* [Q4_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_M-GGUF)
* [Q4_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_S-GGUF)
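As a rough usage sketch (not part of the original card), one way to run one of the quants above is through llama-cpp-python. The repo id and filename glob here are assumptions; check the file list of whichever quant repo you pick:

```python
# Sketch only: run a GGUF quant with llama-cpp-python.
# Repo id and filename pattern are assumptions -- verify against the quant repo's files.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF",
    filename="*Q4_K_M.gguf",  # glob for the desired quant; confirm it exists in the repo
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write the opening paragraph of a sea story."},
    ],
    temperature=0.3,  # low temperature, per the sampler warning above
)
print(out["choices"][0]["message"]["content"])
```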
---------------------------------------------------------------------------------
[L3.1-Celestial-Stone-2x8B](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B) finetuned on an Nvidia A100. (See the base model card for additional details.)
Completed 0.5 epoch of the dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) with learning_rate=8e-6.
The result seems pretty good even with half an epoch and a low learning rate; the effect is smoother and less pronounced.
Outputs are more compliant and verbose, less sloppy, and less safety-aligned.
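For illustration only, here is a hedged sketch of what a half-epoch DPO run with these hyperparameters might look like using TRL's `DPOTrainer`. This is not the author's training script; argument names vary between TRL versions, and a full finetune of a 14B MoE will generally need more than a single GPU's memory (or a PEFT/LoRA setup).

```python
# Hedged sketch of a half-epoch DPO run (not the original training script).
# DPOTrainer argument names differ across TRL versions; adjust for your install.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "v000000/L3.1-Celestial-Stone-2x8B"  # the pre-DPO base merge
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="celestial-stone-dpo",
    num_train_epochs=0.5,              # half an epoch, as described above
    learning_rate=8e-6,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,        # called `tokenizer=` in older TRL releases
)
trainer.train()
```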
------------------------------------------------------------------------------
*The first expert* is an Instruct 405B distillation/RP vector merge <b>(Supernova-Lite, Niitama 1.1, Storm)</b>.
*The second expert* is an ERP/Reddit-data merge <b>(Celeste 1.5, Stheno 3.4, Storm)</b>.
-------------------------------------------------------------------------------
*The base model* is <b>Sao10K/Llama-3.1-8B-Stheno-v3.4</b> with the <b>Sunfall LoRA 0.6.1</b> applied, to make it understand SillyTavern prompts and storywriting better.
-------------------------------------------------------------------------------
*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1).
# Prompt Template:
```bash
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{output}<|eot_id|>
```
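A small inference sketch (not part of the original card; the repo id is an assumption): the tokenizer's built-in Llama 3.1 chat template should emit exactly the structure above, and the low temperature from the sampler warning is applied here.

```python
# Sketch: transformers inference using the chat template shown above.
# Repo id is an assumption; temperature 0.3 follows the sampler warning at the top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "v000000/L3.1-Celestial-Stone-2x8B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a storywriting assistant."},
    {"role": "user", "content": "Continue the scene: the lighthouse keeper hears a knock."},
]
# apply_chat_template produces the <|start_header_id|>/<|eot_id|> structure shown above
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```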
*Sometimes has false refusals, but swiping or an "uncensored" system prompt gets around them. It's unclear why this happens, since none of the base models exhibit this behavior at all; it's probably an artifact of the custom Mixtral architecture's attention. But it's still pretty good.*
*For llama.cpp/LM Studio/etc., make sure `num_experts_used = 2`.*