---
base_model:
- nothingiisreal/L3.1-8B-Celeste-V1.5
- Sao10K/Llama-3.1-8B-Stheno-v3.4
- Sao10K/L3.1-8B-Niitama-v1.1
- arcee-ai/Llama-3.1-SuperNova-Lite
- akjindal53244/Llama-3.1-Storm-8B
- arcee-ai/Llama-Spark
- grimjim/Llama-3-Instruct-abliteration-LoRA-8B
- crestf411/sunfall-peft
- v000000/L3.1-Celestial-Stone-2x8B
library_name: transformers
tags:
- merge
- llama
- mixtral
- dpo
datasets:
- jondurbin/gutenberg-dpo-v0.1
---
> [!WARNING]
> **Sampler:**<br>
> Likes a low temperature due to the MoE architecture. I use 0.3 personally.
# Llama-3.1-Celestial-Stone-2x8B-DPO (BF16)
* *DPO Trained, Mixture of Experts (14B).*
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/lyRa7z5maTqAaa43sxC2J.png)
* <b>2x Experts working together per token, Gutenberg novelwriting finetuning.</b>
------------------------------------------------------------------------------
*The first expert* is an Instruct 405B-distillation/RP vector merge <b>(Supernova-Lite, Niitama1.1, Storm)</b>.
*The second expert* is an ERP/Reddit-data merge <b>(Celeste1.5, Stheno3.4, Storm)</b>.
-------------------------------------------------------------------------------
*The base model* is <b>Sao10K/L3.1-Stheno-3.4</b> with the <b>Sunfall LoRA 0.6.1</b> applied to make it understand SillyTavern prompts and storywriting better.
-------------------------------------------------------------------------------
*Resultant merge finetuned* on [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1).
-------------------------------------------------------------------------------
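For readers curious how a 2x8B mixture like this is typically assembled, below is a *hypothetical* mergekit-moe recipe written from the description above. It is **not** the author's actual config: the expert paths, `gate_mode`, and `positive_prompts` are illustrative placeholders.

```python
# Hypothetical reconstruction of a mergekit-moe recipe based on the card's
# description -- NOT the author's actual config. Paths and prompts are
# placeholders for illustration only.
import pathlib

moe_recipe = """\
base_model: ./stheno-3.4-sunfall        # placeholder: Stheno 3.4 + Sunfall LoRA
gate_mode: hidden                       # assumption: route tokens by hidden-state similarity
dtype: bfloat16
experts:
  - source_model: ./expert-instruct-rp  # placeholder: Supernova-Lite/Niitama/Storm merge
    positive_prompts:
      - "Follow the instructions precisely."
  - source_model: ./expert-erp          # placeholder: Celeste/Stheno/Storm merge
    positive_prompts:
      - "Continue the roleplay scene."
"""

pathlib.Path("moe.yaml").write_text(moe_recipe)
# Then, with mergekit installed:  mergekit-moe moe.yaml ./output-model
```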
*List of llama.cpp repos*
# Thanks mradermacher (GGUF):
* [GGUF static Q2-Q8](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
* [GGUF Imatrix Q2-Q6](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-DPO-i1-GGUF)
# Thanks QuantFactory (GGUF):
* [GGUF static Q2-Q8](https://huggingface.co/QuantFactory/L3.1-Celestial-Stone-2x8B-DPO-GGUF)
# Thanks Triangle104 (GGUF):
* [Q8_0](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q8_0-GGUF)
* [Q6_K](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q6_K-GGUF)
* [Q5_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_M-GGUF)
* [Q5_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q5_K_S-GGUF)
* [Q4_K_M](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_M-GGUF)
* [Q4_K_S](https://huggingface.co/Triangle104/L3.1-Celestial-Stone-2x8B-DPO-Q4_K_S-GGUF)
*Other*
* [GGUF Imatrix IQ4-Q8](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B-DPO-GGUFs-IMATRIX)
---------------------------------------------------------------------------------
[L3.1-Celestial-Stone-2x8B](https://huggingface.co/v000000/L3.1-Celestial-Stone-2x8B) was finetuned on an Nvidia A100. (See the base model card for additional details.)
Completed 0.5 epoch of the dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1) with learning_rate=8e-6.
The result seems pretty good even with the half epoch and low learning rate; the effect is smoother and less pronounced, but it's probably not *optimal*.
Outputs are more compliant and verbose, and less sloppy and safety-aligned.
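The exact training script isn't published here; as a rough sketch, the stated hyperparameters would map onto trl's `DPOTrainer` something like this (batch size, gradient accumulation, and beta are assumptions, not the author's settings):

```python
# Illustrative sketch only -- not the author's actual script. Maps the stated
# hyperparameters (0.5 epoch, lr=8e-6, gutenberg-dpo) onto trl's DPOTrainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "v000000/L3.1-Celestial-Stone-2x8B"
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base)

# prompt/chosen/rejected triples built from Project Gutenberg novels
dataset = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")

args = DPOConfig(
    output_dir="celestial-stone-dpo",
    num_train_epochs=0.5,            # half epoch, as stated above
    learning_rate=8e-6,              # as stated above
    per_device_train_batch_size=1,   # assumption
    gradient_accumulation_steps=8,   # assumption
    beta=0.1,                        # assumption (trl default)
    bf16=True,
)

# Newer trl versions rename `tokenizer=` to `processing_class=`.
DPOTrainer(model=model, args=args, train_dataset=dataset,
           tokenizer=tokenizer).train()
```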
------------------------------------------------------------------------------
# Prompt Template:
```bash
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{output}<|eot_id|>
```
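For scripted use, a tiny helper that fills the template exactly as written above (plain Python, no dependencies):

```python
# Mirrors the Llama-3.1-style template shown above, as written on this card.
def build_prompt(system_prompt: str, user_input: str) -> str:
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
    )

print(build_prompt("You are a helpful storyteller.", "Write an opening line."))
```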
*Sometimes it has false refusals, but swiping and "uncensored" prompts work. I have no idea why this happens tbh, since none of the base models exhibit this behavior; it seems to be a random emergence. Extra abliteration has no impact, and neither does the gating method.*
*But it's still pretty good imo.*
*For llama.cpp/LM Studio/etc., make sure `num_experts_used = 2`.*
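As one way to wire this up programmatically, here is a hypothetical llama-cpp-python sketch; the `kv_overrides` metadata key (taken from llama.cpp's GGUF conventions for MoE models), the GGUF file name, and `n_ctx` are assumptions. It also applies the 0.3 temperature recommended at the top of this card.

```python
# Hypothetical llama-cpp-python setup -- key name, file name, and n_ctx are
# assumptions, not from this card. Forces two active experts and uses the
# low temperature recommended above.
from llama_cpp import Llama

llm = Llama(
    model_path="L3.1-Celestial-Stone-2x8B-DPO-Q4_K_M.gguf",
    n_ctx=8192,
    kv_overrides={"llama.expert_used_count": 2},  # num_experts_used = 2
)

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful storyteller.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n"
    "Write an opening line.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n"
)
out = llm(prompt, max_tokens=256, temperature=0.3, stop=["<|eot_id|>"])
print(out["choices"][0]["text"])
```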