--- datasets: - jondurbin/gutenberg-dpo-v0.1 - Qwen/Qwen2.5-14B-Instruct - HuggingFaceH4/ultrafeedback_binarized base_model: - Qwen/Qwen2.5-14B-Instruct - v000000/Qwen2.5-14B-Gutenberg-1e-Delta - tanliboy/lambda-qwen2.5-14b-dpo-test library_name: transformers tags: - qwen - qwen2.5 - finetune - dpo - orpo - qwen2 - chat - conversational - instruct - storywriting - roleplay license: apache-2.0 language: - en pipeline_tag: text-generation --- # Qwen2.5-Lumen-14B * *Qwen direct preference optimization finetuned for ~3 epochs.* ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/wCcJkdrVDUH6m0AN9Lv3B.png) A qwen2.5 preference finetune, targeting prompt adherence, storywriting and roleplay. ------------------------------------------------------------------------------- ## Training Notes Trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 2 epochs on NVidia A100, and on dataset [jondurbin/gutenberg-dpo-v0.1](https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1), saving different checkpoints along the way (completely different runs at varying epochs and learning rates). [Tanliboy](https://huggingface.co/tanliboy) trained [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) for 1 epoch on [HuggingFaceH4/ultrafeedback_binarized](HuggingFaceH4/ultrafeedback_binarized), (Credit to Tanliboy! *Check out the model [here](https://huggingface.co/tanliboy/lambda-qwen2.5-14b-dpo-test)*) *Mass checkpoint merged, Based on Qwen2.5-14B-Instruct (Base Model).* ## Merge * Merged with a sophosympatheia's SLERP gradient *"Ultrafeedback-Binarized DPO"* and *"Gutenberg DPO"* * Merged with a sophosympatheia's SLERP gradient *"Qwen2.5-14B-Instruct"* and *"Gutenberg DPO"* * Merged all DPO checkpoints and SLERP variations with MODEL_STOCK to analyze geometric properties and get the most *performant* aspects of all runs/merges. *Model Stock* was chosen due to the similarity between the merged models. ## Recipe ```yaml models: - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta - model: v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential - model: v000000/Qwen2.5-14B-Gutenberg-0.25e-Early - model: v000000/Qwen2.5-14B-Gutenberg-2e-Sequential - model: v000000/Qwen2.5-14B-Gutenberg-0.37e-Early - model: v000000/Qwen2.5-14B-Gutenberg-2e-Zeta - model: v000000/Qwen2.5-14B-Gutenberg-1e-Theta - model: tanliboy/lambda-qwen2.5-14b-dpo-test - model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta - model: tanliboy/lambda-qwen2.5-14b-dpo-test - model: v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno - model: v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno base_model: v000000/Qwen2.5-14B-Gutenberg-1e-Delta merge_method: model_stock dtype: bfloat16 ``` ### Finetune and merge This is a merge and finetune of pre-trained language models. ### Models Merged [Arxiv 2403.19522](https://arxiv.org/abs/2403.19522) The following models were included in the merge: * v000000/Qwen2.5-14B-Gutenberg-1e-Delta * v000000/Qwen2.5-14B-Gutenberg-0.6e-Sequential * v000000/Qwen2.5-14B-Gutenberg-0.25e-Early * v000000/Qwen2.5-14B-Gutenberg-2e-Sequential * v000000/Qwen2.5-14B-Gutenberg-0.37e-Early * v000000/Qwen2.5-14B-Gutenberg-2e-Zeta * v000000/Qwen2.5-14B-Gutenberg-1e-Theta * v000000/Qwen2.5-14B-Gutenberg-UltraLambda-Slerpeno * v000000/Qwen2.5-14B-Gutenberg-Instruct-Slerpeno * tanliboy/lambda-qwen2.5-14b-dpo-test ------------------------------------------------------------------------------- - Context Length: Full 131,072 tokens and generation 8192 tokens - Qwen2(ChatML) Prompt format