---
license: llama2
datasets:
- argilla/dpo-mix-7k
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

## Model Details

### Model Description

This model was fine-tuned with DPO (Direct Preference Optimization) on the `argilla/dpo-mix-7k` dataset, starting from `rungao2001/vicuna-7b-v1.5_deita10k_sft_full`.

- **Model type:** Llama 2 decoder-only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

## Training Details

### Training Data

argilla/dpo-mix-7k

### Training Procedure

DPO

Notice: `do_sample` in `generation_config.json` was set to `True` to avoid the error described in https://github.com/huggingface/transformers/issues/29988.

Notice: the `chat_template` was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The check that raises the "Conversation roles must alternate user/assistant/user/assistant/..." error was removed, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`. (Illustrative sketches of such a template and of a matching DPO setup are given at the end of this card.)

#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156

## Evaluation

Training finally reached a loss of 0.5006 and `rewards/accuracies` of 78.72% on the eval set of `HuggingFaceH4/deita-10k-v0-sft`.

### Testing Data, Factors & Metrics
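
## Appendix: Code Sketches

The chat-template change described under Training Procedure might look roughly like the sketch below. This is a reconstruction for illustration only, assuming a Vicuna 1.1 prompt layout and a fixed default system prompt; the exact template shipped in this repository's `tokenizer_config.json` may differ.

```python
from transformers import AutoTokenizer

# Sketch of a modified Vicuna 1.1 chat template: no role-alternation check, and an
# assumed fixed system prompt is emitted only when loop.index0 == 0 and the role is
# 'user', as described in the Training Procedure section above.
MODIFIED_VICUNA_TEMPLATE = (
    "{% set sys = 'A chat between a curious user and an artificial intelligence assistant.' %}"
    "{% for message in messages %}"
    "{% if loop.index0 == 0 and message['role'] == 'user' %}{{ sys + ' ' }}{% endif %}"
    "{% if message['role'] == 'user' %}{{ 'USER: ' + message['content'] + ' ' }}"
    "{% elif message['role'] == 'assistant' %}{{ 'ASSISTANT: ' + message['content'] + '</s>' }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = MODIFIED_VICUNA_TEMPLATE

# Non-alternating or multi-turn conversations no longer raise an exception.
messages = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "Direct Preference Optimization."},
    {"role": "user", "content": "Why modify the chat template?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```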
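
Similarly, a DPO run consistent with the hyperparameters listed above could be set up with `trl` roughly as follows. The per-device batch size, gradient-accumulation split, output directory, and use of the dataset's train/test splits are assumptions (only the global batch size of 128 is stated above), and depending on the `trl` version some arguments may need to move between `DPOConfig` and `DPOTrainer`, or the dataset may need mapping into explicit prompt/chosen/rejected strings.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"  # SFT starting point listed above
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs in conversational chosen/rejected format.
dataset = load_dataset("argilla/dpo-mix-7k")

args = DPOConfig(
    output_dir="vicuna-7b-v1.5_dpo-mix-7k",  # placeholder name
    bf16=True,
    learning_rate=1.0e-6,
    num_train_epochs=3,
    per_device_train_batch_size=4,       # assumption: 4 x 32 accumulation = 128 global
    gradient_accumulation_steps=32,
    max_prompt_length=1800,
    max_length=2048,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
)
trainer.train()
```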