---
license: llama2
datasets:
- argilla/dpo-mix-7k
language:
- en
pipeline_tag: text-generation
---
# Model Card for Model ID
## Model Details
### Model Description
This model was trained with DPO on the `argilla/dpo-mix-7k` dataset, starting from the `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` model.
- **Model type:** Llama2 Decoder-Only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full
## Training Details
### Training Data
argilla/dpo-mix-7k
### Training Procedure
DPO
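The DPO objective used by `trl.DPOTrainer` can be sketched per preference pair as follows; this is a minimal stdlib-only illustration, and the `beta` value here is an assumption (the card does not list it):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    beta=0.1 is an assumed value for illustration only.
    """
    # Implicit reward of each response: beta * (policy logp - reference logp)
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference model, the margin is zero and the loss is log 2 ≈ 0.693; the eval loss of 0.5006 reported below corresponds to a positive average margin.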
Notice: the chat_template was modified because the original Vicuna 1.1 template cannot be used with trl.DPOTrainer. The check that raises "Conversation roles must alternate user/assistant/user/assistant/..." was removed, and the system message is now emitted only when loop.index0 == 0 and role == 'user'.
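The effect of the modified template can be emulated in plain Python as below. This is a sketch of the described behavior only; the exact separators and spacing of the real Jinja template are assumptions based on the standard Vicuna 1.1 format:

```python
def apply_modified_vicuna_template(messages):
    """Render a conversation in a modified Vicuna 1.1 style (sketch):
    the system message is emitted only immediately before the first user
    turn, mirroring the `loop.index0 == 0 and role == 'user'` condition.
    Exact tokens/spacing are assumptions."""
    system = ""
    parts = []
    first_user_done = False
    for msg in messages:
        if msg["role"] == "system":
            system = msg["content"]
        elif msg["role"] == "user":
            if not first_user_done and system:
                parts.append(system + "\n\n")  # system text only before turn 0
            parts.append(f"USER: {msg['content']} ")
            first_user_done = True
        elif msg["role"] == "assistant":
            parts.append(f"ASSISTANT: {msg['content']}</s>")
    return "".join(parts)
```

Note that the system message appears exactly once, no matter how many user turns follow.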
#### Training Hyperparameters
- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156
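The hyperparameters above map onto trl's `DPOConfig` roughly as sketched below. This is a configuration sketch, not the actual training script: the output directory, per-device batch size, gradient accumulation, and GPU count are assumptions chosen so that their product matches the global batch size of 128.

```python
from trl import DPOConfig

# Sketch: per_device_train_batch_size * gradient_accumulation_steps * num_gpus
# should equal the global batch size of 128. The split below (4 * 8 * 4 GPUs)
# is an assumption for illustration.
config = DPOConfig(
    output_dir="vicuna-7b-dpo",          # assumed name
    per_device_train_batch_size=4,       # assumption
    gradient_accumulation_steps=8,       # assumption
    learning_rate=1.0e-6,
    num_train_epochs=3,
    bf16=True,
    max_prompt_length=1800,
    max_length=2048,
)
```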
## Evaluation
Training reached a final loss of 0.5006 and `rewards/accuracies` of 78.72% on the eval split of `argilla/dpo-mix-7k`.
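For reference, `rewards/accuracies` is the fraction of preference pairs where the chosen response receives a higher implicit reward than the rejected one; a minimal sketch:

```python
def reward_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response's implicit reward
    exceeds the rejected one's (what trl logs as `rewards/accuracies`)."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)
```

A value of 78.72% therefore means the policy prefers the human-chosen response in roughly 79 of every 100 eval pairs.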
### Testing Data, Factors & Metrics