---
license: llama2
datasets:
- argilla/dpo-mix-7k
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

## Model Details

### Model Description

This model was fine-tuned with DPO (Direct Preference Optimization) on the `argilla/dpo-mix-7k` dataset, starting from `rungao2001/vicuna-7b-v1.5_deita10k_sft_full`.

- **Model type:** Llama 2 decoder-only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

## Training Details

### Training Data

argilla/dpo-mix-7k

### Training Procedure

DPO

Notice: `do_sample` in `generation_config.json` was set to `True` to avoid the error described in https://github.com/huggingface/transformers/issues/29988.

Notice: the `chat_template` was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The check that raises the "Conversation roles must alternate user/assistant/user/assistant/..." error was removed, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`. (Illustrative sketches of such a template and of a matching DPO setup are given at the end of this card.)

#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156

## Evaluation

Training finally reached a loss of 0.5006 and `rewards/accuracies` of 78.72% on the eval set of `HuggingFaceH4/deita-10k-v0-sft`.

### Testing Data, Factors & Metrics
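
## Appendix: Code Sketches

The chat-template change described under Training Procedure might look roughly like the sketch below. This is a reconstruction for illustration only, assuming a Vicuna 1.1 prompt layout and a fixed default system prompt; the exact template shipped in this repository's `tokenizer_config.json` may differ.

```python
from transformers import AutoTokenizer

# Sketch of a modified Vicuna 1.1 chat template: no role-alternation check, and an
# assumed fixed system prompt is emitted only when loop.index0 == 0 and the role is
# 'user', as described in the Training Procedure section above.
MODIFIED_VICUNA_TEMPLATE = (
    "{% set sys = 'A chat between a curious user and an artificial intelligence assistant.' %}"
    "{% for message in messages %}"
    "{% if loop.index0 == 0 and message['role'] == 'user' %}{{ sys + ' ' }}{% endif %}"
    "{% if message['role'] == 'user' %}{{ 'USER: ' + message['content'] + ' ' }}"
    "{% elif message['role'] == 'assistant' %}{{ 'ASSISTANT: ' + message['content'] + '</s>' }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = MODIFIED_VICUNA_TEMPLATE

# Non-alternating or multi-turn conversations no longer raise an exception.
messages = [
    {"role": "user", "content": "What is DPO?"},
    {"role": "assistant", "content": "Direct Preference Optimization."},
    {"role": "user", "content": "Why modify the chat template?"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```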
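
Similarly, a DPO run consistent with the hyperparameters listed above could be set up with `trl` roughly as follows. The per-device batch size, gradient-accumulation split, output directory, and use of the dataset's train/test splits are assumptions (only the global batch size of 128 is stated above), and depending on the `trl` version some arguments may need to move between `DPOConfig` and `DPOTrainer`, or the dataset may need mapping into explicit prompt/chosen/rejected strings.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"  # SFT starting point listed above
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs in conversational chosen/rejected format.
dataset = load_dataset("argilla/dpo-mix-7k")

args = DPOConfig(
    output_dir="vicuna-7b-v1.5_dpo-mix-7k",  # placeholder name
    bf16=True,
    learning_rate=1.0e-6,
    num_train_epochs=3,
    per_device_train_batch_size=4,       # assumption: 4 x 32 accumulation = 128 global
    gradient_accumulation_steps=32,
    max_prompt_length=1800,
    max_length=2048,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    processing_class=tokenizer,
)
trainer.train()
```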