---
license: llama2
datasets:
- argilla/dpo-mix-7k
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

This model was trained with DPO on the `argilla/dpo-mix-7k` dataset, starting from the `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` model.

- **Model type:** Llama2 Decoder-Only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

## Training Details

### Training Data

`argilla/dpo-mix-7k`

### Training Procedure

Direct Preference Optimization (DPO) via `trl.DPOTrainer`.

Note: the chat template was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The check that raises the error "Conversation roles must alternate user/assistant/user/assistant/..." was removed, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`.
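
The resulting template logic can be sketched in plain Python (a hypothetical renderer; the exact system prompt and separators below are assumptions, not taken from the released template):

```python
def render_vicuna_modified(messages, system="A chat between a curious user and an AI assistant."):
    """Render a conversation in (modified) Vicuna 1.1 style.

    The system message is emitted only before the first message, and only
    if that message is a user turn, mirroring the template condition
    `loop.index0 == 0 and role == 'user'`. No role-alternation check is
    performed, matching the removed error.
    """
    parts = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if i == 0 and role == "user":
            parts.append(system + " ")
        if role == "user":
            parts.append(f"USER: {content} ")
        elif role == "assistant":
            parts.append(f"ASSISTANT: {content}</s>")
    return "".join(parts)

conv = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]
print(render_vicuna_modified(conv))
```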

#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156
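
For reference, the per-example objective that `trl.DPOTrainer` optimizes with these settings can be sketched in plain Python (`beta=0.1` is trl's default and is not stated above; the log-probabilities are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin), where the margin is
    the policy/reference log-ratio of the chosen completion minus that of
    the rejected one. Inputs are summed log-probs of each completion."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative values: the policy prefers the chosen completion more than
# the reference does, so the loss falls below log(2) ~ 0.693.
print(round(dpo_loss(-12.0, -20.0, -14.0, -18.0), 4))  # → 0.513
```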

## Evaluation

Training finally reached a loss of 0.5006 and `rewards/accuracies` of 78.72% on the eval split of `argilla/dpo-mix-7k`.
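
`rewards/accuracies` is the fraction of preference pairs for which the implicit reward of the chosen completion exceeds that of the rejected one. A minimal sketch of the metric (function name hypothetical, values illustrative):

```python
def rewards_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen completion's implicit reward
    beats the rejected completion's."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

print(rewards_accuracy([1.2, 0.4, -0.1, 0.9], [0.3, 0.8, -0.5, 0.2]))  # → 0.75
```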

### Testing Data, Factors & Metrics