---
license: llama2
datasets:
- argilla/dpo-mix-7k
language:
- en
pipeline_tag: text-generation
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

This model was trained with DPO on the `argilla/dpo-mix-7k` dataset, starting from the `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` model.

- **Model type:** Llama2 Decoder-Only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

## Training Details

### Training Data

`argilla/dpo-mix-7k`

### Training Procedure

Direct Preference Optimization (DPO) via `trl.DPOTrainer`.

Note: the chat template was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The check that raises the error "Conversation roles must alternate user/assistant/user/assistant/..." was removed, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`.
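
The resulting template logic can be sketched in plain Python (a hypothetical renderer; the exact system prompt and separators below are assumptions, not taken from the released template):

```python
def render_vicuna_modified(messages, system="A chat between a curious user and an AI assistant."):
    """Render a conversation in (modified) Vicuna 1.1 style.

    The system message is emitted only before the first message, and only
    if that message is a user turn, mirroring the template condition
    `loop.index0 == 0 and role == 'user'`. No role-alternation check is
    performed, matching the removed error.
    """
    parts = []
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"]
        if i == 0 and role == "user":
            parts.append(system + " ")
        if role == "user":
            parts.append(f"USER: {content} ")
        elif role == "assistant":
            parts.append(f"ASSISTANT: {content}</s>")
    return "".join(parts)

conv = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there."},
]
print(render_vicuna_modified(conv))
```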

#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156
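
For reference, the per-example objective that `trl.DPOTrainer` optimizes with these settings can be sketched in plain Python (`beta=0.1` is trl's default and is not stated above; the log-probabilities are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * margin), where the margin is
    the policy/reference log-ratio of the chosen completion minus that of
    the rejected one. Inputs are summed log-probs of each completion."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative values: the policy prefers the chosen completion more than
# the reference does, so the loss falls below log(2) ~ 0.693.
print(round(dpo_loss(-12.0, -20.0, -14.0, -18.0), 4))  # → 0.513
```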

## Evaluation

Training finally reached a loss of 0.5006 and `rewards/accuracies` of 78.72% on the eval split of `argilla/dpo-mix-7k`.
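
`rewards/accuracies` is the fraction of preference pairs for which the implicit reward of the chosen completion exceeds that of the rejected one. A minimal sketch of the metric (function name hypothetical, values illustrative):

```python
def rewards_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen completion's implicit reward
    beats the rejected completion's."""
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)

print(rewards_accuracy([1.2, 0.4, -0.1, 0.9], [0.3, 0.8, -0.5, 0.2]))  # → 0.75
```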

### Testing Data, Factors & Metrics