rungao2001 committed · Commit c117982 · Parent(s): f0beda9

Update README.md

README.md CHANGED
language:
- en
pipeline_tag: text-generation
---

# Model Card

A DPO fine-tune of `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` using the `argilla/dpo-mix-7k` preference dataset.

## Model Details

### Model Description

This model was trained with Direct Preference Optimization (DPO) on the `argilla/dpo-mix-7k` dataset, starting from `rungao2001/vicuna-7b-v1.5_deita10k_sft_full`.

- **Model type:** Llama 2, decoder-only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

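A minimal usage sketch (the Hub id below is a placeholder, not this model's confirmed repo name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "rungao2001/vicuna-7b-v1.5_deita10k_dpo_full"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Summarize what DPO training does."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=256)  # sampling enabled via generation_config.json
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
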
## Training Details

### Training Data

`argilla/dpo-mix-7k`

### Training Procedure

Direct Preference Optimization (DPO).

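For reference, `trl.DPOTrainer` optimizes the standard DPO objective of Rafailov et al. (2023), where $\pi_{\mathrm{ref}}$ is the frozen SFT model and $\beta$ scales the implicit reward:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$
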
Note: `do_sample` in `generation_config.json` was set to `true` to avoid the error described in https://github.com/huggingface/transformers/issues/29988.
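A sketch of one way to set that flag (assuming the stock `transformers.GenerationConfig` API; the output directory is illustrative):

```python
from transformers import GenerationConfig

# Load the generation config shipped with the model, enable sampling, and
# save it back; this writes generation_config.json with "do_sample": true.
gen_config = GenerationConfig.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
gen_config.do_sample = True
gen_config.save_pretrained("./model_dir")  # illustrative output directory
```
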
Note: the `chat_template` was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The modification removes the "Conversation roles must alternate user/assistant/user/assistant/..." error, and the system message is emitted only when `loop.index0 == 0` and `role == 'user'`.
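A hypothetical reconstruction of that template (not the author's exact string; the stock Vicuna 1.1 system prompt below is an assumption, since the card does not quote it):

```python
from transformers import AutoTokenizer

# Assumed system prompt: Vicuna 1.1's stock text.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

# Emit the system prompt only before the first user turn
# (loop.index0 == 0 and role == 'user'), then strictly alternate
# USER/ASSISTANT turns so trl.DPOTrainer's alternation check passes.
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    '{% if loop.index0 == 0 and message["role"] == "user" %}'
    '{{ "' + SYSTEM + ' " }}'
    "{% endif %}"
    '{% if message["role"] == "user" %}'
    '{{ "USER: " + message["content"] + " " }}'
    '{% elif message["role"] == "assistant" %}'
    '{{ "ASSISTANT: " + message["content"] + "</s>" }}'
    "{% endif %}"
    "{% endfor %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = CHAT_TEMPLATE
```
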
#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156

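A hedged sketch of how these hyperparameters map onto `trl` (argument names follow recent `DPOConfig`-style releases and differ in older ones, where `processing_class` was `tokenizer`; the per-device/accumulation split below is an assumption that multiplies out to the global batch size of 128):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

BASE = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"

# Assumed split: 8 per device x 16 accumulation steps = 128 global batch.
config = DPOConfig(
    output_dir="vicuna-7b-dpo",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=1.0e-6,
    num_train_epochs=3,
    bf16=True,
    max_prompt_length=1800,
    max_length=2048,
)

trainer = DPOTrainer(
    model=AutoModelForCausalLM.from_pretrained(BASE),
    args=config,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    processing_class=AutoTokenizer.from_pretrained(BASE),
)
trainer.train()
```
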
## Evaluation

Training finally reached loss = 0.5006 and `rewards/accuracies` = 78.72% on the eval set of `HuggingFaceH4/deita-10k-v0-sft`.
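
For context, `trl` reports `rewards/accuracies` as the fraction of preference pairs whose implicit reward for the chosen response exceeds that of the rejected one:

$$
\mathrm{accuracy} = \mathbb{E}\left[ \mathbb{1}\{ r_\theta(x, y_w) > r_\theta(x, y_l) \} \right], \qquad r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$
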
### Testing Data, Factors & Metrics