rungao2001 committed on
Commit c117982
1 Parent(s): f0beda9

Update README.md

Files changed (1):
  1. README.md (+48 -1)
README.md CHANGED
@@ -5,4 +5,51 @@ datasets:
  language:
  - en
  pipeline_tag: text-generation
  ---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

<!-- This model card aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1). -->

## Model Details

### Model Description

This model was trained with DPO on the `argilla/dpo-mix-7k` dataset, starting from the `rungao2001/vicuna-7b-v1.5_deita10k_sft_full` model.

- **Model type:** Llama 2, decoder-only
- **Language(s) (NLP):** English
- **License:** llama2
- **Finetuned from model:** rungao2001/vicuna-7b-v1.5_deita10k_sft_full

## Training Details

### Training Data

argilla/dpo-mix-7k

### Training Procedure

DPO
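For reference, the DPO objective for a single preference pair can be sketched as follows. This is the standard DPO loss, not code from this training run; the `beta` default and all log-probability inputs are illustrative.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no divergence from the reference model the loss equals log(2) ~= 0.693;
# training pushes it below that baseline.
```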

Notice: `do_sample` in `generation_config.json` was set to `True` to avoid the error described in https://github.com/huggingface/transformers/issues/29988.
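A minimal sketch of that change; the sampling values shown are illustrative placeholders, not this model's actual `generation_config.json`:

```python
import json

# Illustrative generation config; only the do_sample flag reflects the fix above.
config = {
    "temperature": 0.9,  # placeholder value
    "top_p": 0.6,        # placeholder value
    "do_sample": False,
}
config["do_sample"] = True  # the change described in the notice
print(json.dumps(config, indent=2))
```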

Notice: the `chat_template` was modified because the original Vicuna 1.1 format cannot be used with `trl.DPOTrainer`. The modified template avoids the "Conversation roles must alternate user/assistant/user/assistant/..." error, and the system message is emitted only when `loop.index0 == 0` and the role is `'user'`.
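The template itself is not reproduced in this card. As a hypothetical Python re-implementation of the behavior described above (the function name and default system message are illustrative), the logic looks like:

```python
def format_conversation(messages, system="A chat between a curious user and an artificial intelligence assistant."):
    """Hypothetical sketch of the modified Vicuna 1.1 format: the system
    message is emitted only before the first user turn (loop.index0 == 0),
    so strictly alternating user/assistant pairs render cleanly."""
    parts = []
    for i, msg in enumerate(messages):
        if msg["role"] == "user":
            prefix = system + " " if i == 0 else ""
            parts.append(prefix + "USER: " + msg["content"])
        elif msg["role"] == "assistant":
            parts.append("ASSISTANT: " + msg["content"] + "</s>")
    return " ".join(parts)
```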

#### Training Hyperparameters

- **Precision:** BFloat16
- **Chat Template:** Modified Vicuna 1.1
- **Global Batch Size:** 128
- **Learning Rate:** 1.0e-6
- **Num Epochs:** 3
- **Max Prompt Length:** 1800
- **Max Length:** 2048
- **Training Steps:** 156
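The step count is consistent with the batch size and epoch count above, assuming a train split of about 6,750 examples for `argilla/dpo-mix-7k` (an assumption; verify against the dataset card) and drop-last batching:

```python
num_examples = 6750        # assumed train-split size of argilla/dpo-mix-7k
global_batch_size = 128
num_epochs = 3

steps_per_epoch = num_examples // global_batch_size  # partial final batch dropped
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 156, matching the reported training steps
```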

## Evaluation

The model finally achieved loss = 0.5006 and `rewards/accuracies` = 78.72% on the eval set of `HuggingFaceH4/deita-10k-v0-sft`.
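For context, `rewards/accuracies` in `trl`'s `DPOTrainer` is the fraction of preference pairs whose chosen response receives a higher implicit reward than the rejected one. A minimal sketch, with illustrative reward values:

```python
def rewards_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response's implicit reward
    exceeds the rejected response's."""
    wins = sum(1 for c, r in zip(chosen_rewards, rejected_rewards) if c > r)
    return wins / len(chosen_rewards)

# Illustrative rewards, not values from this evaluation run:
print(rewards_accuracy([1.2, 0.4, -0.1, 2.0], [0.3, 0.9, -0.6, 1.1]))  # 0.75
```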

### Testing Data, Factors & Metrics