sumuks commited on
Commit
70d6617
1 Parent(s): 46660b2

End of training

Browse files
Files changed (1) hide show
  1. README.md +164 -0
README.md ADDED
@@ -0,0 +1,164 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: microsoft/Phi-3-mini-4k-instruct
3
+ library_name: peft
4
+ license: mit
5
+ tags:
6
+ - axolotl
7
+ - generated_from_trainer
8
+ model-index:
9
+ - name: phi3-nosys-gpt4ominiplans-27k-512rank-long
10
+ results: []
11
+ ---
12
+
13
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
+ should probably proofread and complete it, then remove this comment. -->
15
+
16
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
17
+ <details><summary>See axolotl config</summary>
18
+
19
+ axolotl version: `0.4.1`
20
+ ```yaml
21
+ # model and tokenizer
22
+ base_model: microsoft/Phi-3-mini-4k-instruct # change for model
23
+ trust_remote_code: true
24
+ sequence_len: 2048
25
+
26
+ strict: false
27
+
28
+ model_type: AutoModelForCausalLM
29
+ tokenizer_type: AutoTokenizer
30
+ bf16: auto
31
+ pad_to_sequence_len: true
32
+ save_safetensors: true
33
+
34
+
35
+ datasets:
36
+ - path: verifiers-for-code/sampled_10k_from_27k
37
+ type: completion
38
+ field: text_nosys_phi
39
+ train_on_split: train
40
+
41
+ val_set_size: 0.05
42
+
43
+ # lora
44
+ adapter: lora
45
+ lora_r: 512
46
+ lora_alpha: 32
47
+ lora_dropout: 0.05
48
+ lora_target_linear: true
49
+ lora_modules_to_save:
50
+ - embed_tokens
51
+ - lm_head
52
+ use_rslora: true
53
+
54
+ # logging
55
+ wandb_project: valeris
56
+ wandb_name: phi3-nosys-gpt4ominiplans-27k-512rank-long
57
+
58
+ output_dir: ./outputs/phi3-nosys-gpt4ominiplans-27k-512rank-long
59
+
60
+ gradient_accumulation_steps: 2
61
+ gradient_checkpointing: true
62
+ gradient_checkpointing_kwargs:
63
+ use_reentrant: true
64
+ micro_batch_size: 2
65
+ num_epochs: 3
66
+ eval_batch_size: 2
67
+ warmup_ratio: 0.05
68
+ learning_rate: 1e-5
69
+ lr_scheduler: cosine
70
+ optimizer: adamw_torch
71
+
72
+ hub_model_id: verifiers-for-code/phi3-nosys-gpt4ominiplans-27k-512rank-long
73
+ push_to_hub: true
74
+ hub_strategy: all_checkpoints
75
+ hub_always_push: true
76
+ evals_per_epoch: 8
77
+ saves_per_epoch: 4
78
+ logging_steps: 1
79
+ # eval_table_size: 10
80
+ # eval_max_new_tokens: 512
81
+
82
+ tokens: ["<thinking>", "</thinking>", "<plan>", "</plan>"]
83
+
84
+ special_tokens:
85
+ pad_token: "<|endoftext|>"
86
+
87
+ ```
88
+
89
+ </details><br>
90
+
91
+ # phi3-nosys-gpt4ominiplans-27k-512rank-long
92
+
93
+ This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
94
+ It achieves the following results on the evaluation set:
95
+ - Loss: 0.6378
96
+
97
+ ## Model description
98
+
99
+ More information needed
100
+
101
+ ## Intended uses & limitations
102
+
103
+ More information needed
104
+
105
+ ## Training and evaluation data
106
+
107
+ More information needed
108
+
109
+ ## Training procedure
110
+
111
+ ### Training hyperparameters
112
+
113
+ The following hyperparameters were used during training:
114
+ - learning_rate: 1e-05
115
+ - train_batch_size: 2
116
+ - eval_batch_size: 2
117
+ - seed: 42
118
+ - distributed_type: multi-GPU
119
+ - num_devices: 8
120
+ - gradient_accumulation_steps: 2
121
+ - total_train_batch_size: 32
122
+ - total_eval_batch_size: 16
123
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
124
+ - lr_scheduler_type: cosine
125
+ - lr_scheduler_warmup_steps: 44
126
+ - num_epochs: 3
127
+
128
+ ### Training results
129
+
130
+ | Training Loss | Epoch | Step | Validation Loss |
131
+ |:-------------:|:------:|:----:|:---------------:|
132
+ | 1.0833 | 0.0034 | 1 | 1.0330 |
133
+ | 1.0093 | 0.1279 | 38 | 0.9910 |
134
+ | 0.9169 | 0.2559 | 76 | 0.8668 |
135
+ | 0.795 | 0.3838 | 114 | 0.7676 |
136
+ | 0.6999 | 0.5118 | 152 | 0.7243 |
137
+ | 0.7246 | 0.6397 | 190 | 0.6989 |
138
+ | 0.6873 | 0.7677 | 228 | 0.6816 |
139
+ | 0.7014 | 0.8956 | 266 | 0.6687 |
140
+ | 0.6586 | 1.0236 | 304 | 0.6585 |
141
+ | 0.6532 | 1.1515 | 342 | 0.6511 |
142
+ | 0.6334 | 1.2795 | 380 | 0.6463 |
143
+ | 0.5968 | 1.4074 | 418 | 0.6434 |
144
+ | 0.6366 | 1.5354 | 456 | 0.6414 |
145
+ | 0.6126 | 1.6633 | 494 | 0.6400 |
146
+ | 0.6564 | 1.7912 | 532 | 0.6391 |
147
+ | 0.6296 | 1.9192 | 570 | 0.6387 |
148
+ | 0.6225 | 2.0471 | 608 | 0.6383 |
149
+ | 0.6354 | 2.1751 | 646 | 0.6381 |
150
+ | 0.6111 | 2.3030 | 684 | 0.6379 |
151
+ | 0.5899 | 2.4310 | 722 | 0.6378 |
152
+ | 0.6415 | 2.5589 | 760 | 0.6378 |
153
+ | 0.6443 | 2.6869 | 798 | 0.6377 |
154
+ | 0.6103 | 2.8148 | 836 | 0.6377 |
155
+ | 0.6451 | 2.9428 | 874 | 0.6378 |
156
+
157
+
158
+ ### Framework versions
159
+
160
+ - PEFT 0.11.1
161
+ - Transformers 4.44.0.dev0
162
+ - Pytorch 2.4.0
163
+ - Datasets 2.19.1
164
+ - Tokenizers 0.19.1