---
license: apache-2.0
datasets:
- teknium/GPT4-LLM-Cleaned
---

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was fine-tuned on the [teknium/GPT4-LLM-Cleaned](https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned) dataset listed in the metadata above. The evaluation split used for the validation losses reported below is not documented.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 8
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 5

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 11.2812       | 0.0   | 1    | 11.5156         |
| 5.0938        | 0.2   | 62   | 5.1016          |
| 3.5703        | 0.4   | 124  | 3.7161          |
| 2.582         | 0.6   | 186  | 2.9010          |
| 2.2109        | 0.8   | 248  | 2.5156          |
| 1.9824        | 1.0   | 310  | 2.3477          |
| 1.8594        | 1.18  | 372  | 2.1960          |
| 1.748         | 1.38  | 434  | 2.1667          |
| 1.748         | 1.58  | 496  | 2.0195          |
| 1.7617        | 1.78  | 558  | 2.0749          |
| 1.6582        | 1.98  | 620  | 1.9095          |
| 1.5762        | 2.16  | 682  | 1.9036          |
| 1.5586        | 2.36  | 744  | 1.8457          |
| 1.6016        | 2.56  | 806  | 1.8112          |
| 1.5195        | 2.76  | 868  | 1.8034          |
| 1.5645        | 2.96  | 930  | 1.7773          |
| 1.457         | 3.14  | 992  | 1.7474          |
| 1.4883        | 3.34  | 1054 | 1.7467          |
| 1.4648        | 3.54  | 1116 | 1.7676          |
| 1.5195        | 3.74  | 1178 | 1.7383          |
| 1.4531        | 3.94  | 1240 | 1.7383          |
| 1.4648        | 4.12  | 1302 | 1.7181          |
| 1.4121        | 4.32  | 1364 | 1.7272          |
| 1.4727        | 4.52  | 1426 | 1.7259          |
| 1.4219        | 4.72  | 1488 | 1.7240          |
| 1.5137        | 4.92  | 1550 | 1.7227          |

### Framework versions

- Transformers 4.37.0.dev0
- PyTorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
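
As a reading aid for the hyperparameters listed above, here is a minimal, untested sketch of how they map onto `transformers.TrainingArguments`. The actual run was configured through Axolotl rather than a script like this, and the output path is a placeholder, not a value from the original run.

```python
# Sketch only: maps the hyperparameters above onto transformers.TrainingArguments.
# The real run used Axolotl's config format; this is an illustrative translation.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./outputs",          # placeholder, not from the original run
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # train_batch_size above
    per_device_eval_batch_size=4,    # eval_batch_size above
    seed=8,
    adam_beta1=0.9,                  # Adam betas/epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=5,
)
# Under multi-GPU training with num_devices: 4, the effective batch size is
# 4 per device x 4 devices = 16, matching total_train_batch_size and
# total_eval_batch_size above.
```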