
gpt_train_12_512

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set:

  • Loss: 8.9141
  • Accuracy: 0.0917
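
The checkpoint can be loaded like any other causal language model with the Transformers library. The snippet below is a minimal sketch, assuming the model is published on the Hub as gokulsrinivasagan/gpt_train_12_512; the prompt text is only an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id; adjust if loading from a local checkpoint directory instead.
model_id = "gokulsrinivasagan/gpt_train_12_512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation from an arbitrary example prompt.
inputs = tokenizer("The history of the printing press", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```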

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
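
These settings map fairly directly onto transformers.TrainingArguments. Below is a minimal sketch of that mapping, assuming the Trainer API was used; the output directory name is a placeholder, not taken from the original run.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# fp16=True (Native AMP) requires a CUDA-capable GPU at construction time.
training_args = TrainingArguments(
    output_dir="gpt_train_12_512",   # assumed name, not confirmed by the card
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=10,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```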

Training results

Training Loss Epoch Step Validation Loss Accuracy
10.8828 0.0000 1 10.8828 0.0001
10.8984 0.0001 2 10.8828 0.0001
10.8906 0.0001 3 10.8828 0.0001
10.8828 0.0001 4 10.8828 0.0001
10.8828 0.0002 5 10.8828 0.0001
10.8828 0.0002 6 10.8828 0.0001
10.8906 0.0003 7 10.8828 0.0001
10.8828 0.0003 8 10.8828 0.0001
10.875 0.0003 9 10.8828 0.0001
10.8984 0.0004 10 10.8828 0.0001
10.8828 0.0004 11 10.8828 0.0001
10.8906 0.0004 12 10.8828 0.0001
10.8828 0.0005 13 10.8828 0.0001
10.8828 0.0005 14 10.8828 0.0001
10.8828 0.0005 15 10.8828 0.0001
10.8828 0.0006 16 10.8828 0.0001
10.875 0.0006 17 10.8828 0.0001
10.8828 0.0007 18 10.6328 0.0197
10.6641 0.0007 19 10.4844 0.0444
10.5078 0.0007 20 10.3828 0.0499
10.3984 0.0008 21 10.3125 0.0532
10.3438 0.0008 22 10.25 0.0550
10.2656 0.0008 23 10.2031 0.0562
10.25 0.0009 24 10.1641 0.0540
10.1875 0.0009 25 10.1328 0.0470
10.125 0.0009 26 10.1094 0.0461
10.125 0.0010 27 10.0859 0.0480
10.0938 0.0010 28 10.0703 0.0474
10.0625 0.0011 29 10.0547 0.0465
10.0703 0.0011 30 10.0391 0.0472
10.0156 0.0011 31 10.0234 0.0515
10.0859 0.0012 32 10.0156 0.0587
9.9922 0.0012 33 10.0078 0.0613
10.0234 0.0012 34 9.9922 0.0608
9.9609 0.0013 35 9.9844 0.0600
10.0391 0.0013 36 9.9766 0.0608
9.9922 0.0013 37 9.9609 0.0619
9.9688 0.0014 38 9.9531 0.0623
9.9453 0.0014 39 9.9375 0.0622
9.9609 0.0015 40 9.9297 0.0628
9.9609 0.0015 41 9.9141 0.0640
10.0234 0.0015 42 9.8984 0.0649
9.9375 0.0016 43 9.8906 0.0648
9.8516 0.0016 44 9.875 0.0644
9.8672 0.0016 45 9.8594 0.0643
9.8984 0.0017 46 9.8438 0.0643
9.875 0.0017 47 9.8359 0.0645
9.8672 0.0017 48 9.8203 0.0646
9.8984 0.0018 49 9.8125 0.0649
9.7891 0.0018 50 9.8047 0.0653
9.8281 0.0019 51 9.7891 0.0655
9.8281 0.0019 52 9.7812 0.0654
9.7969 0.0019 53 9.7734 0.0660
9.7812 0.0020 54 9.7656 0.0670
9.8047 0.0020 55 9.75 0.0682
9.7969 0.0020 56 9.7422 0.0688
9.7891 0.0021 57 9.7344 0.0691
9.6875 0.0021 58 9.7266 0.0690
9.7188 0.0021 59 9.7188 0.0686
9.7344 0.0022 60 9.7109 0.0682
9.7344 0.0022 61 9.6953 0.0687
9.7578 0.0023 62 9.6875 0.0697
9.6484 0.0023 63 9.6719 0.0708
9.6328 0.0023 64 9.6641 0.0715
9.7656 0.0024 65 9.6562 0.0721
9.6875 0.0024 66 9.6484 0.0725
9.6328 0.0024 67 9.6406 0.0727
9.6953 0.0025 68 9.6328 0.0734
9.7188 0.0025 69 9.625 0.0744
9.6875 0.0025 70 9.6172 0.0753
9.625 0.0026 71 9.6094 0.0763
9.6172 0.0026 72 9.6016 0.0769
9.6016 0.0027 73 9.5938 0.0771
9.6094 0.0027 74 9.5859 0.0771
9.5859 0.0027 75 9.5781 0.0771
9.5859 0.0028 76 9.5703 0.0767
9.5859 0.0028 77 9.5625 0.0765
9.5781 0.0028 78 9.5547 0.0764
9.6172 0.0029 79 9.5469 0.0763
9.5859 0.0029 80 9.5391 0.0768
9.5859 0.0029 81 9.5312 0.0770
9.5391 0.0030 82 9.5234 0.0770
9.5391 0.0030 83 9.5234 0.0764
9.5312 0.0031 84 9.5156 0.0758
9.5547 0.0031 85 9.5078 0.0757
9.5781 0.0031 86 9.5 0.0760
9.5703 0.0032 87 9.4922 0.0764
9.4844 0.0032 88 9.4844 0.0764
9.5312 0.0032 89 9.4766 0.0765
9.5312 0.0033 90 9.4688 0.0765
9.5078 0.0033 91 9.4688 0.0766
9.5 0.0033 92 9.4609 0.0768
9.4844 0.0034 93 9.4531 0.0769
9.4688 0.0034 94 9.4453 0.0773
9.5156 0.0035 95 9.4375 0.0777
9.4453 0.0035 96 9.4297 0.0783
9.4766 0.0035 97 9.4219 0.0794
9.4219 0.0036 98 9.4219 0.0804
9.4531 0.0036 99 9.4141 0.0814
9.4141 0.0036 100 9.4062 0.0819
9.375 0.0037 101 9.3984 0.0825
9.4219 0.0037 102 9.3906 0.0828
9.3828 0.0037 103 9.3828 0.0828
9.375 0.0038 104 9.3828 0.0827
9.3516 0.0038 105 9.375 0.0825
9.3906 0.0039 106 9.3672 0.0825
9.3672 0.0039 107 9.3594 0.0823
9.3359 0.0039 108 9.3516 0.0822
9.4062 0.0040 109 9.3438 0.0818
9.3906 0.0040 110 9.3438 0.0816
9.25 0.0040 111 9.3359 0.0816
9.3281 0.0041 112 9.3281 0.0816
9.375 0.0041 113 9.3203 0.0813
9.3906 0.0041 114 9.3203 0.0812
9.3203 0.0042 115 9.3125 0.0812
9.3125 0.0042 116 9.3047 0.0811
9.3359 0.0043 117 9.2969 0.0809
9.2812 0.0043 118 9.2969 0.0808
9.2031 0.0043 119 9.2891 0.0807
9.2422 0.0044 120 9.2812 0.0808
9.3047 0.0044 121 9.2812 0.0809
9.2969 0.0044 122 9.2734 0.0810
9.25 0.0045 123 9.2656 0.0815
9.3281 0.0045 124 9.2578 0.0825
9.2656 0.0045 125 9.2578 0.0836
9.3047 0.0046 126 9.25 0.0845
9.25 0.0046 127 9.2422 0.0850
9.2969 0.0046 128 9.2344 0.0852
9.3203 0.0047 129 9.2344 0.0853
9.25 0.0047 130 9.2266 0.0853
9.2422 0.0048 131 9.2188 0.0854
9.1641 0.0048 132 9.2109 0.0855
9.2109 0.0048 133 9.2109 0.0858
9.2422 0.0049 134 9.2031 0.0860
9.2188 0.0049 135 9.1953 0.0861
9.3047 0.0049 136 9.1875 0.0861
9.1641 0.0050 137 9.1875 0.0861
9.2188 0.0050 138 9.1797 0.0859
9.2422 0.0050 139 9.1719 0.0856
9.2422 0.0051 140 9.1719 0.0855
9.1484 0.0051 141 9.1641 0.0852
9.2422 0.0052 142 9.1562 0.0851
9.1953 0.0052 143 9.1484 0.0852
9.1641 0.0052 144 9.1484 0.0853
9.1875 0.0053 145 9.1406 0.0854
9.1172 0.0053 146 9.1328 0.0855
9.1094 0.0053 147 9.1328 0.0856
9.1328 0.0054 148 9.125 0.0859
9.1641 0.0054 149 9.1172 0.0863
9.1641 0.0054 150 9.1094 0.0868
9.1875 0.0055 151 9.1094 0.0873
9.2031 0.0055 152 9.1016 0.0875
9.0703 0.0056 153 9.0938 0.0880
9.1484 0.0056 154 9.0859 0.0884
9.0625 0.0056 155 9.0859 0.0888
9.0781 0.0057 156 9.0781 0.0889
9.0234 0.0057 157 9.0703 0.0892
9.0781 0.0057 158 9.0703 0.0894
9.0 0.0058 159 9.0625 0.0895
9.0312 0.0058 160 9.0547 0.0896
9.0391 0.0058 161 9.0547 0.0898
9.0469 0.0059 162 9.0469 0.0901
9.0859 0.0059 163 9.0391 0.0905
9.0078 0.0060 164 9.0312 0.0908
9.0156 0.0060 165 9.0312 0.0909
9.0469 0.0060 166 9.0234 0.0909
8.9219 0.0061 167 9.0234 0.0908
9.0312 0.0061 168 9.0156 0.0907
9.0938 0.0061 169 9.0078 0.0906
9.0156 0.0062 170 9.0 0.0902
9.0312 0.0062 171 9.0 0.0897
9.0625 0.0062 172 8.9922 0.0893
8.9844 0.0063 173 8.9844 0.0891
9.0703 0.0063 174 8.9844 0.0894
8.9609 0.0064 175 8.9766 0.0898
8.9922 0.0064 176 8.9766 0.0905
9.0234 0.0064 177 8.9688 0.0910
9.0234 0.0065 178 8.9609 0.0915
8.9219 0.0065 179 8.9531 0.0919
9.0234 0.0065 180 8.9531 0.0920
8.9375 0.0066 181 8.9453 0.0921
8.9688 0.0066 182 8.9375 0.0919
8.9375 0.0066 183 8.9375 0.0913
9.0 0.0067 184 8.9297 0.0912
8.9375 0.0067 185 8.9219 0.0913
8.9609 0.0068 186 8.9219 0.0913
8.9688 0.0068 187 8.9141 0.0917

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1
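
When reproducing the environment, it may help to pin these versions. A small sketch for checking the installed versions at runtime against the list above:

```python
# Sketch: compare locally installed versions with those listed in this card.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # card lists 4.41.2
print("torch", torch.__version__)                # card lists 2.1.0a0+32f93b1
print("datasets", datasets.__version__)          # card lists 2.20.0
print("tokenizers", tokenizers.__version__)      # card lists 0.19.1
```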
