
gpt2-wikitext2-LONG

This model is a fine-tuned version of gpt2 on an unknown dataset (the model name suggests WikiText-2). It achieves the following result on the evaluation set (a perplexity conversion is sketched below):

  • Loss: 8.5448
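The evaluation loss is the average cross-entropy per token in nats, so perplexity follows directly by exponentiation. A minimal check (the perplexity figure is derived here, not reported in the original card):

```python
import math

eval_loss = 8.5448                 # final evaluation loss reported above
perplexity = math.exp(eval_loss)   # cross-entropy (nats/token) -> perplexity
print(f"perplexity = {perplexity:.0f}")  # about 5140
```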

Model description

More information needed

Intended uses & limitations

More information needed
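No usage guidance is given. As a minimal sketch, assuming the checkpoint is published under the repository named in the model tree below (SBYYB/gpt2-wikitext2-LONG), it loads like any GPT-2 causal language model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SBYYB/gpt2-wikitext2-LONG"  # repository name taken from the model tree below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The history of the English language"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the final evaluation loss of 8.54 is far above the best intermediate value of about 5.11 (epoch 12 in the results table below), so text generated from this long-trained checkpoint may be of limited quality.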

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 250000
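These values map onto the standard Hugging Face Trainer configuration. Below is a sketch of the corresponding TrainingArguments; this is a hypothetical reconstruction, since the original training script is not part of this card, and the per-epoch evaluation cadence is inferred from the results table below.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-wikitext2-LONG",   # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=250_000,
    eval_strategy="epoch",              # one evaluation per epoch, as in the table below
    logging_strategy="epoch",
)
```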

Training results

Training Loss Epoch Step Validation Loss
6.2474 1.0 2240 6.1890
5.8841 2.0 4480 5.8967
5.5679 3.0 6720 5.6904
5.3801 4.0 8960 5.5438
5.1759 5.0 11200 5.4250
5.0068 6.0 13440 5.3377
4.818 7.0 15680 5.2648
4.6836 8.0 17920 5.2054
4.5107 9.0 20160 5.1622
4.3756 10.0 22400 5.1392
4.2265 11.0 24640 5.1194
4.0932 12.0 26880 5.1065
3.9449 13.0 29120 5.1109
3.8233 14.0 31360 5.1250
3.6796 15.0 33600 5.1433
3.5556 16.0 35840 5.1692
3.4383 17.0 38080 5.2086
3.299 18.0 40320 5.2423
3.1903 19.0 42560 5.2913
3.0618 20.0 44800 5.3327
2.9429 21.0 47040 5.3867
2.8275 22.0 49280 5.4452
2.7206 23.0 51520 5.5040
2.6081 24.0 53760 5.5760
2.5133 25.0 56000 5.6352
2.3831 26.0 58240 5.6871
2.2795 27.0 60480 5.7515
2.2009 28.0 62720 5.8257
2.0864 29.0 64960 5.8798
2.0069 30.0 67200 5.9585
1.9058 31.0 69440 6.0158
1.8336 32.0 71680 6.0893
1.7406 33.0 73920 6.1480
1.6725 34.0 76160 6.2075
1.5814 35.0 78400 6.2683
1.5209 36.0 80640 6.3362
1.4352 37.0 82880 6.4068
1.3732 38.0 85120 6.4493
1.3004 39.0 87360 6.5188
1.2466 40.0 89600 6.5716
1.1749 41.0 91840 6.6248
1.1317 42.0 94080 6.6937
1.0588 43.0 96320 6.7596
1.0154 44.0 98560 6.8063
0.9544 45.0 100800 6.8594
0.918 46.0 103040 6.9139
0.8603 47.0 105280 6.9788
0.8228 48.0 107520 7.0178
0.7757 49.0 109760 7.0820
0.7445 50.0 112000 7.1300
0.6947 51.0 114240 7.1802
0.6559 52.0 116480 7.2233
0.6281 53.0 118720 7.2744
0.5912 54.0 120960 7.3109
0.5713 55.0 123200 7.3557
0.537 56.0 125440 7.3980
0.5166 57.0 127680 7.4294
0.4882 58.0 129920 7.4812
0.4662 59.0 132160 7.5245
0.4427 60.0 134400 7.5481
0.4272 61.0 136640 7.5961
0.4046 62.0 138880 7.6457
0.395 63.0 141120 7.6701
0.3717 64.0 143360 7.7151
0.359 65.0 145600 7.7493
0.3435 66.0 147840 7.7703
0.333 67.0 150080 7.8155
0.3163 68.0 152320 7.8550
0.3074 69.0 154560 7.8780
0.2945 70.0 156800 7.9197
0.2866 71.0 159040 7.9441
0.2733 72.0 161280 7.9762
0.2655 73.0 163520 7.9940
0.2559 74.0 165760 8.0210
0.2489 75.0 168000 8.0440
0.2399 76.0 170240 8.0695
0.229 77.0 172480 8.0998
0.2254 78.0 174720 8.1213
0.2159 79.0 176960 8.1404
0.2118 80.0 179200 8.1594
0.2042 81.0 181440 8.1839
0.199 82.0 183680 8.2196
0.1935 83.0 185920 8.2277
0.1882 84.0 188160 8.2494
0.1826 85.0 190400 8.2727
0.1793 86.0 192640 8.2852
0.1732 87.0 194880 8.3022
0.1703 88.0 197120 8.3139
0.1647 89.0 199360 8.3354
0.1625 90.0 201600 8.3469
0.1579 91.0 203840 8.3671
0.154 92.0 206080 8.3825
0.1506 93.0 208320 8.3879
0.147 94.0 210560 8.4059
0.143 95.0 212800 8.4183
0.1403 96.0 215040 8.4287
0.1371 97.0 217280 8.4522
0.1351 98.0 219520 8.4547
0.1306 99.0 221760 8.4614
0.1294 100.0 224000 8.4809
0.126 101.0 226240 8.4951
0.1235 102.0 228480 8.4978
0.1213 103.0 230720 8.5041
0.1195 104.0 232960 8.5161
0.1174 105.0 235200 8.5176
0.1147 106.0 237440 8.5268
0.1134 107.0 239680 8.5325
0.1123 108.0 241920 8.5376
0.1101 109.0 244160 8.5404
0.1082 110.0 246400 8.5439
0.1083 111.0 248640 8.5434
0.1069 111.6071 250000 8.5448
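
The validation loss reaches its minimum of 5.1065 at epoch 12 and climbs steadily afterwards while the training loss keeps shrinking, i.e. the model overfits for most of the 250k-step run. A small sketch for visualizing that trend from a handful of rows copied out of the table above (matplotlib assumed available):

```python
import matplotlib.pyplot as plt

# A few representative epochs copied from the results table above.
epochs     = [1, 6, 12, 20, 40, 60, 80, 100, 111]
train_loss = [6.2474, 5.0068, 4.0932, 3.0618, 1.2466, 0.4427, 0.2118, 0.1294, 0.1083]
val_loss   = [6.1890, 5.3377, 5.1065, 5.3327, 6.5716, 7.5481, 8.1594, 8.4809, 8.5434]

plt.plot(epochs, train_loss, marker="o", label="training loss")
plt.plot(epochs, val_loss, marker="o", label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("gpt2-wikitext2-LONG: training vs. validation loss")
plt.legend()
plt.show()
```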

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1

Model size

124M parameters (Safetensors, F32)

Model tree for SBYYB/gpt2-wikitext2-LONG

Fine-tuned from gpt2