
gpt_train_12_512

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set:

  • Loss: 8.9141
  • Accuracy: 0.0917
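
The checkpoint can be loaded like any other causal language model with the Transformers library. The snippet below is a minimal sketch, assuming the model is published on the Hub as gokulsrinivasagan/gpt_train_12_512; the prompt text is only an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id; adjust if loading from a local checkpoint directory instead.
model_id = "gokulsrinivasagan/gpt_train_12_512"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy generation from an arbitrary example prompt.
inputs = tokenizer("The history of the printing press", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```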

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
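
These settings map fairly directly onto transformers.TrainingArguments. Below is a minimal sketch of that mapping, assuming the Trainer API was used; the output directory name is a placeholder, not taken from the original run.

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
# fp16=True (Native AMP) requires a CUDA-capable GPU at construction time.
training_args = TrainingArguments(
    output_dir="gpt_train_12_512",   # assumed name, not confirmed by the card
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=10,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    fp16=True,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```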

Training results

Training Loss Epoch Step Validation Loss Accuracy
10.8828 0.0000 1 10.8828 0.0001
10.8984 0.0001 2 10.8828 0.0001
10.8906 0.0001 3 10.8828 0.0001
10.8828 0.0001 4 10.8828 0.0001
10.8828 0.0002 5 10.8828 0.0001
10.8828 0.0002 6 10.8828 0.0001
10.8906 0.0003 7 10.8828 0.0001
10.8828 0.0003 8 10.8828 0.0001
10.875 0.0003 9 10.8828 0.0001
10.8984 0.0004 10 10.8828 0.0001
10.8828 0.0004 11 10.8828 0.0001
10.8906 0.0004 12 10.8828 0.0001
10.8828 0.0005 13 10.8828 0.0001
10.8828 0.0005 14 10.8828 0.0001
10.8828 0.0005 15 10.8828 0.0001
10.8828 0.0006 16 10.8828 0.0001
10.875 0.0006 17 10.8828 0.0001
10.8828 0.0007 18 10.6328 0.0197
10.6641 0.0007 19 10.4844 0.0444
10.5078 0.0007 20 10.3828 0.0499
10.3984 0.0008 21 10.3125 0.0532
10.3438 0.0008 22 10.25 0.0550
10.2656 0.0008 23 10.2031 0.0562
10.25 0.0009 24 10.1641 0.0540
10.1875 0.0009 25 10.1328 0.0470
10.125 0.0009 26 10.1094 0.0461
10.125 0.0010 27 10.0859 0.0480
10.0938 0.0010 28 10.0703 0.0474
10.0625 0.0011 29 10.0547 0.0465
10.0703 0.0011 30 10.0391 0.0472
10.0156 0.0011 31 10.0234 0.0515
10.0859 0.0012 32 10.0156 0.0587
9.9922 0.0012 33 10.0078 0.0613
10.0234 0.0012 34 9.9922 0.0608
9.9609 0.0013 35 9.9844 0.0600
10.0391 0.0013 36 9.9766 0.0608
9.9922 0.0013 37 9.9609 0.0619
9.9688 0.0014 38 9.9531 0.0623
9.9453 0.0014 39 9.9375 0.0622
9.9609 0.0015 40 9.9297 0.0628
9.9609 0.0015 41 9.9141 0.0640
10.0234 0.0015 42 9.8984 0.0649
9.9375 0.0016 43 9.8906 0.0648
9.8516 0.0016 44 9.875 0.0644
9.8672 0.0016 45 9.8594 0.0643
9.8984 0.0017 46 9.8438 0.0643
9.875 0.0017 47 9.8359 0.0645
9.8672 0.0017 48 9.8203 0.0646
9.8984 0.0018 49 9.8125 0.0649
9.7891 0.0018 50 9.8047 0.0653
9.8281 0.0019 51 9.7891 0.0655
9.8281 0.0019 52 9.7812 0.0654
9.7969 0.0019 53 9.7734 0.0660
9.7812 0.0020 54 9.7656 0.0670
9.8047 0.0020 55 9.75 0.0682
9.7969 0.0020 56 9.7422 0.0688
9.7891 0.0021 57 9.7344 0.0691
9.6875 0.0021 58 9.7266 0.0690
9.7188 0.0021 59 9.7188 0.0686
9.7344 0.0022 60 9.7109 0.0682
9.7344 0.0022 61 9.6953 0.0687
9.7578 0.0023 62 9.6875 0.0697
9.6484 0.0023 63 9.6719 0.0708
9.6328 0.0023 64 9.6641 0.0715
9.7656 0.0024 65 9.6562 0.0721
9.6875 0.0024 66 9.6484 0.0725
9.6328 0.0024 67 9.6406 0.0727
9.6953 0.0025 68 9.6328 0.0734
9.7188 0.0025 69 9.625 0.0744
9.6875 0.0025 70 9.6172 0.0753
9.625 0.0026 71 9.6094 0.0763
9.6172 0.0026 72 9.6016 0.0769
9.6016 0.0027 73 9.5938 0.0771
9.6094 0.0027 74 9.5859 0.0771
9.5859 0.0027 75 9.5781 0.0771
9.5859 0.0028 76 9.5703 0.0767
9.5859 0.0028 77 9.5625 0.0765
9.5781 0.0028 78 9.5547 0.0764
9.6172 0.0029 79 9.5469 0.0763
9.5859 0.0029 80 9.5391 0.0768
9.5859 0.0029 81 9.5312 0.0770
9.5391 0.0030 82 9.5234 0.0770
9.5391 0.0030 83 9.5234 0.0764
9.5312 0.0031 84 9.5156 0.0758
9.5547 0.0031 85 9.5078 0.0757
9.5781 0.0031 86 9.5 0.0760
9.5703 0.0032 87 9.4922 0.0764
9.4844 0.0032 88 9.4844 0.0764
9.5312 0.0032 89 9.4766 0.0765
9.5312 0.0033 90 9.4688 0.0765
9.5078 0.0033 91 9.4688 0.0766
9.5 0.0033 92 9.4609 0.0768
9.4844 0.0034 93 9.4531 0.0769
9.4688 0.0034 94 9.4453 0.0773
9.5156 0.0035 95 9.4375 0.0777
9.4453 0.0035 96 9.4297 0.0783
9.4766 0.0035 97 9.4219 0.0794
9.4219 0.0036 98 9.4219 0.0804
9.4531 0.0036 99 9.4141 0.0814
9.4141 0.0036 100 9.4062 0.0819
9.375 0.0037 101 9.3984 0.0825
9.4219 0.0037 102 9.3906 0.0828
9.3828 0.0037 103 9.3828 0.0828
9.375 0.0038 104 9.3828 0.0827
9.3516 0.0038 105 9.375 0.0825
9.3906 0.0039 106 9.3672 0.0825
9.3672 0.0039 107 9.3594 0.0823
9.3359 0.0039 108 9.3516 0.0822
9.4062 0.0040 109 9.3438 0.0818
9.3906 0.0040 110 9.3438 0.0816
9.25 0.0040 111 9.3359 0.0816
9.3281 0.0041 112 9.3281 0.0816
9.375 0.0041 113 9.3203 0.0813
9.3906 0.0041 114 9.3203 0.0812
9.3203 0.0042 115 9.3125 0.0812
9.3125 0.0042 116 9.3047 0.0811
9.3359 0.0043 117 9.2969 0.0809
9.2812 0.0043 118 9.2969 0.0808
9.2031 0.0043 119 9.2891 0.0807
9.2422 0.0044 120 9.2812 0.0808
9.3047 0.0044 121 9.2812 0.0809
9.2969 0.0044 122 9.2734 0.0810
9.25 0.0045 123 9.2656 0.0815
9.3281 0.0045 124 9.2578 0.0825
9.2656 0.0045 125 9.2578 0.0836
9.3047 0.0046 126 9.25 0.0845
9.25 0.0046 127 9.2422 0.0850
9.2969 0.0046 128 9.2344 0.0852
9.3203 0.0047 129 9.2344 0.0853
9.25 0.0047 130 9.2266 0.0853
9.2422 0.0048 131 9.2188 0.0854
9.1641 0.0048 132 9.2109 0.0855
9.2109 0.0048 133 9.2109 0.0858
9.2422 0.0049 134 9.2031 0.0860
9.2188 0.0049 135 9.1953 0.0861
9.3047 0.0049 136 9.1875 0.0861
9.1641 0.0050 137 9.1875 0.0861
9.2188 0.0050 138 9.1797 0.0859
9.2422 0.0050 139 9.1719 0.0856
9.2422 0.0051 140 9.1719 0.0855
9.1484 0.0051 141 9.1641 0.0852
9.2422 0.0052 142 9.1562 0.0851
9.1953 0.0052 143 9.1484 0.0852
9.1641 0.0052 144 9.1484 0.0853
9.1875 0.0053 145 9.1406 0.0854
9.1172 0.0053 146 9.1328 0.0855
9.1094 0.0053 147 9.1328 0.0856
9.1328 0.0054 148 9.125 0.0859
9.1641 0.0054 149 9.1172 0.0863
9.1641 0.0054 150 9.1094 0.0868
9.1875 0.0055 151 9.1094 0.0873
9.2031 0.0055 152 9.1016 0.0875
9.0703 0.0056 153 9.0938 0.0880
9.1484 0.0056 154 9.0859 0.0884
9.0625 0.0056 155 9.0859 0.0888
9.0781 0.0057 156 9.0781 0.0889
9.0234 0.0057 157 9.0703 0.0892
9.0781 0.0057 158 9.0703 0.0894
9.0 0.0058 159 9.0625 0.0895
9.0312 0.0058 160 9.0547 0.0896
9.0391 0.0058 161 9.0547 0.0898
9.0469 0.0059 162 9.0469 0.0901
9.0859 0.0059 163 9.0391 0.0905
9.0078 0.0060 164 9.0312 0.0908
9.0156 0.0060 165 9.0312 0.0909
9.0469 0.0060 166 9.0234 0.0909
8.9219 0.0061 167 9.0234 0.0908
9.0312 0.0061 168 9.0156 0.0907
9.0938 0.0061 169 9.0078 0.0906
9.0156 0.0062 170 9.0 0.0902
9.0312 0.0062 171 9.0 0.0897
9.0625 0.0062 172 8.9922 0.0893
8.9844 0.0063 173 8.9844 0.0891
9.0703 0.0063 174 8.9844 0.0894
8.9609 0.0064 175 8.9766 0.0898
8.9922 0.0064 176 8.9766 0.0905
9.0234 0.0064 177 8.9688 0.0910
9.0234 0.0065 178 8.9609 0.0915
8.9219 0.0065 179 8.9531 0.0919
9.0234 0.0065 180 8.9531 0.0920
8.9375 0.0066 181 8.9453 0.0921
8.9688 0.0066 182 8.9375 0.0919
8.9375 0.0066 183 8.9375 0.0913
9.0 0.0067 184 8.9297 0.0912
8.9375 0.0067 185 8.9219 0.0913
8.9609 0.0068 186 8.9219 0.0913
8.9688 0.0068 187 8.9141 0.0917

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1
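
When reproducing the environment, it may help to pin these versions. A small sketch for checking the installed versions at runtime against the list above:

```python
# Sketch: compare locally installed versions with those listed in this card.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # card lists 4.41.2
print("torch", torch.__version__)                # card lists 2.1.0a0+32f93b1
print("datasets", datasets.__version__)          # card lists 2.20.0
print("tokenizers", tokenizers.__version__)      # card lists 0.19.1
```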
