switch-base-32-samsum

This model is a fine-tuned version of google/switch-base-32 on the samsum dataset. It achieves the following results on the evaluation set:

Loss: 1.3830
Rouge1: 48.5521
Rouge2: 25.5283
RougeL: 40.8665
RougeLsum: 44.9575
Gen Len: 16.9144
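The card does not say which ROUGE implementation produced these scores; they are reported on a 0 to 100 scale and were most likely computed with a standard package such as `rouge_score`, which applies its own tokenization and optional stemming. As a rough illustration of what Rouge1, Rouge2 and RougeL measure, here is a simplified sketch that assumes lowercase whitespace tokenization and no stemming, so its numbers will not match the table exactly. Summary-level RougeLsum (LCS computed over newline-separated sentences) is not covered.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Simplified tokenization (an assumption): lowercase, split on whitespace.
    return text.lower().split()


def _f1(overlap: int, pred_total: int, ref_total: int) -> float:
    # Harmonic mean of precision (vs. prediction) and recall (vs. reference).
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    """ROUGE-N F1 on a 0-100 scale, using clipped n-gram overlap counts."""
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(_tokens(prediction)), ngrams(_tokens(reference))
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count per n-gram
    return 100 * _f1(overlap, sum(pred.values()), sum(ref.values()))


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 on a 0-100 scale, based on the longest common subsequence."""
    p, r = _tokens(prediction), _tokens(reference)
    # Dynamic-programming table: dp[i][j] = LCS length of p[:i] and r[:j].
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pt in enumerate(p, 1):
        for j, rt in enumerate(r, 1):
            if pt == rt:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return 100 * _f1(dp[len(p)][len(r)], len(p), len(r))
```

For example, `rouge_n("the cat sat", "the cat ran", 2)` shares one of two bigrams on each side, giving a bigram F1 of 50.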

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
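The `linear` scheduler with `lr_scheduler_warmup_ratio: 0.1` ramps the learning rate from 0 up to 5e-05 over the first 10% of optimizer steps and then decays it linearly back to 0. The card does not state the total step count; the sketch below assumes about 921 steps per epoch, estimated from the results table (for example, step 4600 falls at epoch 4.9946), and rounds the warmup length up to a whole step. `linear_warmup_lr` is a hypothetical helper written for illustration, not a library function.

```python
import math

BASE_LR = 5e-05        # learning_rate from the card
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the card
NUM_EPOCHS = 5         # num_epochs from the card
STEPS_PER_EPOCH = 921  # assumption: estimated from the Epoch/Step columns of the table


def linear_warmup_lr(step: int, total_steps: int, warmup_steps: int,
                     base_lr: float = BASE_LR) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS            # 4605 under the assumption above
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # ceil(460.5) = 461
```

Under these assumptions the learning rate peaks at 5e-05 around step 461 and reaches 0 at the end of epoch 5.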

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum	Gen Len
13.4215	0.1086	100	11.0472	20.1278	5.2521	17.5417	19.1564	18.5807
2.8392	0.2172	200	2.1007	38.3594	16.281	32.2365	35.4802	16.5599
2.4215	0.3257	300	1.7960	42.1238	19.355	35.3645	39.2556	16.2677
2.122	0.4343	400	1.6754	43.7744	20.6979	36.6416	40.6431	17.3716
2.0046	0.5429	500	1.5964	44.1887	21.1957	36.8047	40.8905	16.7249
1.9988	0.6515	600	1.5513	45.6737	21.9662	38.0672	42.3237	17.0293
1.868	0.7600	700	1.5133	45.549	21.791	37.9979	42.1384	16.2249
1.7934	0.8686	800	1.4904	45.6877	22.6099	38.4701	42.2678	16.2335
1.8638	0.9772	900	1.4783	46.2036	23.2629	39.2818	43.0232	16.2555
1.6739	1.0858	1000	1.4597	46.4896	23.2284	39.6004	43.1073	16.2335
1.6511	1.1944	1100	1.4717	46.3555	23.2062	39.0139	43.0476	17.0636
1.7472	1.3029	1200	1.4456	46.8039	23.0325	39.3688	43.267	16.9169
1.6646	1.4115	1300	1.4474	46.9795	23.8693	40.0189	43.5672	16.4095
1.7575	1.5201	1400	1.4313	47.0233	23.2824	39.4242	43.4246	17.1039
1.6169	1.6287	1500	1.4282	47.2462	23.6695	39.6043	43.575	16.6883
1.6276	1.7372	1600	1.4179	47.5435	24.1485	40.2526	44.2173	16.3386
1.5724	1.8458	1700	1.4148	47.709	24.1513	40.3054	44.3152	16.8716
1.6417	1.9544	1800	1.4070	47.711	24.3763	40.4776	44.1524	17.099
1.4839	2.0630	1900	1.4223	47.6921	24.5385	40.5104	44.2406	16.4535
1.4515	2.1716	2000	1.4060	48.0411	24.8227	40.9466	44.5028	16.6675
1.4827	2.2801	2100	1.4066	47.7	24.3622	40.2299	44.1456	17.0183
1.4776	2.3887	2200	1.4066	47.9768	24.7871	40.7986	44.5597	16.8178
1.4776	2.4973	2300	1.4017	47.9306	24.6758	40.4826	44.4696	17.2176
1.5189	2.6059	2400	1.4000	47.422	24.3336	40.0832	43.9033	16.5281
1.5369	2.7144	2500	1.3910	47.9702	24.7618	40.5049	44.4661	16.9046
1.4754	2.8230	2600	1.3915	48.0885	25.0111	41.0073	44.5462	16.3215
1.4609	2.9316	2700	1.3796	48.2953	25.1084	40.8045	44.8141	16.6883
1.2852	3.0402	2800	1.3914	48.1816	24.9564	40.4874	44.4959	16.6809
1.3426	3.1488	2900	1.3925	47.9864	25.1931	40.6587	44.3335	16.7457
1.342	3.2573	3000	1.3907	47.9714	25.0598	40.7272	44.4796	16.6663
1.3408	3.3659	3100	1.3876	47.9041	24.8444	40.4734	44.1852	17.0917
1.3964	3.4745	3200	1.3831	48.244	25.3169	40.7608	44.6435	16.846
1.2923	3.5831	3300	1.3872	48.1798	25.031	40.7752	44.7031	17.1149
1.3557	3.6916	3400	1.3797	48.4681	25.1391	40.7846	44.9196	16.8924
1.3749	3.8002	3500	1.3799	48.2949	25.3223	40.6975	44.7215	17.1785
1.3232	3.9088	3600	1.3761	48.2852	25.0934	40.7396	44.6782	16.8643
1.2519	4.0174	3700	1.3756	47.8744	24.8648	40.4524	44.4635	16.8631
1.1997	4.1260	3800	1.3859	48.6158	25.5093	41.1598	45.2168	16.9132
1.2544	4.2345	3900	1.3837	48.492	25.1007	40.7921	44.8931	17.0538
1.2808	4.3431	4000	1.3825	48.5394	25.5808	40.9153	44.9679	16.912
1.2971	4.4517	4100	1.3844	48.5203	25.4213	41.0222	45.0464	16.923
1.2563	4.5603	4200	1.3842	48.5428	25.7257	41.2674	45.0936	16.7531
1.2324	4.6688	4300	1.3828	48.6838	25.797	41.216	45.1151	16.8362
1.3399	4.7774	4400	1.3831	48.5336	25.5641	40.8484	44.9315	16.9523
1.3147	4.8860	4500	1.3823	48.5021	25.4093	40.8773	44.8717	16.8851
1.2837	4.9946	4600	1.3830	48.5521	25.5283	40.8665	44.9575	16.9144
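The headline results at the top of the card match the last row (step 4600), while the lowest validation loss in the table (1.3756) occurs at step 3700. A small sketch for comparing rows by a chosen metric, assuming the rows are copied as tab-separated text in the column order above; `EvalRow` and `parse_results` are hypothetical names, and `TABLE` holds only an excerpt of the rows.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    rouge1: float
    rouge2: float
    rouge_l: float
    rouge_lsum: float
    gen_len: float


def parse_results(tsv: str) -> List[EvalRow]:
    """Parse tab-separated result rows in the card's column order."""
    rows = []
    for line in tsv.strip().splitlines():
        fields = line.split("\t")
        if fields[0] == "Training Loss":  # skip the header row if present
            continue
        # Step (third column) is an integer; every other column is a float.
        values = [int(f) if i == 2 else float(f) for i, f in enumerate(fields)]
        rows.append(EvalRow(*values))
    return rows


# Excerpt of rows copied from the table above.
TABLE = (
    "1.3232\t3.9088\t3600\t1.3761\t48.2852\t25.0934\t40.7396\t44.6782\t16.8643\n"
    "1.2519\t4.0174\t3700\t1.3756\t47.8744\t24.8648\t40.4524\t44.4635\t16.8631\n"
    "1.1997\t4.1260\t3800\t1.3859\t48.6158\t25.5093\t41.1598\t45.2168\t16.9132\n"
    "1.2837\t4.9946\t4600\t1.3830\t48.5521\t25.5283\t40.8665\t44.9575\t16.9144\n"
)

rows = parse_results(TABLE)
best_loss = min(rows, key=lambda r: r.val_loss)  # lowest validation loss in the excerpt
best_rouge1 = max(rows, key=lambda r: r.rouge1)  # highest Rouge1 in the excerpt
```

Within this excerpt, `best_loss.step` is 3700 and `best_rouge1.step` is 3800, which shows that the checkpoint chosen depends on the selection metric.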

Framework versions

Transformers 4.41.2
Pytorch 2.2.0
Datasets 2.14.5
Tokenizers 0.19.1
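To check whether a local environment matches the versions listed above, a minimal sketch using the standard library's `importlib.metadata` is shown below. It assumes the PyPI distribution names `transformers`, `torch` (for PyTorch), `datasets` and `tokenizers`; `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions"; "torch" is PyTorch's distribution name.
EXPECTED: Dict[str, str] = {
    "transformers": "4.41.2",
    "torch": "2.2.0",
    "datasets": "2.14.5",
    "tokenizers": "0.19.1",
}


def version_mismatches(
    expected: Dict[str, str] = EXPECTED,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for each package that differs."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = version(name)
        except PackageNotFoundError:
            have = None
        # Ignore local build tags such as "+cu121" on PyTorch wheels.
        if have is None or have.split("+")[0] != want:
            mismatches[name] = (want, have)
    return mismatches
```

An empty result means every listed package is installed at the listed version.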

Repository: taehyunzzz/switch-base-32-samsum

switch-base-32-samsum

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Finetuned from

Dataset used to train taehyunzzz/switch-base-32-samsum

Evaluation results

switch-base-32-samsum

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Finetuned from google/switch-base-32

Dataset used to train taehyunzzz/switch-base-32-samsum

Evaluation results

Finetuned from