# dense-wmt16-tr-en-dense-ba128-lr1e-04
This model is a fine-tuned version of google-t5/t5-base on the wmt16 tr-en dataset. It achieves the following results on the evaluation set:
- Loss: 2.1616
- Bleu: 18.257
- Gen Len: 25.2238
- Num Experts Activated: 0
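The checkpoint can be exercised with a short `transformers` sketch. Note that the repo id, the T5 task prefix, and the generation settings below are assumptions for illustration; the card does not specify how inference was run:

```python
# Hypothetical inference sketch for this checkpoint. MODEL_ID and the task
# prefix are assumptions, not confirmed by the model card.
MODEL_ID = "taehyunzzz/dense-wmt16-tr-en-dense-ba128-lr1e-04"


def build_input(text: str) -> str:
    # T5 checkpoints are typically driven by a task prefix; the exact prefix
    # used during this fine-tune is an assumption.
    return "translate Turkish to English: " + text


def translate(text: str) -> str:
    # Imports are deferred so the helper above stays dependency-free.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_input(text), return_tensors="pt")
    # Gen Len averaged ~25 tokens on the eval set, so 64 is a comfortable cap.
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```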
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 200
- num_epochs: 40.0
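The total train batch size follows from the per-device batch and gradient accumulation (32 × 4 = 128), and `constant_with_warmup` ramps the learning rate linearly over the first 200 steps before holding it at 1e-4. A minimal sketch of both:

```python
def effective_batch_size(per_device: int = 32, accumulation: int = 4) -> int:
    # Gradients from 4 micro-batches of 32 are accumulated before each
    # optimizer step, giving the reported total_train_batch_size of 128.
    return per_device * accumulation


def lr_at(step: int, base_lr: float = 1e-4, warmup_steps: int = 200) -> float:
    # constant_with_warmup: linear ramp from 0 to base_lr over warmup_steps,
    # then constant at base_lr for the remainder of training.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr
```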
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len | Experts Activated |
|:---|:---|:---|:---|:---|:---|:---|
2.9818 | 0.6221 | 1000 | 3.5812 | 2.2511 | 30.8611 | 0 |
2.5394 | 1.2442 | 2000 | 3.2778 | 4.3464 | 27.5924 | 0 |
2.2711 | 1.8663 | 3000 | 3.0610 | 5.4823 | 26.5624 | 0 |
2.0787 | 2.4883 | 4000 | 2.8963 | 6.9788 | 26.7413 | 0 |
1.9373 | 3.1104 | 5000 | 2.7878 | 8.1811 | 26.5524 | 0 |
1.8533 | 3.7325 | 6000 | 2.7018 | 9.052 | 26.4436 | 0 |
1.7677 | 4.3546 | 7000 | 2.6285 | 9.9558 | 25.7802 | 0 |
1.6856 | 4.9767 | 8000 | 2.5612 | 10.4238 | 25.5594 | 0 |
1.6259 | 5.5988 | 9000 | 2.5210 | 11.3462 | 25.5125 | 0 |
1.5572 | 6.2208 | 10000 | 2.4716 | 11.8849 | 25.5824 | 0 |
1.5318 | 6.8429 | 11000 | 2.4306 | 12.2949 | 25.6234 | 0 |
1.4808 | 7.4650 | 12000 | 2.3984 | 12.7519 | 25.6454 | 0 |
1.429 | 8.0871 | 13000 | 2.3835 | 12.6427 | 25.4875 | 0 |
1.4258 | 8.7092 | 14000 | 2.3461 | 13.4165 | 25.9011 | 0 |
1.3613 | 9.3313 | 15000 | 2.3228 | 13.8046 | 25.5145 | 0 |
1.3738 | 9.9533 | 16000 | 2.3098 | 13.964 | 25.0729 | 0 |
1.3347 | 10.5754 | 17000 | 2.2995 | 14.1001 | 24.958 | 0 |
1.2906 | 11.1975 | 18000 | 2.2754 | 14.4495 | 25.4895 | 0 |
1.2901 | 11.8196 | 19000 | 2.2635 | 14.5571 | 24.8462 | 0 |
1.2377 | 12.4417 | 20000 | 2.2482 | 14.7131 | 24.8891 | 0 |
1.2295 | 13.0638 | 21000 | 2.2462 | 14.8771 | 25.033 | 0 |
1.2367 | 13.6858 | 22000 | 2.2315 | 14.8081 | 24.8971 | 0 |
1.1905 | 14.3079 | 23000 | 2.2252 | 15.1784 | 24.7832 | 0 |
1.1948 | 14.9300 | 24000 | 2.2018 | 15.7014 | 25.1568 | 0 |
1.1626 | 15.5521 | 25000 | 2.2085 | 16.0292 | 25.1009 | 0 |
1.1534 | 16.1742 | 26000 | 2.1955 | 16.0306 | 25.1489 | 0 |
1.153 | 16.7963 | 27000 | 2.1895 | 16.1658 | 24.975 | 0 |
1.1209 | 17.4184 | 28000 | 2.1841 | 15.7986 | 24.9061 | 0 |
1.1125 | 18.0404 | 29000 | 2.1732 | 16.3828 | 25.1069 | 0 |
1.1055 | 18.6625 | 30000 | 2.1698 | 16.0596 | 25.012 | 0 |
1.0732 | 19.2846 | 31000 | 2.1633 | 16.3497 | 24.7393 | 0 |
1.0726 | 19.9067 | 32000 | 2.1605 | 16.5415 | 24.8891 | 0 |
1.0663 | 20.5288 | 33000 | 2.1503 | 16.5621 | 25.2887 | 0 |
1.0536 | 21.1509 | 34000 | 2.1524 | 16.9547 | 25.0929 | 0 |
1.0451 | 21.7729 | 35000 | 2.1442 | 16.7843 | 25.0989 | 0 |
1.0336 | 22.3950 | 36000 | 2.1506 | 16.8701 | 25.1309 | 0 |
1.02 | 23.0171 | 37000 | 2.1593 | 16.7293 | 24.7363 | 0 |
1.0134 | 23.6392 | 38000 | 2.1494 | 17.0919 | 24.9271 | 0 |
0.9921 | 24.2613 | 39000 | 2.1516 | 17.0088 | 24.9081 | 0 |
1.0015 | 24.8834 | 40000 | 2.1458 | 17.2168 | 24.5784 | 0 |
0.9872 | 25.5054 | 41000 | 2.1457 | 17.5821 | 24.6603 | 0 |
0.9606 | 26.1275 | 42000 | 2.1465 | 17.465 | 25.2268 | 0 |
0.9818 | 26.7496 | 43000 | 2.1457 | 17.6366 | 24.7413 | 0 |
0.968 | 27.3717 | 44000 | 2.1462 | 17.5607 | 24.6304 | 0 |
0.9718 | 27.9938 | 45000 | 2.1300 | 17.5879 | 24.7463 | 0 |
0.9387 | 28.6159 | 46000 | 2.1485 | 17.3176 | 24.6324 | 0 |
0.9378 | 29.2379 | 47000 | 2.1448 | 17.4231 | 24.6853 | 0 |
0.9329 | 29.8600 | 48000 | 2.1361 | 17.8358 | 24.9301 | 0 |
0.9258 | 30.4821 | 49000 | 2.1355 | 17.9496 | 24.8012 | 0 |
0.9052 | 31.1042 | 50000 | 2.1469 | 17.8838 | 24.9101 | 0 |
0.9188 | 31.7263 | 51000 | 2.1416 | 17.9982 | 25.0559 | 0 |
0.897 | 32.3484 | 52000 | 2.1540 | 18.1229 | 25.2138 | 0 |
0.9143 | 32.9705 | 53000 | 2.1475 | 18.0501 | 25.2408 | 0 |
0.8898 | 33.5925 | 54000 | 2.1499 | 18.0826 | 24.8811 | 0 |
0.871 | 34.2146 | 55000 | 2.1546 | 18.164 | 24.987 | 0 |
0.8852 | 34.8367 | 56000 | 2.1420 | 18.1221 | 24.7772 | 0 |
0.8675 | 35.4588 | 57000 | 2.1600 | 17.861 | 24.8002 | 0 |
0.852 | 36.0809 | 58000 | 2.1680 | 18.0484 | 25.04 | 0 |
0.8745 | 36.7030 | 59000 | 2.1604 | 17.7634 | 24.7872 | 0 |
0.8466 | 37.3250 | 60000 | 2.1570 | 17.8774 | 25.1229 | 0 |
0.8514 | 37.9471 | 61000 | 2.1583 | 18.0103 | 24.8362 | 0 |
0.8394 | 38.5692 | 62000 | 2.1599 | 18.4043 | 24.8432 | 0 |
0.8245 | 39.1913 | 63000 | 2.1687 | 18.2631 | 25.0639 | 0 |
0.8433 | 39.8134 | 64000 | 2.1580 | 18.2601 | 25.2617 | 0 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1