# dense-wmt16-tr-en-dense-ba128-lr1e-04
This model is a fine-tuned version of google-t5/t5-base on the wmt16 tr-en dataset. It achieves the following results on the evaluation set:
- Loss: 2.1616
- Bleu: 18.257
- Gen Len: 25.2238
- Num Experts Activated: 0
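The checkpoint can be exercised with a short `transformers` sketch. Note that the repo id, the T5 task prefix, and the generation settings below are assumptions for illustration; the card does not specify how inference was run:

```python
# Hypothetical inference sketch for this checkpoint. MODEL_ID and the task
# prefix are assumptions, not confirmed by the model card.
MODEL_ID = "taehyunzzz/dense-wmt16-tr-en-dense-ba128-lr1e-04"


def build_input(text: str) -> str:
    # T5 checkpoints are typically driven by a task prefix; the exact prefix
    # used during this fine-tune is an assumption.
    return "translate Turkish to English: " + text


def translate(text: str) -> str:
    # Imports are deferred so the helper above stays dependency-free.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_input(text), return_tensors="pt")
    # Gen Len averaged ~25 tokens on the eval set, so 64 is a comfortable cap.
    output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```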
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 200
- num_epochs: 40.0
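The total train batch size follows from the per-device batch and gradient accumulation (32 × 4 = 128), and `constant_with_warmup` ramps the learning rate linearly over the first 200 steps before holding it at 1e-4. A minimal sketch of both:

```python
def effective_batch_size(per_device: int = 32, accumulation: int = 4) -> int:
    # Gradients from 4 micro-batches of 32 are accumulated before each
    # optimizer step, giving the reported total_train_batch_size of 128.
    return per_device * accumulation


def lr_at(step: int, base_lr: float = 1e-4, warmup_steps: int = 200) -> float:
    # constant_with_warmup: linear ramp from 0 to base_lr over warmup_steps,
    # then constant at base_lr for the remainder of training.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr
```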
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len | Experts Activated |
|:---|:---|:---|:---|:---|:---|:---|
2.9818 | 0.6221 | 1000 | 3.5812 | 2.2511 | 30.8611 | 0 |
2.5394 | 1.2442 | 2000 | 3.2778 | 4.3464 | 27.5924 | 0 |
2.2711 | 1.8663 | 3000 | 3.0610 | 5.4823 | 26.5624 | 0 |
2.0787 | 2.4883 | 4000 | 2.8963 | 6.9788 | 26.7413 | 0 |
1.9373 | 3.1104 | 5000 | 2.7878 | 8.1811 | 26.5524 | 0 |
1.8533 | 3.7325 | 6000 | 2.7018 | 9.052 | 26.4436 | 0 |
1.7677 | 4.3546 | 7000 | 2.6285 | 9.9558 | 25.7802 | 0 |
1.6856 | 4.9767 | 8000 | 2.5612 | 10.4238 | 25.5594 | 0 |
1.6259 | 5.5988 | 9000 | 2.5210 | 11.3462 | 25.5125 | 0 |
1.5572 | 6.2208 | 10000 | 2.4716 | 11.8849 | 25.5824 | 0 |
1.5318 | 6.8429 | 11000 | 2.4306 | 12.2949 | 25.6234 | 0 |
1.4808 | 7.4650 | 12000 | 2.3984 | 12.7519 | 25.6454 | 0 |
1.429 | 8.0871 | 13000 | 2.3835 | 12.6427 | 25.4875 | 0 |
1.4258 | 8.7092 | 14000 | 2.3461 | 13.4165 | 25.9011 | 0 |
1.3613 | 9.3313 | 15000 | 2.3228 | 13.8046 | 25.5145 | 0 |
1.3738 | 9.9533 | 16000 | 2.3098 | 13.964 | 25.0729 | 0 |
1.3347 | 10.5754 | 17000 | 2.2995 | 14.1001 | 24.958 | 0 |
1.2906 | 11.1975 | 18000 | 2.2754 | 14.4495 | 25.4895 | 0 |
1.2901 | 11.8196 | 19000 | 2.2635 | 14.5571 | 24.8462 | 0 |
1.2377 | 12.4417 | 20000 | 2.2482 | 14.7131 | 24.8891 | 0 |
1.2295 | 13.0638 | 21000 | 2.2462 | 14.8771 | 25.033 | 0 |
1.2367 | 13.6858 | 22000 | 2.2315 | 14.8081 | 24.8971 | 0 |
1.1905 | 14.3079 | 23000 | 2.2252 | 15.1784 | 24.7832 | 0 |
1.1948 | 14.9300 | 24000 | 2.2018 | 15.7014 | 25.1568 | 0 |
1.1626 | 15.5521 | 25000 | 2.2085 | 16.0292 | 25.1009 | 0 |
1.1534 | 16.1742 | 26000 | 2.1955 | 16.0306 | 25.1489 | 0 |
1.153 | 16.7963 | 27000 | 2.1895 | 16.1658 | 24.975 | 0 |
1.1209 | 17.4184 | 28000 | 2.1841 | 15.7986 | 24.9061 | 0 |
1.1125 | 18.0404 | 29000 | 2.1732 | 16.3828 | 25.1069 | 0 |
1.1055 | 18.6625 | 30000 | 2.1698 | 16.0596 | 25.012 | 0 |
1.0732 | 19.2846 | 31000 | 2.1633 | 16.3497 | 24.7393 | 0 |
1.0726 | 19.9067 | 32000 | 2.1605 | 16.5415 | 24.8891 | 0 |
1.0663 | 20.5288 | 33000 | 2.1503 | 16.5621 | 25.2887 | 0 |
1.0536 | 21.1509 | 34000 | 2.1524 | 16.9547 | 25.0929 | 0 |
1.0451 | 21.7729 | 35000 | 2.1442 | 16.7843 | 25.0989 | 0 |
1.0336 | 22.3950 | 36000 | 2.1506 | 16.8701 | 25.1309 | 0 |
1.02 | 23.0171 | 37000 | 2.1593 | 16.7293 | 24.7363 | 0 |
1.0134 | 23.6392 | 38000 | 2.1494 | 17.0919 | 24.9271 | 0 |
0.9921 | 24.2613 | 39000 | 2.1516 | 17.0088 | 24.9081 | 0 |
1.0015 | 24.8834 | 40000 | 2.1458 | 17.2168 | 24.5784 | 0 |
0.9872 | 25.5054 | 41000 | 2.1457 | 17.5821 | 24.6603 | 0 |
0.9606 | 26.1275 | 42000 | 2.1465 | 17.465 | 25.2268 | 0 |
0.9818 | 26.7496 | 43000 | 2.1457 | 17.6366 | 24.7413 | 0 |
0.968 | 27.3717 | 44000 | 2.1462 | 17.5607 | 24.6304 | 0 |
0.9718 | 27.9938 | 45000 | 2.1300 | 17.5879 | 24.7463 | 0 |
0.9387 | 28.6159 | 46000 | 2.1485 | 17.3176 | 24.6324 | 0 |
0.9378 | 29.2379 | 47000 | 2.1448 | 17.4231 | 24.6853 | 0 |
0.9329 | 29.8600 | 48000 | 2.1361 | 17.8358 | 24.9301 | 0 |
0.9258 | 30.4821 | 49000 | 2.1355 | 17.9496 | 24.8012 | 0 |
0.9052 | 31.1042 | 50000 | 2.1469 | 17.8838 | 24.9101 | 0 |
0.9188 | 31.7263 | 51000 | 2.1416 | 17.9982 | 25.0559 | 0 |
0.897 | 32.3484 | 52000 | 2.1540 | 18.1229 | 25.2138 | 0 |
0.9143 | 32.9705 | 53000 | 2.1475 | 18.0501 | 25.2408 | 0 |
0.8898 | 33.5925 | 54000 | 2.1499 | 18.0826 | 24.8811 | 0 |
0.871 | 34.2146 | 55000 | 2.1546 | 18.164 | 24.987 | 0 |
0.8852 | 34.8367 | 56000 | 2.1420 | 18.1221 | 24.7772 | 0 |
0.8675 | 35.4588 | 57000 | 2.1600 | 17.861 | 24.8002 | 0 |
0.852 | 36.0809 | 58000 | 2.1680 | 18.0484 | 25.04 | 0 |
0.8745 | 36.7030 | 59000 | 2.1604 | 17.7634 | 24.7872 | 0 |
0.8466 | 37.3250 | 60000 | 2.1570 | 17.8774 | 25.1229 | 0 |
0.8514 | 37.9471 | 61000 | 2.1583 | 18.0103 | 24.8362 | 0 |
0.8394 | 38.5692 | 62000 | 2.1599 | 18.4043 | 24.8432 | 0 |
0.8245 | 39.1913 | 63000 | 2.1687 | 18.2631 | 25.0639 | 0 |
0.8433 | 39.8134 | 64000 | 2.1580 | 18.2601 | 25.2617 | 0 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1