dense-wmt16-tr-en-dense-ba128-lr1e-04

This model is a fine-tuned version of google-t5/t5-base on the wmt16 tr-en dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1616
  • Bleu: 18.257
  • Gen Len: 25.2238
  • Num Experts Activated: 0

Model description

More information needed

Intended uses & limitations

More information needed
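
Pending a fuller description, the sketch below shows one plausible way to run Turkish→English inference with this checkpoint. The `translate Turkish to English:` prefix is the standard T5 convention for WMT tasks; whether this fine-tune expects it is an assumption, and the sample sentence is illustrative only.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "taehyunzzz/dense-wmt16-tr-en-dense-ba128-lr1e-04"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# T5-style task prefix; assumed, not confirmed by this card.
text = "translate Turkish to English: Bu model makine çevirisi için eğitildi."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```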

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 40.0
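
For orientation, here is a minimal sketch of `Seq2SeqTrainingArguments` matching the values above. The `output_dir` name and the `predict_with_generate` flag are assumptions (the latter would be needed to produce the Bleu/Gen Len eval columns); the Adam betas and epsilon listed above are the library defaults, so they need no explicit flags.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: reproduces the listed hyperparameters; other flags are guesses.
training_args = Seq2SeqTrainingArguments(
    output_dir="dense-wmt16-tr-en-dense-ba128-lr1e-04",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # 32 x 4 = total train batch size of 128
    seed=42,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=200,
    num_train_epochs=40.0,
    predict_with_generate=True,      # assumption: enables Bleu/Gen Len at eval
)
```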

Training results

| Training Loss | Epoch   | Step  | Validation Loss | Bleu    | Gen Len | Experts Activated |
|:-------------:|:-------:|:-----:|:---------------:|:-------:|:-------:|:-----------------:|
| 2.9818        | 0.6221  | 1000  | 3.5812          | 2.2511  | 30.8611 | 0                 |
| 2.5394        | 1.2442  | 2000  | 3.2778          | 4.3464  | 27.5924 | 0                 |
| 2.2711        | 1.8663  | 3000  | 3.0610          | 5.4823  | 26.5624 | 0                 |
| 2.0787        | 2.4883  | 4000  | 2.8963          | 6.9788  | 26.7413 | 0                 |
| 1.9373        | 3.1104  | 5000  | 2.7878          | 8.1811  | 26.5524 | 0                 |
| 1.8533        | 3.7325  | 6000  | 2.7018          | 9.052   | 26.4436 | 0                 |
| 1.7677        | 4.3546  | 7000  | 2.6285          | 9.9558  | 25.7802 | 0                 |
| 1.6856        | 4.9767  | 8000  | 2.5612          | 10.4238 | 25.5594 | 0                 |
| 1.6259        | 5.5988  | 9000  | 2.5210          | 11.3462 | 25.5125 | 0                 |
| 1.5572        | 6.2208  | 10000 | 2.4716          | 11.8849 | 25.5824 | 0                 |
| 1.5318        | 6.8429  | 11000 | 2.4306          | 12.2949 | 25.6234 | 0                 |
| 1.4808        | 7.4650  | 12000 | 2.3984          | 12.7519 | 25.6454 | 0                 |
| 1.429         | 8.0871  | 13000 | 2.3835          | 12.6427 | 25.4875 | 0                 |
| 1.4258        | 8.7092  | 14000 | 2.3461          | 13.4165 | 25.9011 | 0                 |
| 1.3613        | 9.3313  | 15000 | 2.3228          | 13.8046 | 25.5145 | 0                 |
| 1.3738        | 9.9533  | 16000 | 2.3098          | 13.964  | 25.0729 | 0                 |
| 1.3347        | 10.5754 | 17000 | 2.2995          | 14.1001 | 24.958  | 0                 |
| 1.2906        | 11.1975 | 18000 | 2.2754          | 14.4495 | 25.4895 | 0                 |
| 1.2901        | 11.8196 | 19000 | 2.2635          | 14.5571 | 24.8462 | 0                 |
| 1.2377        | 12.4417 | 20000 | 2.2482          | 14.7131 | 24.8891 | 0                 |
| 1.2295        | 13.0638 | 21000 | 2.2462          | 14.8771 | 25.033  | 0                 |
| 1.2367        | 13.6858 | 22000 | 2.2315          | 14.8081 | 24.8971 | 0                 |
| 1.1905        | 14.3079 | 23000 | 2.2252          | 15.1784 | 24.7832 | 0                 |
| 1.1948        | 14.9300 | 24000 | 2.2018          | 15.7014 | 25.1568 | 0                 |
| 1.1626        | 15.5521 | 25000 | 2.2085          | 16.0292 | 25.1009 | 0                 |
| 1.1534        | 16.1742 | 26000 | 2.1955          | 16.0306 | 25.1489 | 0                 |
| 1.153         | 16.7963 | 27000 | 2.1895          | 16.1658 | 24.975  | 0                 |
| 1.1209        | 17.4184 | 28000 | 2.1841          | 15.7986 | 24.9061 | 0                 |
| 1.1125        | 18.0404 | 29000 | 2.1732          | 16.3828 | 25.1069 | 0                 |
| 1.1055        | 18.6625 | 30000 | 2.1698          | 16.0596 | 25.012  | 0                 |
| 1.0732        | 19.2846 | 31000 | 2.1633          | 16.3497 | 24.7393 | 0                 |
| 1.0726        | 19.9067 | 32000 | 2.1605          | 16.5415 | 24.8891 | 0                 |
| 1.0663        | 20.5288 | 33000 | 2.1503          | 16.5621 | 25.2887 | 0                 |
| 1.0536        | 21.1509 | 34000 | 2.1524          | 16.9547 | 25.0929 | 0                 |
| 1.0451        | 21.7729 | 35000 | 2.1442          | 16.7843 | 25.0989 | 0                 |
| 1.0336        | 22.3950 | 36000 | 2.1506          | 16.8701 | 25.1309 | 0                 |
| 1.02          | 23.0171 | 37000 | 2.1593          | 16.7293 | 24.7363 | 0                 |
| 1.0134        | 23.6392 | 38000 | 2.1494          | 17.0919 | 24.9271 | 0                 |
| 0.9921        | 24.2613 | 39000 | 2.1516          | 17.0088 | 24.9081 | 0                 |
| 1.0015        | 24.8834 | 40000 | 2.1458          | 17.2168 | 24.5784 | 0                 |
| 0.9872        | 25.5054 | 41000 | 2.1457          | 17.5821 | 24.6603 | 0                 |
| 0.9606        | 26.1275 | 42000 | 2.1465          | 17.465  | 25.2268 | 0                 |
| 0.9818        | 26.7496 | 43000 | 2.1457          | 17.6366 | 24.7413 | 0                 |
| 0.968         | 27.3717 | 44000 | 2.1462          | 17.5607 | 24.6304 | 0                 |
| 0.9718        | 27.9938 | 45000 | 2.1300          | 17.5879 | 24.7463 | 0                 |
| 0.9387        | 28.6159 | 46000 | 2.1485          | 17.3176 | 24.6324 | 0                 |
| 0.9378        | 29.2379 | 47000 | 2.1448          | 17.4231 | 24.6853 | 0                 |
| 0.9329        | 29.8600 | 48000 | 2.1361          | 17.8358 | 24.9301 | 0                 |
| 0.9258        | 30.4821 | 49000 | 2.1355          | 17.9496 | 24.8012 | 0                 |
| 0.9052        | 31.1042 | 50000 | 2.1469          | 17.8838 | 24.9101 | 0                 |
| 0.9188        | 31.7263 | 51000 | 2.1416          | 17.9982 | 25.0559 | 0                 |
| 0.897         | 32.3484 | 52000 | 2.1540          | 18.1229 | 25.2138 | 0                 |
| 0.9143        | 32.9705 | 53000 | 2.1475          | 18.0501 | 25.2408 | 0                 |
| 0.8898        | 33.5925 | 54000 | 2.1499          | 18.0826 | 24.8811 | 0                 |
| 0.871         | 34.2146 | 55000 | 2.1546          | 18.164  | 24.987  | 0                 |
| 0.8852        | 34.8367 | 56000 | 2.1420          | 18.1221 | 24.7772 | 0                 |
| 0.8675        | 35.4588 | 57000 | 2.1600          | 17.861  | 24.8002 | 0                 |
| 0.852         | 36.0809 | 58000 | 2.1680          | 18.0484 | 25.04   | 0                 |
| 0.8745        | 36.7030 | 59000 | 2.1604          | 17.7634 | 24.7872 | 0                 |
| 0.8466        | 37.3250 | 60000 | 2.1570          | 17.8774 | 25.1229 | 0                 |
| 0.8514        | 37.9471 | 61000 | 2.1583          | 18.0103 | 24.8362 | 0                 |
| 0.8394        | 38.5692 | 62000 | 2.1599          | 18.4043 | 24.8432 | 0                 |
| 0.8245        | 39.1913 | 63000 | 2.1687          | 18.2631 | 25.0639 | 0                 |
| 0.8433        | 39.8134 | 64000 | 2.1580          | 18.2601 | 25.2617 | 0                 |
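
The Bleu column is on the standard corpus-BLEU scale. As an assumption about the evaluation setup (the card does not name its exact metric implementation), here is a minimal sketch of computing such a score with the `evaluate` library's sacrebleu metric:

```python
import evaluate

# Sketch: corpus BLEU via sacrebleu; the exact metric used here is assumed.
sacrebleu = evaluate.load("sacrebleu")
predictions = ["This model was trained for machine translation."]
references = [["The model was trained for machine translation."]]
result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # same scale as the Bleu column above
```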

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1