
dense-wmt16-ro-en-dense-ba128-lr1e-04

This model is a fine-tuned version of google-t5/t5-base on the wmt16 ro-en dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4524
  • BLEU: 35.6217
  • Gen Len: 30.5058
  • Num Experts Activated: 0 (this is a dense model, so no mixture-of-experts routing takes place)

Model description

More information needed

Intended uses & limitations

More information needed
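Pending fuller documentation, the base model and dataset above imply the standard T5 translation setup. Below is a minimal inference sketch; the Hub id is taken from this card, and the `translate Romanian to English:` task prefix is the convention from the original T5 setup, not verified against this checkpoint:

```python
def make_prompt(text: str) -> str:
    """Build the T5-style task prefix for Romanian-to-English input."""
    return "translate Romanian to English: " + text


def translate(
    text: str,
    model_id: str = "taehyunzzz/dense-wmt16-ro-en-dense-ba128-lr1e-04",
) -> str:
    """Translate one Romanian sentence; downloads the checkpoint on first use."""
    # Heavy imports are deferred so the prompt helper above works
    # even before transformers is installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(make_prompt(text), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Given the eval Gen Len of ~30 tokens, `max_new_tokens=64` leaves comfortable headroom for typical sentences.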

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 25.0
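The `constant_with_warmup` schedule ramps the learning rate linearly from 0 over the first 200 optimizer steps and then holds it at 1e-4 for the rest of training. A minimal sketch of that rule:

```python
BASE_LR = 1e-4      # learning_rate
WARMUP_STEPS = 200  # lr_scheduler_warmup_steps

# Effective batch: 32 per step x 4 accumulation steps = 128 examples.
assert 32 * 4 == 128

def lr_at(step: int) -> float:
    """Learning rate under a constant_with_warmup schedule."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS  # linear ramp from 0
    return BASE_LR                            # constant afterwards

print(lr_at(100))  # ~5e-05, halfway through warmup
print(lr_at(5000))  # 0.0001 for the remainder of training
```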

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | Gen Len | Experts Activated |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|:-----------------:|
| 2.0518 | 0.2097 | 1000 | 2.3752 | 14.4588 | 34.6833 | 0 |
| 1.7134 | 0.4194 | 2000 | 2.1375 | 19.7383 | 32.8469 | 0 |
| 1.5421 | 0.6292 | 3000 | 2.0154 | 22.1724 | 32.065 | 0 |
| 1.4533 | 0.8389 | 4000 | 1.9339 | 23.9113 | 31.6648 | 0 |
| 1.356 | 1.0486 | 5000 | 1.8817 | 25.4135 | 31.2956 | 0 |
| 1.3158 | 1.2583 | 6000 | 1.8388 | 26.2982 | 31.1401 | 0 |
| 1.2612 | 1.4680 | 7000 | 1.7994 | 27.0511 | 31.2336 | 0 |
| 1.2221 | 1.6778 | 8000 | 1.7709 | 27.3163 | 31.3552 | 0 |
| 1.1928 | 1.8875 | 9000 | 1.7404 | 27.7464 | 31.1411 | 0 |
| 1.157 | 2.0972 | 10000 | 1.7188 | 28.2878 | 31.3617 | 0 |
| 1.1366 | 2.3069 | 11000 | 1.7030 | 28.9879 | 31.1411 | 0 |
| 1.1188 | 2.5166 | 12000 | 1.6837 | 29.2076 | 30.982 | 0 |
| 1.1044 | 2.7264 | 13000 | 1.6713 | 29.604 | 31.0275 | 0 |
| 1.0928 | 2.9361 | 14000 | 1.6557 | 30.4158 | 30.7599 | 0 |
| 1.0588 | 3.1458 | 15000 | 1.6469 | 30.4293 | 30.8324 | 0 |
| 1.0513 | 3.3555 | 16000 | 1.6294 | 30.6678 | 30.9465 | 0 |
| 1.0474 | 3.5652 | 17000 | 1.6206 | 30.9433 | 30.7804 | 0 |
| 1.0277 | 3.7750 | 18000 | 1.6087 | 31.2454 | 30.7624 | 0 |
| 1.0295 | 3.9847 | 19000 | 1.5980 | 31.395 | 30.9555 | 0 |
| 0.9973 | 4.1944 | 20000 | 1.5904 | 31.6784 | 30.6853 | 0 |
| 0.9895 | 4.4041 | 21000 | 1.5891 | 31.7694 | 30.7414 | 0 |
| 0.969 | 4.6139 | 22000 | 1.5779 | 32.0658 | 30.8014 | 0 |
| 0.9806 | 4.8236 | 23000 | 1.5684 | 32.1956 | 30.7899 | 0 |
| 0.9561 | 5.0333 | 24000 | 1.5690 | 32.3726 | 30.8169 | 0 |
| 0.9483 | 5.2430 | 25000 | 1.5586 | 32.4251 | 30.7609 | 0 |
| 0.945 | 5.4527 | 26000 | 1.5553 | 32.4857 | 30.6708 | 0 |
| 0.9528 | 5.6625 | 27000 | 1.5432 | 32.9461 | 30.7344 | 0 |
| 0.9346 | 5.8722 | 28000 | 1.5399 | 33.2043 | 30.6803 | 0 |
| 0.9101 | 6.0819 | 29000 | 1.5473 | 33.3657 | 30.5953 | 0 |
| 0.9205 | 6.2916 | 30000 | 1.5344 | 33.0577 | 30.5473 | 0 |
| 0.9097 | 6.5013 | 31000 | 1.5291 | 33.0098 | 30.6608 | 0 |
| 0.9099 | 6.7111 | 32000 | 1.5236 | 33.313 | 30.7164 | 0 |
| 0.9134 | 6.9208 | 33000 | 1.5117 | 33.2911 | 30.7804 | 0 |
| 0.8823 | 7.1305 | 34000 | 1.5218 | 33.3702 | 30.7794 | 0 |
| 0.8813 | 7.3402 | 35000 | 1.5100 | 33.4289 | 30.5843 | 0 |
| 0.8846 | 7.5499 | 36000 | 1.5147 | 33.484 | 30.9595 | 0 |
| 0.8821 | 7.7597 | 37000 | 1.5047 | 33.5289 | 30.6363 | 0 |
| 0.8779 | 7.9694 | 38000 | 1.5013 | 33.7474 | 30.6833 | 0 |
| 0.8521 | 8.1791 | 39000 | 1.5042 | 33.5022 | 30.7334 | 0 |
| 0.8673 | 8.3888 | 40000 | 1.4983 | 33.81 | 30.6323 | 0 |
| 0.8627 | 8.5985 | 41000 | 1.5019 | 33.8729 | 30.6418 | 0 |
| 0.8552 | 8.8083 | 42000 | 1.4937 | 33.8221 | 30.5473 | 0 |
| 0.8466 | 9.0180 | 43000 | 1.4943 | 33.9123 | 30.4947 | 0 |
| 0.8464 | 9.2277 | 44000 | 1.4931 | 33.9585 | 30.4897 | 0 |
| 0.8522 | 9.4374 | 45000 | 1.4845 | 33.9454 | 30.5598 | 0 |
| 0.838 | 9.6471 | 46000 | 1.4877 | 34.0698 | 30.5388 | 0 |
| 0.832 | 9.8569 | 47000 | 1.4809 | 34.1426 | 30.6333 | 0 |
| 0.8238 | 10.0666 | 48000 | 1.4833 | 34.5167 | 30.6518 | 0 |
| 0.8108 | 10.2763 | 49000 | 1.4802 | 34.1727 | 30.6933 | 0 |
| 0.8129 | 10.4860 | 50000 | 1.4794 | 34.4153 | 30.7564 | 0 |
| 0.8167 | 10.6957 | 51000 | 1.4780 | 34.3061 | 30.5958 | 0 |
| 0.8189 | 10.9055 | 52000 | 1.4754 | 34.4714 | 30.7639 | 0 |
| 0.8023 | 11.1152 | 53000 | 1.4772 | 34.5578 | 30.6433 | 0 |
| 0.806 | 11.3249 | 54000 | 1.4767 | 34.5027 | 30.5508 | 0 |
| 0.8142 | 11.5346 | 55000 | 1.4729 | 34.6337 | 30.5733 | 0 |
| 0.8049 | 11.7444 | 56000 | 1.4727 | 34.7169 | 30.6483 | 0 |
| 0.807 | 11.9541 | 57000 | 1.4676 | 34.6891 | 30.5323 | 0 |
| 0.7772 | 12.1638 | 58000 | 1.4738 | 34.6807 | 30.6343 | 0 |
| 0.7805 | 12.3735 | 59000 | 1.4716 | 34.8551 | 30.5503 | 0 |
| 0.7886 | 12.5832 | 60000 | 1.4724 | 34.457 | 30.4512 | 0 |
| 0.7848 | 12.7930 | 61000 | 1.4657 | 34.6815 | 30.5603 | 0 |
| 0.7873 | 13.0027 | 62000 | 1.4620 | 34.9514 | 30.5888 | 0 |
| 0.7661 | 13.2124 | 63000 | 1.4718 | 34.6691 | 30.5038 | 0 |
| 0.7791 | 13.4221 | 64000 | 1.4628 | 34.883 | 30.6638 | 0 |
| 0.7669 | 13.6318 | 65000 | 1.4654 | 34.9584 | 30.4867 | 0 |
| 0.7702 | 13.8416 | 66000 | 1.4657 | 35.1764 | 30.5913 | 0 |
| 0.7632 | 14.0513 | 67000 | 1.4645 | 35.2597 | 30.6138 | 0 |
| 0.7564 | 14.2610 | 68000 | 1.4627 | 35.0536 | 30.6528 | 0 |
| 0.7681 | 14.4707 | 69000 | 1.4614 | 35.0135 | 30.5123 | 0 |
| 0.7718 | 14.6804 | 70000 | 1.4571 | 34.9493 | 30.6123 | 0 |
| 0.7702 | 14.8902 | 71000 | 1.4521 | 35.2306 | 30.6193 | 0 |
| 0.7521 | 15.0999 | 72000 | 1.4635 | 35.1928 | 30.5378 | 0 |
| 0.7598 | 15.3096 | 73000 | 1.4531 | 35.1121 | 30.6473 | 0 |
| 0.7534 | 15.5193 | 74000 | 1.4578 | 35.0046 | 30.6748 | 0 |
| 0.7512 | 15.7290 | 75000 | 1.4602 | 35.2306 | 30.5218 | 0 |
| 0.7514 | 15.9388 | 76000 | 1.4539 | 34.9417 | 30.4477 | 0 |
| 0.7369 | 16.1485 | 77000 | 1.4615 | 34.8797 | 30.4322 | 0 |
| 0.741 | 16.3582 | 78000 | 1.4604 | 34.7509 | 30.5528 | 0 |
| 0.7387 | 16.5679 | 79000 | 1.4568 | 35.2525 | 30.4982 | 0 |
| 0.7506 | 16.7776 | 80000 | 1.4543 | 34.9917 | 30.4822 | 0 |
| 0.7428 | 16.9874 | 81000 | 1.4541 | 35.1863 | 30.4477 | 0 |
| 0.7298 | 17.1971 | 82000 | 1.4651 | 34.9478 | 30.3877 | 0 |
| 0.7207 | 17.4068 | 83000 | 1.4551 | 35.1533 | 30.6013 | 0 |
| 0.7308 | 17.6165 | 84000 | 1.4563 | 35.2446 | 30.6573 | 0 |
| 0.727 | 17.8262 | 85000 | 1.4550 | 35.2634 | 30.4762 | 0 |
| 0.7195 | 18.0360 | 86000 | 1.4561 | 35.5393 | 30.5128 | 0 |
| 0.7222 | 18.2457 | 87000 | 1.4573 | 35.346 | 30.5043 | 0 |
| 0.7243 | 18.4554 | 88000 | 1.4608 | 35.3484 | 30.4807 | 0 |
| 0.7305 | 18.6651 | 89000 | 1.4519 | 35.526 | 30.5608 | 0 |
| 0.7181 | 18.8748 | 90000 | 1.4497 | 35.2736 | 30.4982 | 0 |
| 0.7005 | 19.0846 | 91000 | 1.4602 | 35.3432 | 30.6163 | 0 |
| 0.712 | 19.2943 | 92000 | 1.4582 | 35.305 | 30.4867 | 0 |
| 0.7119 | 19.5040 | 93000 | 1.4579 | 35.576 | 30.5863 | 0 |
| 0.7159 | 19.7137 | 94000 | 1.4533 | 35.3083 | 30.6238 | 0 |
| 0.7083 | 19.9235 | 95000 | 1.4537 | 35.3507 | 30.4132 | 0 |
| 0.6935 | 20.1332 | 96000 | 1.4581 | 35.4119 | 30.5163 | 0 |
| 0.705 | 20.3429 | 97000 | 1.4581 | 35.0911 | 30.4507 | 0 |
| 0.7056 | 20.5526 | 98000 | 1.4606 | 35.5596 | 30.5103 | 0 |
| 0.6996 | 20.7623 | 99000 | 1.4521 | 35.4794 | 30.5513 | 0 |
| 0.7112 | 20.9721 | 100000 | 1.4482 | 35.5249 | 30.4902 | 0 |
| 0.6994 | 21.1818 | 101000 | 1.4568 | 35.5476 | 30.5103 | 0 |
| 0.6914 | 21.3915 | 102000 | 1.4598 | 35.4283 | 30.5663 | 0 |
| 0.7072 | 21.6012 | 103000 | 1.4515 | 35.5676 | 30.5013 | 0 |
| 0.6888 | 21.8109 | 104000 | 1.4547 | 35.4389 | 30.4072 | 0 |
| 0.6715 | 22.0207 | 105000 | 1.4582 | 35.4778 | 30.4932 | 0 |
| 0.6918 | 22.2304 | 106000 | 1.4594 | 35.7069 | 30.4467 | 0 |
| 0.6844 | 22.4401 | 107000 | 1.4565 | 35.6535 | 30.3877 | 0 |
| 0.6911 | 22.6498 | 108000 | 1.4579 | 35.894 | 30.4962 | 0 |
| 0.6934 | 22.8595 | 109000 | 1.4564 | 35.3773 | 30.5228 | 0 |
| 0.6728 | 23.0693 | 110000 | 1.4605 | 35.4285 | 30.4197 | 0 |
| 0.6749 | 23.2790 | 111000 | 1.4574 | 35.6055 | 30.5698 | 0 |
| 0.6767 | 23.4887 | 112000 | 1.4553 | 35.719 | 30.5378 | 0 |
| 0.6833 | 23.6984 | 113000 | 1.4630 | 35.4939 | 30.4862 | 0 |
| 0.6793 | 23.9081 | 114000 | 1.4526 | 35.7737 | 30.4832 | 0 |
| 0.6694 | 24.1179 | 115000 | 1.4619 | 35.6107 | 30.4972 | 0 |
| 0.6706 | 24.3276 | 116000 | 1.4659 | 35.4693 | 30.4822 | 0 |
| 0.6786 | 24.5373 | 117000 | 1.4604 | 35.7295 | 30.5323 | 0 |
| 0.68 | 24.7470 | 118000 | 1.4612 | 35.5735 | 30.4437 | 0 |
| 0.6797 | 24.9567 | 119000 | 1.4531 | 35.668 | 30.5263 | 0 |
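As a sanity check on the table's cadence: evaluation runs every 1000 optimizer steps, and the first row lands at epoch 0.2097, which implies roughly 4769 optimizer steps per epoch. At the effective batch size of 128, that corresponds to about 610k training pairs, consistent with the WMT16 ro-en training split:

```python
total_train_batch_size = 128  # from the hyperparameters above
step, epoch = 1000, 0.2097    # first row of the training table

steps_per_epoch = step / epoch                            # ~4768.7
train_examples = steps_per_epoch * total_train_batch_size
print(round(steps_per_epoch))  # ~4769 optimizer steps per epoch
print(round(train_examples))   # ~610k examples seen per epoch
```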

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1
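To reproduce the environment, the listed versions can be pinned; a `requirements.txt` sketch (the `+cu121` suffix indicates CUDA 12.1 PyTorch wheels, which come from PyTorch's own index rather than PyPI):

```text
transformers==4.44.0
torch==2.4.0
datasets==2.21.0
tokenizers==0.19.1
```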
Model size: 124M params (safetensors, F32)
