---
language:
- ro
- en
license: apache-2.0
base_model: google-t5/t5-base
tags:
- generated_from_trainer
datasets:
- wmt16
metrics:
- bleu
model-index:
- name: dense-wmt16-ro-en-dense-ba128-lr1e-04
  results:
  - task:
      name: Translation
      type: translation
    dataset:
      name: wmt16 ro-en
      type: wmt16
      args: ro-en
    metrics:
    - name: Bleu
      type: bleu
      value: 35.6217
---

# dense-wmt16-ro-en-dense-ba128-lr1e-04

This model is a fine-tuned version of [google-t5/t5-base](https://huggingface.co/google-t5/t5-base) on the wmt16 ro-en dataset.
It achieves the following results on the evaluation set:
- Loss: 1.4524
- Bleu: 35.6217
- Gen Len: 30.5058
- Num Experts Activated: 0

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_steps: 200
- num_epochs: 25.0

### Training results

| Training Loss | Epoch   | Step   | Validation Loss | Bleu    | Gen Len | Experts Activated |
|:-------------:|:-------:|:------:|:---------------:|:-------:|:-------:|:-----------------:|
| 2.0518        | 0.2097  | 1000   | 2.3752          | 14.4588 | 34.6833 | 0                 |
| 1.7134        | 0.4194  | 2000   | 2.1375          | 19.7383 | 32.8469 | 0                 |
| 1.5421        | 0.6292  | 3000   | 2.0154          | 22.1724 | 32.065  | 0                 |
| 1.4533        | 0.8389  | 4000   | 1.9339          | 23.9113 | 31.6648 | 0                 |
| 1.356         | 1.0486  | 5000   | 1.8817          | 25.4135 | 31.2956 | 0                 |
| 1.3158        | 1.2583  | 6000   | 1.8388          | 26.2982 | 31.1401 | 0                 |
| 1.2612        | 1.4680  | 7000   | 1.7994          | 27.0511 | 31.2336 | 0                 |
| 1.2221        | 1.6778  | 8000   | 1.7709          | 27.3163 | 31.3552 | 0                 |
| 1.1928        | 1.8875  | 9000   | 1.7404          | 27.7464 | 31.1411 | 0                 |
| 1.157         | 2.0972  | 10000  | 1.7188          | 28.2878 | 31.3617 | 0                 |
| 1.1366        | 2.3069  | 11000  | 1.7030          | 28.9879 | 31.1411 | 0                 |
| 1.1188        | 2.5166  | 12000  | 1.6837          | 29.2076 | 30.982  | 0                 |
| 1.1044        | 2.7264  | 13000  | 1.6713          | 29.604  | 31.0275 | 0                 |
| 1.0928        | 2.9361  | 14000  | 1.6557          | 30.4158 | 30.7599 | 0                 |
| 1.0588        | 3.1458  | 15000  | 1.6469          | 30.4293 | 30.8324 | 0                 |
| 1.0513        | 3.3555  | 16000  | 1.6294          | 30.6678 | 30.9465 | 0                 |
| 1.0474        | 3.5652  | 17000  | 1.6206          | 30.9433 | 30.7804 | 0                 |
| 1.0277        | 3.7750  | 18000  | 1.6087          | 31.2454 | 30.7624 | 0                 |
| 1.0295        | 3.9847  | 19000  | 1.5980          | 31.395  | 30.9555 | 0                 |
| 0.9973        | 4.1944  | 20000  | 1.5904          | 31.6784 | 30.6853 | 0                 |
| 0.9895        | 4.4041  | 21000  | 1.5891          | 31.7694 | 30.7414 | 0                 |
| 0.969         | 4.6139  | 22000  | 1.5779          | 32.0658 | 30.8014 | 0                 |
| 0.9806        | 4.8236  | 23000  | 1.5684          | 32.1956 | 30.7899 | 0                 |
| 0.9561        | 5.0333  | 24000  | 1.5690          | 32.3726 | 30.8169 | 0                 |
| 0.9483        | 5.2430  | 25000  | 1.5586          | 32.4251 | 30.7609 | 0                 |
| 0.945         | 5.4527  | 26000  | 1.5553          | 32.4857 | 30.6708 | 0                 |
| 0.9528        | 5.6625  | 27000  | 1.5432          | 32.9461 | 30.7344 | 0                 |
| 0.9346        | 5.8722  | 28000  | 1.5399          | 33.2043 | 30.6803 | 0                 |
| 0.9101        | 6.0819  | 29000  | 1.5473          | 33.3657 | 30.5953 | 0                 |
| 0.9205        | 6.2916  | 30000  | 1.5344          | 33.0577 | 30.5473 | 0                 |
| 0.9097        | 6.5013  | 31000  | 1.5291          | 33.0098 | 30.6608 | 0                 |
| 0.9099        | 6.7111  | 32000  | 1.5236          | 33.313  | 30.7164 | 0                 |
| 0.9134        | 6.9208  | 33000  | 1.5117          | 33.2911 | 30.7804 | 0                 |
| 0.8823        | 7.1305  | 34000  | 1.5218          | 33.3702 | 30.7794 | 0                 |
| 0.8813        | 7.3402  | 35000  | 1.5100          | 33.4289 | 30.5843 | 0                 |
| 0.8846        | 7.5499  | 36000  | 1.5147          | 33.484  | 30.9595 | 0                 |
| 0.8821        | 7.7597  | 37000  | 1.5047          | 33.5289 | 30.6363 | 0                 |
| 0.8779        | 7.9694  | 38000  | 1.5013          | 33.7474 | 30.6833 | 0                 |
| 0.8521        | 8.1791  | 39000  | 1.5042          | 33.5022 | 30.7334 | 0                 |
| 0.8673        | 8.3888  | 40000  | 1.4983          | 33.81   | 30.6323 | 0                 |
| 0.8627        | 8.5985  | 41000  | 1.5019          | 33.8729 | 30.6418 | 0                 |
| 0.8552        | 8.8083  | 42000  | 1.4937          | 33.8221 | 30.5473 | 0                 |
| 0.8466        | 9.0180  | 43000  | 1.4943          | 33.9123 | 30.4947 | 0                 |
| 0.8464        | 9.2277  | 44000  | 1.4931          | 33.9585 | 30.4897 | 0                 |
| 0.8522        | 9.4374  | 45000  | 1.4845          | 33.9454 | 30.5598 | 0                 |
| 0.838         | 9.6471  | 46000  | 1.4877          | 34.0698 | 30.5388 | 0                 |
| 0.832         | 9.8569  | 47000  | 1.4809          | 34.1426 | 30.6333 | 0                 |
| 0.8238        | 10.0666 | 48000  | 1.4833          | 34.5167 | 30.6518 | 0                 |
| 0.8108        | 10.2763 | 49000  | 1.4802          | 34.1727 | 30.6933 | 0                 |
| 0.8129        | 10.4860 | 50000  | 1.4794          | 34.4153 | 30.7564 | 0                 |
| 0.8167        | 10.6957 | 51000  | 1.4780          | 34.3061 | 30.5958 | 0                 |
| 0.8189        | 10.9055 | 52000  | 1.4754          | 34.4714 | 30.7639 | 0                 |
| 0.8023        | 11.1152 | 53000  | 1.4772          | 34.5578 | 30.6433 | 0                 |
| 0.806         | 11.3249 | 54000  | 1.4767          | 34.5027 | 30.5508 | 0                 |
| 0.8142        | 11.5346 | 55000  | 1.4729          | 34.6337 | 30.5733 | 0                 |
| 0.8049        | 11.7444 | 56000  | 1.4727          | 34.7169 | 30.6483 | 0                 |
| 0.807         | 11.9541 | 57000  | 1.4676          | 34.6891 | 30.5323 | 0                 |
| 0.7772        | 12.1638 | 58000  | 1.4738          | 34.6807 | 30.6343 | 0                 |
| 0.7805        | 12.3735 | 59000  | 1.4716          | 34.8551 | 30.5503 | 0                 |
| 0.7886        | 12.5832 | 60000  | 1.4724          | 34.457  | 30.4512 | 0                 |
| 0.7848        | 12.7930 | 61000  | 1.4657          | 34.6815 | 30.5603 | 0                 |
| 0.7873        | 13.0027 | 62000  | 1.4620          | 34.9514 | 30.5888 | 0                 |
| 0.7661        | 13.2124 | 63000  | 1.4718          | 34.6691 | 30.5038 | 0                 |
| 0.7791        | 13.4221 | 64000  | 1.4628          | 34.883  | 30.6638 | 0                 |
| 0.7669        | 13.6318 | 65000  | 1.4654          | 34.9584 | 30.4867 | 0                 |
| 0.7702        | 13.8416 | 66000  | 1.4657          | 35.1764 | 30.5913 | 0                 |
| 0.7632        | 14.0513 | 67000  | 1.4645          | 35.2597 | 30.6138 | 0                 |
| 0.7564        | 14.2610 | 68000  | 1.4627          | 35.0536 | 30.6528 | 0                 |
| 0.7681        | 14.4707 | 69000  | 1.4614          | 35.0135 | 30.5123 | 0                 |
| 0.7718        | 14.6804 | 70000  | 1.4571          | 34.9493 | 30.6123 | 0                 |
| 0.7702        | 14.8902 | 71000  | 1.4521          | 35.2306 | 30.6193 | 0                 |
| 0.7521        | 15.0999 | 72000  | 1.4635          | 35.1928 | 30.5378 | 0                 |
| 0.7598        | 15.3096 | 73000  | 1.4531          | 35.1121 | 30.6473 | 0                 |
| 0.7534        | 15.5193 | 74000  | 1.4578          | 35.0046 | 30.6748 | 0                 |
| 0.7512        | 15.7290 | 75000  | 1.4602          | 35.2306 | 30.5218 | 0                 |
| 0.7514        | 15.9388 | 76000  | 1.4539          | 34.9417 | 30.4477 | 0                 |
| 0.7369        | 16.1485 | 77000  | 1.4615          | 34.8797 | 30.4322 | 0                 |
| 0.741         | 16.3582 | 78000  | 1.4604          | 34.7509 | 30.5528 | 0                 |
| 0.7387        | 16.5679 | 79000  | 1.4568          | 35.2525 | 30.4982 | 0                 |
| 0.7506        | 16.7776 | 80000  | 1.4543          | 34.9917 | 30.4822 | 0                 |
| 0.7428        | 16.9874 | 81000  | 1.4541          | 35.1863 | 30.4477 | 0                 |
| 0.7298        | 17.1971 | 82000  | 1.4651          | 34.9478 | 30.3877 | 0                 |
| 0.7207        | 17.4068 | 83000  | 1.4551          | 35.1533 | 30.6013 | 0                 |
| 0.7308        | 17.6165 | 84000  | 1.4563          | 35.2446 | 30.6573 | 0                 |
| 0.727         | 17.8262 | 85000  | 1.4550          | 35.2634 | 30.4762 | 0                 |
| 0.7195        | 18.0360 | 86000  | 1.4561          | 35.5393 | 30.5128 | 0                 |
| 0.7222        | 18.2457 | 87000  | 1.4573          | 35.346  | 30.5043 | 0                 |
| 0.7243        | 18.4554 | 88000  | 1.4608          | 35.3484 | 30.4807 | 0                 |
| 0.7305        | 18.6651 | 89000  | 1.4519          | 35.526  | 30.5608 | 0                 |
| 0.7181        | 18.8748 | 90000  | 1.4497          | 35.2736 | 30.4982 | 0                 |
| 0.7005        | 19.0846 | 91000  | 1.4602          | 35.3432 | 30.6163 | 0                 |
| 0.712         | 19.2943 | 92000  | 1.4582          | 35.305  | 30.4867 | 0                 |
| 0.7119        | 19.5040 | 93000  | 1.4579          | 35.576  | 30.5863 | 0                 |
| 0.7159        | 19.7137 | 94000  | 1.4533          | 35.3083 | 30.6238 | 0                 |
| 0.7083        | 19.9235 | 95000  | 1.4537          | 35.3507 | 30.4132 | 0                 |
| 0.6935        | 20.1332 | 96000  | 1.4581          | 35.4119 | 30.5163 | 0                 |
| 0.705         | 20.3429 | 97000  | 1.4581          | 35.0911 | 30.4507 | 0                 |
| 0.7056        | 20.5526 | 98000  | 1.4606          | 35.5596 | 30.5103 | 0                 |
| 0.6996        | 20.7623 | 99000  | 1.4521          | 35.4794 | 30.5513 | 0                 |
| 0.7112        | 20.9721 | 100000 | 1.4482          | 35.5249 | 30.4902 | 0                 |
| 0.6994        | 21.1818 | 101000 | 1.4568          | 35.5476 | 30.5103 | 0                 |
| 0.6914        | 21.3915 | 102000 | 1.4598          | 35.4283 | 30.5663 | 0                 |
| 0.7072        | 21.6012 | 103000 | 1.4515          | 35.5676 | 30.5013 | 0                 |
| 0.6888        | 21.8109 | 104000 | 1.4547          | 35.4389 | 30.4072 | 0                 |
| 0.6715        | 22.0207 | 105000 | 1.4582          | 35.4778 | 30.4932 | 0                 |
| 0.6918        | 22.2304 | 106000 | 1.4594          | 35.7069 | 30.4467 | 0                 |
| 0.6844        | 22.4401 | 107000 | 1.4565          | 35.6535 | 30.3877 | 0                 |
| 0.6911        | 22.6498 | 108000 | 1.4579          | 35.894  | 30.4962 | 0                 |
| 0.6934        | 22.8595 | 109000 | 1.4564          | 35.3773 | 30.5228 | 0                 |
| 0.6728        | 23.0693 | 110000 | 1.4605          | 35.4285 | 30.4197 | 0                 |
| 0.6749        | 23.2790 | 111000 | 1.4574          | 35.6055 | 30.5698 | 0                 |
| 0.6767        | 23.4887 | 112000 | 1.4553          | 35.719  | 30.5378 | 0                 |
| 0.6833        | 23.6984 | 113000 | 1.4630          | 35.4939 | 30.4862 | 0                 |
| 0.6793        | 23.9081 | 114000 | 1.4526          | 35.7737 | 30.4832 | 0                 |
| 0.6694        | 24.1179 | 115000 | 1.4619          | 35.6107 | 30.4972 | 0                 |
| 0.6706        | 24.3276 | 116000 | 1.4659          | 35.4693 | 30.4822 | 0                 |
| 0.6786        | 24.5373 | 117000 | 1.4604          | 35.7295 | 30.5323 | 0                 |
| 0.68          | 24.7470 | 118000 | 1.4612          | 35.5735 | 30.4437 | 0                 |
| 0.6797        | 24.9567 | 119000 | 1.4531          | 35.668  | 30.5263 | 0                 |

### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
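The card lists `lr_scheduler_type: constant_with_warmup` with 200 warmup steps and a base learning rate of 0.0001. The sketch below is a minimal, illustrative reimplementation of that schedule. It is modeled on the multiplier that `transformers`' `get_constant_schedule_with_warmup` applies, as we understand it: a linear ramp from 0 to the base rate over the warmup window, then a constant rate with no decay. The function name `lr_at_step` is hypothetical.

```python
# Minimal sketch of a constant-with-warmup learning-rate schedule.
# lr_at_step is a hypothetical helper, not part of any library.
BASE_LR = 1e-4        # learning_rate from the card
WARMUP_STEPS = 200    # lr_scheduler_warmup_steps from the card


def lr_at_step(step: int, base_lr: float = BASE_LR, warmup_steps: int = WARMUP_STEPS) -> float:
    """Return the learning rate used at optimizer step `step` (0-indexed)."""
    if step < warmup_steps:
        # Linear warmup: the rate grows from 0 toward base_lr.
        return base_lr * step / max(1, warmup_steps)
    # After warmup the rate stays fixed for the rest of training (no decay).
    return base_lr


if __name__ == "__main__":
    for s in (0, 100, 200, 119000):
        print(s, lr_at_step(s))
```

Because the rate never decays, the last logged step (119000) trains at the same 1e-4 rate as step 200.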
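The batch-size settings and the Step/Epoch columns of the training-results table can be cross-checked with simple arithmetic. This sketch assumes single-device training, which the card does not state. It derives the effective batch size from `train_batch_size` and `gradient_accumulation_steps`, then estimates optimizer steps per epoch from two table rows. The estimated training-set size comes out at roughly 610k pairs, which is in line with the size of the WMT16 ro-en training split.

```python
# Cross-check of the card's batch settings against the training-results table.
# Assumption: a single device (with N devices the effective batch would be N times larger).
PER_DEVICE_BATCH = 32   # train_batch_size
GRAD_ACCUM = 4          # gradient_accumulation_steps
NUM_EPOCHS = 25         # num_epochs

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM   # should equal total_train_batch_size (128)

# (step, epoch) pairs copied from the table; epochs are rounded to 4 decimals there.
first_row = (1000, 0.2097)
last_row = (119000, 24.9567)

steps_per_epoch = last_row[0] / last_row[1]                 # optimizer steps per epoch (~4768)
approx_train_examples = steps_per_epoch * effective_batch   # rough training-set size estimate
approx_total_steps = NUM_EPOCHS * steps_per_epoch           # expected length of the full run

if __name__ == "__main__":
    print(effective_batch, round(steps_per_epoch), round(approx_train_examples), round(approx_total_steps))
```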
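The optimizer line specifies Adam with betas=(0.9,0.999) and epsilon=1e-08. For reference, here is a single-parameter sketch of the standard Adam update with bias correction, using those values and the card's learning rate. It is illustrative only. The card does not say whether a decoupled weight-decay variant (AdamW) was used, and this sketch applies no weight decay. `adam_step` is a hypothetical helper name.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (1-indexed).

    Returns the updated (param, m, v). No weight decay is applied.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction: m and v start at zero
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
    print(p)  # first step moves the parameter by about lr in the direction opposite the gradient
```

On the first step the bias-corrected moments reduce to `grad` and `grad**2`. The update is therefore about `lr` in size regardless of gradient scale, which is one reason a short warmup is commonly paired with Adam.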
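In the training-results table, the lowest validation loss (1.4482 at step 100000) and the highest BLEU (35.894 at step 108000) occur at different checkpoints, so the checkpoint-selection criterion matters. The card does not state which checkpoint was kept. The sketch below picks a checkpoint under each criterion, using a handful of rows copied verbatim from the table that include both extremes. In the Hugging Face `Trainer`, this choice is normally controlled by `load_best_model_at_end` together with `metric_for_best_model` and `greater_is_better`; whether these were set for this run is unknown.

```python
# (step, validation_loss, bleu) rows copied from the training-results table.
ROWS = [
    (89000, 1.4519, 35.526),
    (90000, 1.4497, 35.2736),
    (100000, 1.4482, 35.5249),
    (106000, 1.4594, 35.7069),
    (108000, 1.4579, 35.894),
    (114000, 1.4526, 35.7737),
    (119000, 1.4531, 35.668),
]

best_by_loss = min(ROWS, key=lambda r: r[1])   # lower validation loss is better
best_by_bleu = max(ROWS, key=lambda r: r[2])   # higher BLEU is better

if __name__ == "__main__":
    print("best by loss:", best_by_loss)
    print("best by BLEU:", best_by_bleu)
```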
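The "Gen Len" column reports the average length of generated translations in tokens, not words. The Hugging Face translation example script computes it by counting non-padding token ids in each prediction and averaging. The script that produced this card is not identified, so treating it the same way is an assumption. The sketch below uses T5's padding id 0. That id is also T5's decoder start token, which is why a leading 0 is excluded from the count. `gen_len` is a hypothetical helper name, and the token ids in the example are made up.

```python
from statistics import mean

PAD_ID = 0  # T5 tokenizers use id 0 for padding (also the decoder start token)


def gen_len(predictions, pad_id=PAD_ID):
    """Mean number of non-pad token ids per generated sequence."""
    return mean(sum(1 for tok in seq if tok != pad_id) for seq in predictions)


if __name__ == "__main__":
    # Two made-up generated sequences: start token, content ids, EOS (id 1), trailing padding.
    preds = [[0, 37, 1], [0, 8, 9, 10, 1, 0, 0]]
    print(gen_len(preds))
```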