switch-base-32-samsum

This model is a fine-tuned version of google/switch-base-32 on the samsum dataset. It achieves the following results on the evaluation set (these match the final logged evaluation, at step 4600):

  • Loss: 1.3830
  • Rouge1: 48.5521
  • Rouge2: 25.5283
  • Rougel: 40.8665
  • Rougelsum: 44.9575
  • Gen Len: 16.9144
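
For readers unfamiliar with the metrics above, the following is a minimal sketch of ROUGE-1 F1 as clipped unigram overlap between a candidate and a reference summary. It is a simplification for illustration only: it uses whitespace tokenization and no stemming, so it should not be expected to reproduce the numbers above, which presumably come from a standard ROUGE implementation and are reported on a 0-100 scale. The function name `rouge1_f1` and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens, no stemming."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair, scaled by 100 to match the card's convention.
score = 100 * rouge1_f1(
    "amanda baked cookies for jerry",
    "amanda baked cookies and will bring jerry some",
)
print(f"ROUGE-1 F1: {score:.2f}")
```

Rouge2 applies the same idea to bigrams, and Rougel scores the longest common subsequence instead of n-gram overlap.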

Model description

More information needed

Intended uses & limitations

More information needed
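
As a starting point for trying the model on dialogue summarization, the sketch below loads the checkpoint with the Transformers Auto classes. The repository id is taken from the hosting page; the decoding settings (`max_new_tokens=32`, beam search with 4 beams) and the example dialogue are assumptions, not the settings used to produce the reported scores. Whether training used a task prefix is not stated, so none is added here.

```python
MODEL_ID = "taehyunzzz/switch-base-32-samsum"  # repository id from the hosting page

# Hypothetical input in the SAMSum style (speaker-prefixed chat turns).
EXAMPLE_DIALOGUE = (
    "Amanda: I baked cookies. Do you want some?\n"
    "Jerry: Sure!\n"
    "Amanda: I'll bring you some tomorrow :-)"
)


def summarize(dialogue: str, max_new_tokens: int = 32) -> str:
    """Generate a summary for one dialogue (downloads the 1.98B-parameter weights on first use)."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
    # Assumed decoding settings; the card does not state those used for evaluation.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (not executed here because it downloads the model):
# print(summarize(EXAMPLE_DIALOGUE))
```

The function reloads the model on every call to keep the sketch short; for repeated use, load the tokenizer and model once and reuse them.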

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
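
To make the scheduler settings concrete, here is a sketch of a linear schedule with warmup under these hyperparameters. The total of 4605 optimizer steps is an assumption inferred from the training log (step 100 corresponds to epoch 0.1086, i.e. roughly 921 steps per epoch over 5 epochs), and rounding the warmup length up is also an assumption. The function `linear_lr` is illustrative, not the training code.

```python
import math

BASE_LR = 5e-05        # learning_rate from the card
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 4605     # assumption: ~921 steps/epoch * 5 epochs
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)  # assumption: rounded up


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 100, WARMUP_STEPS, 2300, 4600):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at about step 461, roughly halfway through the first epoch, and decays linearly for the rest of training.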

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 13.4215 | 0.1086 | 100 | 11.0472 | 20.1278 | 5.2521 | 17.5417 | 19.1564 | 18.5807 |
| 2.8392 | 0.2172 | 200 | 2.1007 | 38.3594 | 16.281 | 32.2365 | 35.4802 | 16.5599 |
| 2.4215 | 0.3257 | 300 | 1.7960 | 42.1238 | 19.355 | 35.3645 | 39.2556 | 16.2677 |
| 2.122 | 0.4343 | 400 | 1.6754 | 43.7744 | 20.6979 | 36.6416 | 40.6431 | 17.3716 |
| 2.0046 | 0.5429 | 500 | 1.5964 | 44.1887 | 21.1957 | 36.8047 | 40.8905 | 16.7249 |
| 1.9988 | 0.6515 | 600 | 1.5513 | 45.6737 | 21.9662 | 38.0672 | 42.3237 | 17.0293 |
| 1.868 | 0.7600 | 700 | 1.5133 | 45.549 | 21.791 | 37.9979 | 42.1384 | 16.2249 |
| 1.7934 | 0.8686 | 800 | 1.4904 | 45.6877 | 22.6099 | 38.4701 | 42.2678 | 16.2335 |
| 1.8638 | 0.9772 | 900 | 1.4783 | 46.2036 | 23.2629 | 39.2818 | 43.0232 | 16.2555 |
| 1.6739 | 1.0858 | 1000 | 1.4597 | 46.4896 | 23.2284 | 39.6004 | 43.1073 | 16.2335 |
| 1.6511 | 1.1944 | 1100 | 1.4717 | 46.3555 | 23.2062 | 39.0139 | 43.0476 | 17.0636 |
| 1.7472 | 1.3029 | 1200 | 1.4456 | 46.8039 | 23.0325 | 39.3688 | 43.267 | 16.9169 |
| 1.6646 | 1.4115 | 1300 | 1.4474 | 46.9795 | 23.8693 | 40.0189 | 43.5672 | 16.4095 |
| 1.7575 | 1.5201 | 1400 | 1.4313 | 47.0233 | 23.2824 | 39.4242 | 43.4246 | 17.1039 |
| 1.6169 | 1.6287 | 1500 | 1.4282 | 47.2462 | 23.6695 | 39.6043 | 43.575 | 16.6883 |
| 1.6276 | 1.7372 | 1600 | 1.4179 | 47.5435 | 24.1485 | 40.2526 | 44.2173 | 16.3386 |
| 1.5724 | 1.8458 | 1700 | 1.4148 | 47.709 | 24.1513 | 40.3054 | 44.3152 | 16.8716 |
| 1.6417 | 1.9544 | 1800 | 1.4070 | 47.711 | 24.3763 | 40.4776 | 44.1524 | 17.099 |
| 1.4839 | 2.0630 | 1900 | 1.4223 | 47.6921 | 24.5385 | 40.5104 | 44.2406 | 16.4535 |
| 1.4515 | 2.1716 | 2000 | 1.4060 | 48.0411 | 24.8227 | 40.9466 | 44.5028 | 16.6675 |
| 1.4827 | 2.2801 | 2100 | 1.4066 | 47.7 | 24.3622 | 40.2299 | 44.1456 | 17.0183 |
| 1.4776 | 2.3887 | 2200 | 1.4066 | 47.9768 | 24.7871 | 40.7986 | 44.5597 | 16.8178 |
| 1.4776 | 2.4973 | 2300 | 1.4017 | 47.9306 | 24.6758 | 40.4826 | 44.4696 | 17.2176 |
| 1.5189 | 2.6059 | 2400 | 1.4000 | 47.422 | 24.3336 | 40.0832 | 43.9033 | 16.5281 |
| 1.5369 | 2.7144 | 2500 | 1.3910 | 47.9702 | 24.7618 | 40.5049 | 44.4661 | 16.9046 |
| 1.4754 | 2.8230 | 2600 | 1.3915 | 48.0885 | 25.0111 | 41.0073 | 44.5462 | 16.3215 |
| 1.4609 | 2.9316 | 2700 | 1.3796 | 48.2953 | 25.1084 | 40.8045 | 44.8141 | 16.6883 |
| 1.2852 | 3.0402 | 2800 | 1.3914 | 48.1816 | 24.9564 | 40.4874 | 44.4959 | 16.6809 |
| 1.3426 | 3.1488 | 2900 | 1.3925 | 47.9864 | 25.1931 | 40.6587 | 44.3335 | 16.7457 |
| 1.342 | 3.2573 | 3000 | 1.3907 | 47.9714 | 25.0598 | 40.7272 | 44.4796 | 16.6663 |
| 1.3408 | 3.3659 | 3100 | 1.3876 | 47.9041 | 24.8444 | 40.4734 | 44.1852 | 17.0917 |
| 1.3964 | 3.4745 | 3200 | 1.3831 | 48.244 | 25.3169 | 40.7608 | 44.6435 | 16.846 |
| 1.2923 | 3.5831 | 3300 | 1.3872 | 48.1798 | 25.031 | 40.7752 | 44.7031 | 17.1149 |
| 1.3557 | 3.6916 | 3400 | 1.3797 | 48.4681 | 25.1391 | 40.7846 | 44.9196 | 16.8924 |
| 1.3749 | 3.8002 | 3500 | 1.3799 | 48.2949 | 25.3223 | 40.6975 | 44.7215 | 17.1785 |
| 1.3232 | 3.9088 | 3600 | 1.3761 | 48.2852 | 25.0934 | 40.7396 | 44.6782 | 16.8643 |
| 1.2519 | 4.0174 | 3700 | 1.3756 | 47.8744 | 24.8648 | 40.4524 | 44.4635 | 16.8631 |
| 1.1997 | 4.1260 | 3800 | 1.3859 | 48.6158 | 25.5093 | 41.1598 | 45.2168 | 16.9132 |
| 1.2544 | 4.2345 | 3900 | 1.3837 | 48.492 | 25.1007 | 40.7921 | 44.8931 | 17.0538 |
| 1.2808 | 4.3431 | 4000 | 1.3825 | 48.5394 | 25.5808 | 40.9153 | 44.9679 | 16.912 |
| 1.2971 | 4.4517 | 4100 | 1.3844 | 48.5203 | 25.4213 | 41.0222 | 45.0464 | 16.923 |
| 1.2563 | 4.5603 | 4200 | 1.3842 | 48.5428 | 25.7257 | 41.2674 | 45.0936 | 16.7531 |
| 1.2324 | 4.6688 | 4300 | 1.3828 | 48.6838 | 25.797 | 41.216 | 45.1151 | 16.8362 |
| 1.3399 | 4.7774 | 4400 | 1.3831 | 48.5336 | 25.5641 | 40.8484 | 44.9315 | 16.9523 |
| 1.3147 | 4.8860 | 4500 | 1.3823 | 48.5021 | 25.4093 | 40.8773 | 44.8717 | 16.8851 |
| 1.2837 | 4.9946 | 4600 | 1.3830 | 48.5521 | 25.5283 | 40.8665 | 44.9575 | 16.9144 |
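
Validation loss and ROUGE do not necessarily peak at the same step, so it can help to scan the log for the best checkpoint under each criterion. The sketch below does this over the last rows of the training results, copied by hand as (step, validation loss, Rouge1); the variable names are illustrative.

```python
# (step, validation_loss, rouge1) for steps 3600-4600, copied from the training results.
rows = [
    (3600, 1.3761, 48.2852),
    (3700, 1.3756, 47.8744),
    (3800, 1.3859, 48.6158),
    (3900, 1.3837, 48.4920),
    (4000, 1.3825, 48.5394),
    (4100, 1.3844, 48.5203),
    (4200, 1.3842, 48.5428),
    (4300, 1.3828, 48.6838),
    (4400, 1.3831, 48.5336),
    (4500, 1.3823, 48.5021),
    (4600, 1.3830, 48.5521),
]

# Lower is better for loss, higher is better for ROUGE.
best_loss = min(rows, key=lambda row: row[1])
best_rouge1 = max(rows, key=lambda row: row[2])

print(f"lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
print(f"highest Rouge1: {best_rouge1[2]} at step {best_rouge1[0]}")
```

In these rows the lowest validation loss and the highest Rouge1 fall on different steps, and neither is the final step whose metrics are reported at the top of the card.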

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.2.0
  • Datasets 2.14.5
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.98B params
  • Tensor type: F32