2023-10-14 00:40:53,759 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,762 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 00:40:53,762 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,762 MultiCorpus: 6183 train + 680 dev + 2113 test sentences
 - NER_HIPE_2022 Corpus: 6183 train + 680 dev + 2113 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/topres19th/en/with_doc_seperator
2023-10-14 00:40:53,762 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,763 Train: 6183 sentences
2023-10-14 00:40:53,763 (train_with_dev=False, train_with_test=False)
2023-10-14 00:40:53,763 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,763 Training Params:
2023-10-14 00:40:53,763 - learning_rate: "0.00015"
2023-10-14 00:40:53,763 - mini_batch_size: "4"
2023-10-14 00:40:53,763 - max_epochs: "10"
2023-10-14 00:40:53,763 - shuffle: "True"
2023-10-14 00:40:53,763 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,763 Plugins:
2023-10-14 00:40:53,763 - TensorboardLogger
2023-10-14 00:40:53,763 - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 00:40:53,763 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,764 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 00:40:53,764 - metric: "('micro avg', 'f1-score')"
2023-10-14 00:40:53,764 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,764 Computation:
2023-10-14 00:40:53,764 - compute on device: cuda:0
2023-10-14 00:40:53,764 - embedding storage: none
2023-10-14 00:40:53,764 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,764 Model training base path: "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
2023-10-14 00:40:53,764 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,764 ----------------------------------------------------------------------------------------------------
2023-10-14 00:40:53,764 Logging anything other than scalars to TensorBoard is currently not supported.
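The configuration above (ByT5 encoder, mini-batch size 4, peak learning rate 0.00015, 10 epochs, no CRF) can be assembled into a Flair fine-tuning script roughly as sketched below. This is a minimal reconstruction from the logged parameters and the base-path name; the embedding model id, the NER_HIPE_2022 arguments, and the use of TransformerWordEmbeddings in place of the ByT5Embeddings wrapper shown in the model dump are assumptions, not taken from the log.

```python
# Hedged sketch of a fine-tuning run matching the logged configuration.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Dataset arguments are assumed from the corpus path in the log.
corpus = NER_HIPE_2022(dataset_name="topres19th", language="en")
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # inferred from base path
    layers="-1",               # "layers-1" in the base path
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
# fine_tune applies a linear warmup/decay schedule by default, consistent with
# the "LinearScheduler | warmup_fraction: '0.1'" plugin listed above.
trainer.fine_tune(
    "hmbench-topres19th/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4",
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
)
```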
2023-10-14 00:41:37,315 epoch 1 - iter 154/1546 - loss 2.56873151 - time (sec): 43.55 - samples/sec: 288.51 - lr: 0.000015 - momentum: 0.000000
2023-10-14 00:42:21,594 epoch 1 - iter 308/1546 - loss 2.43379623 - time (sec): 87.83 - samples/sec: 291.12 - lr: 0.000030 - momentum: 0.000000
2023-10-14 00:43:04,591 epoch 1 - iter 462/1546 - loss 2.16358210 - time (sec): 130.82 - samples/sec: 293.17 - lr: 0.000045 - momentum: 0.000000
2023-10-14 00:43:47,729 epoch 1 - iter 616/1546 - loss 1.89703690 - time (sec): 173.96 - samples/sec: 286.04 - lr: 0.000060 - momentum: 0.000000
2023-10-14 00:44:31,467 epoch 1 - iter 770/1546 - loss 1.61404026 - time (sec): 217.70 - samples/sec: 285.41 - lr: 0.000075 - momentum: 0.000000
2023-10-14 00:45:15,241 epoch 1 - iter 924/1546 - loss 1.39110007 - time (sec): 261.47 - samples/sec: 282.67 - lr: 0.000090 - momentum: 0.000000
2023-10-14 00:45:59,472 epoch 1 - iter 1078/1546 - loss 1.22539657 - time (sec): 305.71 - samples/sec: 282.34 - lr: 0.000104 - momentum: 0.000000
2023-10-14 00:46:43,473 epoch 1 - iter 1232/1546 - loss 1.09447369 - time (sec): 349.71 - samples/sec: 282.98 - lr: 0.000119 - momentum: 0.000000
2023-10-14 00:47:26,775 epoch 1 - iter 1386/1546 - loss 0.98722789 - time (sec): 393.01 - samples/sec: 284.48 - lr: 0.000134 - momentum: 0.000000
2023-10-14 00:48:09,530 epoch 1 - iter 1540/1546 - loss 0.90394722 - time (sec): 435.76 - samples/sec: 284.17 - lr: 0.000149 - momentum: 0.000000
2023-10-14 00:48:11,100 ----------------------------------------------------------------------------------------------------
2023-10-14 00:48:11,100 EPOCH 1 done: loss 0.9011 - lr: 0.000149
2023-10-14 00:48:28,371 DEV : loss 0.0812714695930481 - f1-score (micro avg) 0.5821
2023-10-14 00:48:28,400 saving best model
2023-10-14 00:48:29,339 ----------------------------------------------------------------------------------------------------
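The lr column above follows the LinearScheduler plugin: with warmup_fraction 0.1, roughly the first 1,546 of the 15,460 total batches (10 epochs x 1546 iterations) ramp the learning rate linearly from 0 to the peak of 0.00015, after which it decays linearly back to 0. A small sketch of that arithmetic (not Flair's scheduler code itself) reproduces the logged values:

```python
# Approximate the "lr" column: linear warmup over the first 10% of steps,
# then linear decay to zero. A sketch of the schedule's arithmetic only.
TOTAL_STEPS = 10 * 1546                 # max_epochs * iterations per epoch
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)   # warmup_fraction: 0.1
PEAK_LR = 0.00015

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

print(round(lr_at(154), 6))           # ~0.000015, as logged at epoch 1, iter 154
print(round(lr_at(1540), 6))          # ~0.000149, as logged at epoch 1, iter 1540
print(round(lr_at(1546 + 1540), 6))   # ~0.000133, as logged at epoch 2, iter 1540
```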
2023-10-14 00:49:11,816 epoch 2 - iter 154/1546 - loss 0.10318258 - time (sec): 42.47 - samples/sec: 258.58 - lr: 0.000148 - momentum: 0.000000
2023-10-14 00:49:55,140 epoch 2 - iter 308/1546 - loss 0.10981228 - time (sec): 85.80 - samples/sec: 280.28 - lr: 0.000147 - momentum: 0.000000
2023-10-14 00:50:38,438 epoch 2 - iter 462/1546 - loss 0.10677416 - time (sec): 129.10 - samples/sec: 279.42 - lr: 0.000145 - momentum: 0.000000
2023-10-14 00:51:21,839 epoch 2 - iter 616/1546 - loss 0.10494738 - time (sec): 172.50 - samples/sec: 283.09 - lr: 0.000143 - momentum: 0.000000
2023-10-14 00:52:05,915 epoch 2 - iter 770/1546 - loss 0.10120027 - time (sec): 216.57 - samples/sec: 285.19 - lr: 0.000142 - momentum: 0.000000
2023-10-14 00:52:49,780 epoch 2 - iter 924/1546 - loss 0.09745622 - time (sec): 260.44 - samples/sec: 287.52 - lr: 0.000140 - momentum: 0.000000
2023-10-14 00:53:33,706 epoch 2 - iter 1078/1546 - loss 0.09614656 - time (sec): 304.36 - samples/sec: 285.46 - lr: 0.000138 - momentum: 0.000000
2023-10-14 00:54:16,839 epoch 2 - iter 1232/1546 - loss 0.09374842 - time (sec): 347.50 - samples/sec: 285.24 - lr: 0.000137 - momentum: 0.000000
2023-10-14 00:54:59,232 epoch 2 - iter 1386/1546 - loss 0.09264200 - time (sec): 389.89 - samples/sec: 284.46 - lr: 0.000135 - momentum: 0.000000
2023-10-14 00:55:42,457 epoch 2 - iter 1540/1546 - loss 0.09188658 - time (sec): 433.12 - samples/sec: 285.66 - lr: 0.000133 - momentum: 0.000000
2023-10-14 00:55:44,125 ----------------------------------------------------------------------------------------------------
2023-10-14 00:55:44,125 EPOCH 2 done: loss 0.0918 - lr: 0.000133
2023-10-14 00:56:01,105 DEV : loss 0.05896108224987984 - f1-score (micro avg) 0.753
2023-10-14 00:56:01,138 saving best model
2023-10-14 00:56:02,107 ----------------------------------------------------------------------------------------------------
2023-10-14 00:56:45,802 epoch 3 - iter 154/1546 - loss 0.03817025 - time (sec): 43.69 - samples/sec: 290.81 - lr: 0.000132 - momentum: 0.000000
2023-10-14 00:57:30,409 epoch 3 - iter 308/1546 - loss 0.04755001 - time (sec): 88.30 - samples/sec: 285.37 - lr: 0.000130 - momentum: 0.000000
2023-10-14 00:58:14,419 epoch 3 - iter 462/1546 - loss 0.05104447 - time (sec): 132.31 - samples/sec: 287.17 - lr: 0.000128 - momentum: 0.000000
2023-10-14 00:58:59,278 epoch 3 - iter 616/1546 - loss 0.05222974 - time (sec): 177.17 - samples/sec: 284.35 - lr: 0.000127 - momentum: 0.000000
2023-10-14 00:59:42,378 epoch 3 - iter 770/1546 - loss 0.05664685 - time (sec): 220.27 - samples/sec: 283.57 - lr: 0.000125 - momentum: 0.000000
2023-10-14 01:00:26,466 epoch 3 - iter 924/1546 - loss 0.05615839 - time (sec): 264.36 - samples/sec: 283.26 - lr: 0.000123 - momentum: 0.000000
2023-10-14 01:01:10,213 epoch 3 - iter 1078/1546 - loss 0.05563682 - time (sec): 308.10 - samples/sec: 283.56 - lr: 0.000122 - momentum: 0.000000
2023-10-14 01:01:53,935 epoch 3 - iter 1232/1546 - loss 0.05496907 - time (sec): 351.83 - samples/sec: 283.05 - lr: 0.000120 - momentum: 0.000000
2023-10-14 01:02:36,567 epoch 3 - iter 1386/1546 - loss 0.05420399 - time (sec): 394.46 - samples/sec: 282.68 - lr: 0.000118 - momentum: 0.000000
2023-10-14 01:03:19,488 epoch 3 - iter 1540/1546 - loss 0.05358357 - time (sec): 437.38 - samples/sec: 283.24 - lr: 0.000117 - momentum: 0.000000
2023-10-14 01:03:21,125 ----------------------------------------------------------------------------------------------------
2023-10-14 01:03:21,126 EPOCH 3 done: loss 0.0535 - lr: 0.000117
2023-10-14 01:03:38,668 DEV : loss 0.05659706890583038 - f1-score (micro avg) 0.8127
2023-10-14 01:03:38,696 saving best model
2023-10-14 01:03:39,676 ----------------------------------------------------------------------------------------------------
2023-10-14 01:04:22,928 epoch 4 - iter 154/1546 - loss 0.02880123 - time (sec): 43.25 - samples/sec: 282.77 - lr: 0.000115 - momentum: 0.000000
2023-10-14 01:05:05,591 epoch 4 - iter 308/1546 - loss 0.03320371 - time (sec): 85.91 - samples/sec: 281.37 - lr: 0.000113 - momentum: 0.000000
2023-10-14 01:05:47,828 epoch 4 - iter 462/1546 - loss 0.03355342 - time (sec): 128.15 - samples/sec: 277.43 - lr: 0.000112 - momentum: 0.000000
2023-10-14 01:06:32,557 epoch 4 - iter 616/1546 - loss 0.03292566 - time (sec): 172.88 - samples/sec: 281.16 - lr: 0.000110 - momentum: 0.000000
2023-10-14 01:07:16,430 epoch 4 - iter 770/1546 - loss 0.03538356 - time (sec): 216.75 - samples/sec: 283.12 - lr: 0.000108 - momentum: 0.000000
2023-10-14 01:08:00,152 epoch 4 - iter 924/1546 - loss 0.03499173 - time (sec): 260.47 - samples/sec: 282.29 - lr: 0.000107 - momentum: 0.000000
2023-10-14 01:08:43,630 epoch 4 - iter 1078/1546 - loss 0.03358414 - time (sec): 303.95 - samples/sec: 282.46 - lr: 0.000105 - momentum: 0.000000
2023-10-14 01:09:26,021 epoch 4 - iter 1232/1546 - loss 0.03409530 - time (sec): 346.34 - samples/sec: 282.45 - lr: 0.000103 - momentum: 0.000000
2023-10-14 01:10:10,598 epoch 4 - iter 1386/1546 - loss 0.03269230 - time (sec): 390.92 - samples/sec: 284.77 - lr: 0.000102 - momentum: 0.000000
2023-10-14 01:10:54,547 epoch 4 - iter 1540/1546 - loss 0.03249523 - time (sec): 434.87 - samples/sec: 284.58 - lr: 0.000100 - momentum: 0.000000
2023-10-14 01:10:56,205 ----------------------------------------------------------------------------------------------------
2023-10-14 01:10:56,206 EPOCH 4 done: loss 0.0326 - lr: 0.000100
2023-10-14 01:11:14,223 DEV : loss 0.06543166935443878 - f1-score (micro avg) 0.8296
2023-10-14 01:11:14,257 saving best model
2023-10-14 01:11:16,978 ----------------------------------------------------------------------------------------------------
2023-10-14 01:12:01,955 epoch 5 - iter 154/1546 - loss 0.02269561 - time (sec): 44.97 - samples/sec: 277.46 - lr: 0.000098 - momentum: 0.000000
2023-10-14 01:12:45,258 epoch 5 - iter 308/1546 - loss 0.01877881 - time (sec): 88.28 - samples/sec: 282.06 - lr: 0.000097 - momentum: 0.000000
2023-10-14 01:13:28,365 epoch 5 - iter 462/1546 - loss 0.01896534 - time (sec): 131.38 - samples/sec: 284.70 - lr: 0.000095 - momentum: 0.000000
2023-10-14 01:14:12,356 epoch 5 - iter 616/1546 - loss 0.01903863 - time (sec): 175.37 - samples/sec: 282.49 - lr: 0.000093 - momentum: 0.000000
2023-10-14 01:14:55,863 epoch 5 - iter 770/1546 - loss 0.01875930 - time (sec): 218.88 - samples/sec: 285.22 - lr: 0.000092 - momentum: 0.000000
2023-10-14 01:15:39,889 epoch 5 - iter 924/1546 - loss 0.01806369 - time (sec): 262.91 - samples/sec: 285.58 - lr: 0.000090 - momentum: 0.000000
2023-10-14 01:16:24,044 epoch 5 - iter 1078/1546 - loss 0.01918173 - time (sec): 307.06 - samples/sec: 285.19 - lr: 0.000088 - momentum: 0.000000
2023-10-14 01:17:07,380 epoch 5 - iter 1232/1546 - loss 0.01910152 - time (sec): 350.40 - samples/sec: 282.90 - lr: 0.000087 - momentum: 0.000000
2023-10-14 01:17:50,259 epoch 5 - iter 1386/1546 - loss 0.02046787 - time (sec): 393.28 - samples/sec: 283.93 - lr: 0.000085 - momentum: 0.000000
2023-10-14 01:18:34,363 epoch 5 - iter 1540/1546 - loss 0.02101615 - time (sec): 437.38 - samples/sec: 282.89 - lr: 0.000083 - momentum: 0.000000
2023-10-14 01:18:36,049 ----------------------------------------------------------------------------------------------------
2023-10-14 01:18:36,049 EPOCH 5 done: loss 0.0210 - lr: 0.000083
2023-10-14 01:18:53,025 DEV : loss 0.07295508682727814 - f1-score (micro avg) 0.8114
2023-10-14 01:18:53,055 ----------------------------------------------------------------------------------------------------
2023-10-14 01:19:37,032 epoch 6 - iter 154/1546 - loss 0.01805609 - time (sec): 43.98 - samples/sec: 280.23 - lr: 0.000082 - momentum: 0.000000
2023-10-14 01:20:20,915 epoch 6 - iter 308/1546 - loss 0.01358401 - time (sec): 87.86 - samples/sec: 282.61 - lr: 0.000080 - momentum: 0.000000
2023-10-14 01:21:04,910 epoch 6 - iter 462/1546 - loss 0.01234981 - time (sec): 131.85 - samples/sec: 286.36 - lr: 0.000078 - momentum: 0.000000
2023-10-14 01:21:48,433 epoch 6 - iter 616/1546 - loss 0.01241203 - time (sec): 175.38 - samples/sec: 286.78 - lr: 0.000077 - momentum: 0.000000
2023-10-14 01:22:31,732 epoch 6 - iter 770/1546 - loss 0.01401568 - time (sec): 218.67 - samples/sec: 283.93 - lr: 0.000075 - momentum: 0.000000
2023-10-14 01:23:15,331 epoch 6 - iter 924/1546 - loss 0.01424007 - time (sec): 262.27 - samples/sec: 282.40 - lr: 0.000073 - momentum: 0.000000
2023-10-14 01:23:59,294 epoch 6 - iter 1078/1546 - loss 0.01451504 - time (sec): 306.24 - samples/sec: 282.27 - lr: 0.000072 - momentum: 0.000000
2023-10-14 01:24:42,465 epoch 6 - iter 1232/1546 - loss 0.01474960 - time (sec): 349.41 - samples/sec: 281.30 - lr: 0.000070 - momentum: 0.000000
2023-10-14 01:25:25,815 epoch 6 - iter 1386/1546 - loss 0.01493525 - time (sec): 392.76 - samples/sec: 281.51 - lr: 0.000068 - momentum: 0.000000
2023-10-14 01:26:09,605 epoch 6 - iter 1540/1546 - loss 0.01413867 - time (sec): 436.55 - samples/sec: 283.39 - lr: 0.000067 - momentum: 0.000000
2023-10-14 01:26:11,277 ----------------------------------------------------------------------------------------------------
2023-10-14 01:26:11,277 EPOCH 6 done: loss 0.0143 - lr: 0.000067
2023-10-14 01:26:29,293 DEV : loss 0.07670143991708755 - f1-score (micro avg) 0.831
2023-10-14 01:26:29,335 saving best model
2023-10-14 01:26:31,945 ----------------------------------------------------------------------------------------------------
2023-10-14 01:27:18,143 epoch 7 - iter 154/1546 - loss 0.00975317 - time (sec): 46.19 - samples/sec: 297.57 - lr: 0.000065 - momentum: 0.000000
2023-10-14 01:28:00,843 epoch 7 - iter 308/1546 - loss 0.00891956 - time (sec): 88.89 - samples/sec: 292.11 - lr: 0.000063 - momentum: 0.000000
2023-10-14 01:28:44,318 epoch 7 - iter 462/1546 - loss 0.00997728 - time (sec): 132.37 - samples/sec: 290.55 - lr: 0.000062 - momentum: 0.000000
2023-10-14 01:29:26,892 epoch 7 - iter 616/1546 - loss 0.00930809 - time (sec): 174.94 - samples/sec: 290.48 - lr: 0.000060 - momentum: 0.000000
2023-10-14 01:30:09,203 epoch 7 - iter 770/1546 - loss 0.00916108 - time (sec): 217.25 - samples/sec: 287.44 - lr: 0.000058 - momentum: 0.000000
2023-10-14 01:30:51,937 epoch 7 - iter 924/1546 - loss 0.00924439 - time (sec): 259.99 - samples/sec: 290.46 - lr: 0.000057 - momentum: 0.000000
2023-10-14 01:31:33,959 epoch 7 - iter 1078/1546 - loss 0.01049403 - time (sec): 302.01 - samples/sec: 290.64 - lr: 0.000055 - momentum: 0.000000
2023-10-14 01:32:16,483 epoch 7 - iter 1232/1546 - loss 0.01043974 - time (sec): 344.53 - samples/sec: 288.74 - lr: 0.000053 - momentum: 0.000000
2023-10-14 01:32:59,452 epoch 7 - iter 1386/1546 - loss 0.00997770 - time (sec): 387.50 - samples/sec: 287.00 - lr: 0.000052 - momentum: 0.000000
2023-10-14 01:33:42,672 epoch 7 - iter 1540/1546 - loss 0.00960048 - time (sec): 430.72 - samples/sec: 287.24 - lr: 0.000050 - momentum: 0.000000
2023-10-14 01:33:44,315 ----------------------------------------------------------------------------------------------------
2023-10-14 01:33:44,316 EPOCH 7 done: loss 0.0096 - lr: 0.000050
2023-10-14 01:34:01,530 DEV : loss 0.0880291685461998 - f1-score (micro avg) 0.8364
2023-10-14 01:34:01,560 saving best model
2023-10-14 01:34:04,174 ----------------------------------------------------------------------------------------------------
2023-10-14 01:34:47,366 epoch 8 - iter 154/1546 - loss 0.00390031 - time (sec): 43.19 - samples/sec: 293.43 - lr: 0.000048 - momentum: 0.000000
2023-10-14 01:35:29,570 epoch 8 - iter 308/1546 - loss 0.00354778 - time (sec): 85.39 - samples/sec: 283.34 - lr: 0.000047 - momentum: 0.000000
2023-10-14 01:36:12,267 epoch 8 - iter 462/1546 - loss 0.00558009 - time (sec): 128.09 - samples/sec: 285.33 - lr: 0.000045 - momentum: 0.000000
2023-10-14 01:36:56,668 epoch 8 - iter 616/1546 - loss 0.00613750 - time (sec): 172.49 - samples/sec: 280.88 - lr: 0.000043 - momentum: 0.000000
2023-10-14 01:37:43,215 epoch 8 - iter 770/1546 - loss 0.00600191 - time (sec): 219.04 - samples/sec: 278.43 - lr: 0.000042 - momentum: 0.000000
2023-10-14 01:38:29,251 epoch 8 - iter 924/1546 - loss 0.00583790 - time (sec): 265.07 - samples/sec: 277.56 - lr: 0.000040 - momentum: 0.000000
2023-10-14 01:39:14,114 epoch 8 - iter 1078/1546 - loss 0.00603055 - time (sec): 309.94 - samples/sec: 274.06 - lr: 0.000038 - momentum: 0.000000
2023-10-14 01:40:01,383 epoch 8 - iter 1232/1546 - loss 0.00653849 - time (sec): 357.21 - samples/sec: 274.08 - lr: 0.000037 - momentum: 0.000000
2023-10-14 01:40:49,090 epoch 8 - iter 1386/1546 - loss 0.00610073 - time (sec): 404.91 - samples/sec: 274.75 - lr: 0.000035 - momentum: 0.000000
2023-10-14 01:41:35,518 epoch 8 - iter 1540/1546 - loss 0.00595387 - time (sec): 451.34 - samples/sec: 274.45 - lr: 0.000033 - momentum: 0.000000
2023-10-14 01:41:37,155 ----------------------------------------------------------------------------------------------------
2023-10-14 01:41:37,155 EPOCH 8 done: loss 0.0059 - lr: 0.000033
2023-10-14 01:41:54,940 DEV : loss 0.09291724860668182 - f1-score (micro avg) 0.832
2023-10-14 01:41:54,969 ----------------------------------------------------------------------------------------------------
2023-10-14 01:42:39,414 epoch 9 - iter 154/1546 - loss 0.00195984 - time (sec): 44.44 - samples/sec: 283.42 - lr: 0.000032 - momentum: 0.000000
2023-10-14 01:43:23,371 epoch 9 - iter 308/1546 - loss 0.00226573 - time (sec): 88.40 - samples/sec: 280.80 - lr: 0.000030 - momentum: 0.000000
2023-10-14 01:44:07,873 epoch 9 - iter 462/1546 - loss 0.00286662 - time (sec): 132.90 - samples/sec: 284.35 - lr: 0.000028 - momentum: 0.000000
2023-10-14 01:44:52,315 epoch 9 - iter 616/1546 - loss 0.00407165 - time (sec): 177.34 - samples/sec: 284.00 - lr: 0.000027 - momentum: 0.000000
2023-10-14 01:45:35,291 epoch 9 - iter 770/1546 - loss 0.00463367 - time (sec): 220.32 - samples/sec: 282.55 - lr: 0.000025 - momentum: 0.000000
2023-10-14 01:46:18,120 epoch 9 - iter 924/1546 - loss 0.00464126 - time (sec): 263.15 - samples/sec: 283.13 - lr: 0.000023 - momentum: 0.000000
2023-10-14 01:46:58,607 epoch 9 - iter 1078/1546 - loss 0.00453876 - time (sec): 303.64 - samples/sec: 288.14 - lr: 0.000022 - momentum: 0.000000
2023-10-14 01:47:38,679 epoch 9 - iter 1232/1546 - loss 0.00471355 - time (sec): 343.71 - samples/sec: 290.15 - lr: 0.000020 - momentum: 0.000000
2023-10-14 01:48:19,657 epoch 9 - iter 1386/1546 - loss 0.00446859 - time (sec): 384.69 - samples/sec: 292.58 - lr: 0.000018 - momentum: 0.000000
2023-10-14 01:49:01,929 epoch 9 - iter 1540/1546 - loss 0.00448726 - time (sec): 426.96 - samples/sec: 289.78 - lr: 0.000017 - momentum: 0.000000
2023-10-14 01:49:03,704 ----------------------------------------------------------------------------------------------------
2023-10-14 01:49:03,705 EPOCH 9 done: loss 0.0045 - lr: 0.000017
2023-10-14 01:49:20,835 DEV : loss 0.10131476074457169 - f1-score (micro avg) 0.8276
2023-10-14 01:49:20,865 ----------------------------------------------------------------------------------------------------
2023-10-14 01:50:04,766 epoch 10 - iter 154/1546 - loss 0.00119822 - time (sec): 43.90 - samples/sec: 288.50 - lr: 0.000015 - momentum: 0.000000
2023-10-14 01:50:47,340 epoch 10 - iter 308/1546 - loss 0.00178207 - time (sec): 86.47 - samples/sec: 277.32 - lr: 0.000013 - momentum: 0.000000
2023-10-14 01:51:30,796 epoch 10 - iter 462/1546 - loss 0.00153142 - time (sec): 129.93 - samples/sec: 283.13 - lr: 0.000012 - momentum: 0.000000
2023-10-14 01:52:14,216 epoch 10 - iter 616/1546 - loss 0.00162289 - time (sec): 173.35 - samples/sec: 286.02 - lr: 0.000010 - momentum: 0.000000
2023-10-14 01:52:57,272 epoch 10 - iter 770/1546 - loss 0.00187635 - time (sec): 216.40 - samples/sec: 285.41 - lr: 0.000008 - momentum: 0.000000
2023-10-14 01:53:41,326 epoch 10 - iter 924/1546 - loss 0.00182691 - time (sec): 260.46 - samples/sec: 284.76 - lr: 0.000007 - momentum: 0.000000
2023-10-14 01:54:24,005 epoch 10 - iter 1078/1546 - loss 0.00196491 - time (sec): 303.14 - samples/sec: 284.52 - lr: 0.000005 - momentum: 0.000000
2023-10-14 01:55:07,582 epoch 10 - iter 1232/1546 - loss 0.00209036 - time (sec): 346.71 - samples/sec: 285.53 - lr: 0.000003 - momentum: 0.000000
2023-10-14 01:55:51,488 epoch 10 - iter 1386/1546 - loss 0.00233145 - time (sec): 390.62 - samples/sec: 283.48 - lr: 0.000002 - momentum: 0.000000
2023-10-14 01:56:35,504 epoch 10 - iter 1540/1546 - loss 0.00253840 - time (sec): 434.64 - samples/sec: 284.86 - lr: 0.000000 - momentum: 0.000000
2023-10-14 01:56:37,127 ----------------------------------------------------------------------------------------------------
2023-10-14 01:56:37,127 EPOCH 10 done: loss 0.0025 - lr: 0.000000
2023-10-14 01:56:55,037 DEV : loss 0.10494109988212585 - f1-score (micro avg) 0.8259
2023-10-14 01:56:55,989 ----------------------------------------------------------------------------------------------------
2023-10-14 01:56:55,991 Loading model from best epoch ...
2023-10-14 01:57:00,432 SequenceTagger predicts: Dictionary with 13 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-BUILDING, B-BUILDING, E-BUILDING, I-BUILDING, S-STREET, B-STREET, E-STREET, I-STREET
2023-10-14 01:57:55,195
Results:
- F-score (micro) 0.7978
- F-score (macro) 0.713
- Accuracy 0.6828

By class:
              precision    recall  f1-score   support

         LOC     0.8436    0.8552    0.8493       946
    BUILDING     0.5588    0.5135    0.5352       185
      STREET     0.7414    0.7679    0.7544        56

   micro avg     0.7978    0.7978    0.7978      1187
   macro avg     0.7146    0.7122    0.7130      1187
weighted avg     0.7944    0.7978    0.7959      1187

2023-10-14 01:57:55,195 ----------------------------------------------------------------------------------------------------
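For reference, the macro F-score reported above is the unweighted mean of the three per-class f1-scores ((0.8493 + 0.5352 + 0.7544) / 3 ≈ 0.713), while the micro average pools all 1187 test spans. The checkpoint written by this run can be reloaded for inference with Flair's standard API; the example sentence and the "ner" label type in the sketch below are assumptions for illustration, not taken from the log.

```python
# Minimal inference sketch using the best-model.pt checkpoint from this run.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-topres19th/en-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-"
    "poolingfirst-layers-1-crfFalse-4/best-model.pt"
)

# Hypothetical example sentence; the tagger predicts LOC, BUILDING and STREET spans.
sentence = Sentence("The old chapel near Regent Street was rebuilt in 1874 .")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span.text, span.tag, round(span.score, 4))
```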