sileod committed on
Commit
d3640a5
1 Parent(s): ac908c9

Update README.md

  1. README.md +5 -4
README.md CHANGED
@@ -351,14 +351,15 @@ This model ranked 1st among all models with the microsoft/deberta-v3-base archit
  https://ibm.github.io/model-recycling/
 
  ### Software and training details
+
+ The model was trained on 600 tasks for 200k steps with a batch size of 384 and a peak learning rate of 2e-5. Training took 12 days on Nvidia A30 24GB gpu.
+ This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
+
+
  https://github.com/sileod/tasksource/ \
  https://github.com/sileod/tasknet/ \
  Training code: https://colab.research.google.com/drive/1iB4Oxl9_B5W3ZDzXoWJN-olUbqLBxgQS?usp=sharing
 
-
- This is the shared model with the MNLI classifier on top. Each task had a specific CLS embedding, which is dropped 10% of the time to facilitate model use without it. All multiple-choice model used the same classification layers. For classification tasks, models shared weights if their labels matched.
- The number of examples per task was capped to 64k. The model was trained for 200k steps with a batch size of 384, and a peak learning rate of 2e-5. Training took 12 days on Nvidia A30 24GB gpu.
-
  # Citation
 
  More details on this [article:](https://arxiv.org/abs/2301.05948)
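
The training figures in the added README line (600 tasks, 200k steps, batch size 384, 12 days) imply a rough data and time budget. The sketch below only does that arithmetic. The function name `training_budget` is hypothetical, and the per-task average assumes uniform sampling across tasks, which the README does not state.

```python
def training_budget(steps: int, batch_size: int, num_tasks: int, days: float) -> dict:
    """Derive rough budget figures from the numbers quoted in the README."""
    examples_seen = steps * batch_size      # total examples processed (with repetition)
    wall_seconds = days * 24 * 60 * 60      # total wall-clock time in seconds
    return {
        "examples_seen": examples_seen,
        # Assumption: tasks are sampled uniformly; the README does not say so.
        "avg_examples_per_task": examples_seen / num_tasks,
        "seconds_per_step": wall_seconds / steps,
    }


budget = training_budget(steps=200_000, batch_size=384, num_tasks=600, days=12)
print(budget)
```

With the README's numbers this gives about 76.8M examples processed in total, at roughly 5.2 seconds per optimization step.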