---
base_model: gpt2
library_name: Distily
license: mit
tags:
  - generated_from_trainer
model-index:
  - name: distily_bench_gpt2_attn
    results: []
---

distily_bench_gpt2_attn

This student model was distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
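
Since the student is a causal language model based on gpt2, the checkpoint can be loaded with the standard transformers API. A minimal usage sketch, assuming the checkpoint is published under the repo id `lapp0/distily_bench_gpt2_attn` (inferred from the model name above, not stated in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id inferred from the model name above; adjust if the checkpoint lives elsewhere.
model_id = "lapp0/distily_bench_gpt2_attn"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Distillation compresses a large model into", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```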

It achieves the following results on the evaluation set (a sketch of how such perplexities can be reproduced follows this list):

  • eval_enwikippl: 228.1461
  • eval_frwikippl: 1416.6694
  • eval_zhwikippl: 848.6490
  • eval_loss: 2.4667
  • eval_runtime: 17.2058 s
  • eval_samples_per_second: 58.12
  • eval_steps_per_second: 7.265
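
Judging by the metric names, enwikippl, frwikippl, and zhwikippl are perplexities on English, French, and Chinese Wikipedia text; the exact eval corpus and windowing are not specified in this card. A hedged sketch of the standard way such a perplexity is computed with transformers (the repo id is the same assumption as in the usage example above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lapp0/distily_bench_gpt2_attn"  # assumed repo id, as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean token cross-entropy) over a single text window."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy over tokens.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

print(perplexity("The quick brown fox jumps over the lazy dog."))
```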

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective (an illustrative sketch follows this list): DistillationObjective(
      logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None),
      hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None),
      attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=cos, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 4e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 1.0
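
The objective above combines a KL loss on logits (weight 1) with a cosine loss on attention maps (weight 2.0), and the zero-weighted hidden-state component is inactive. A minimal re-implementation sketch of that combination; this is not Distily's actual code, and the loss reductions and the one-to-one layer pairing (read from layer_mapper=None) are assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, attn_weight=2.0):
    # Logits component (weight 1): KL divergence between the student's and
    # teacher's next-token distributions, matching loss_fn=kl above.
    kl = F.kl_div(
        F.log_softmax(student_out.logits, dim=-1),
        F.softmax(teacher_out.logits, dim=-1),
        reduction="batchmean",
    )
    # Attention component (weight 2.0): 1 - cosine similarity per layer,
    # matching loss_fn=cos; layers are paired one-to-one (assumed).
    attn_losses = []
    for s_attn, t_attn in zip(student_out.attentions, teacher_out.attentions):
        cos = F.cosine_similarity(s_attn.flatten(1), t_attn.flatten(1), dim=-1)
        attn_losses.append((1.0 - cos).mean())
    attn = torch.stack(attn_losses).mean()
    return logits_weight * kl + attn_weight * attn
```

For this to work, both forward passes would need `output_attentions=True`, and the teacher pass would run under `torch.no_grad()`.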

Resource Usage

Peak GPU Memory: 8.2195 GB
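
For context, a peak-memory figure like this is typically read from CUDA's allocator statistics; whether Distily measures it exactly this way is an assumption:

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... training run goes here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```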

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 56797.875 | 58468.6992 | 8.0273 | 17.152 | 58.302 | 7.288 | 59002.2891 |
| 1000 | 0.0808 | 797.1624 | 5157.9775 | 3.3194 | 17.2397 | 58.006 | 7.251 | 24401.0566 |
| 2000 | 0.1616 | 567.0632 | 3629.1594 | 3.0871 | 17.1941 | 58.16 | 7.27 | 3184.9797 |
| 3000 | 0.2424 | 464.5085 | 3017.8862 | 2.9667 | 17.2095 | 58.108 | 7.263 | 1129.6726 |
| 4000 | 0.3232 | 401.2574 | 2690.6233 | 2.8541 | 17.2873 | 57.846 | 7.231 | 880.7457 |
| 5000 | 0.4040 | 348.5625 | 2427.4329 | 2.7534 | 17.2981 | 57.81 | 7.226 | 1079.5291 |
| 6000 | 0.4848 | 304.7929 | 2054.1772 | 2.6701 | 17.2106 | 58.104 | 7.263 | 904.3437 |
| 7000 | 0.5657 | 277.6311 | 1738.0712 | 2.5931 | 17.2745 | 57.889 | 7.236 | 861.2068 |
| 8000 | 0.6465 | 248.1049 | 1555.2847 | 2.5229 | 17.2275 | 58.047 | 7.256 | 875.1184 |
| 9000 | 0.7273 | 228.1461 | 1416.6694 | 2.4667 | 17.2058 | 58.12 | 7.265 | 848.6490 |
| 10000 | 0.8081 | 208.8987 | 1238.1790 | 2.4113 | 17.26 | 57.938 | 7.242 | 711.3105 |
| 11000 | 0.8889 | 194.2086 | 1232.7786 | 2.3591 | 17.2456 | 57.986 | 7.248 | 517.6449 |
| 12000 | 0.9697 | 175.7651 | 1108.7455 | 2.3060 | 17.3467 | 57.648 | 7.206 | 513.5140 |
| 12375 | 1.0 | 170.5086 | 1069.4347 | 2.2860 | 17.2133 | 58.095 | 7.262 | 531.0175 |

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • PyTorch 2.3.0
  • Datasets 2.20.0