mohammadmahdinouri committed on
Commit
1638c07
1 Parent(s): 4b2429b

Update README.md

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -8,6 +8,9 @@ This model uses a unique distillation method called ‘transformer-layer distill
  This model uses 4 hidden layers with a hidden dimension size and an embedding size of 768 resulting in a total of 15M parameters. Due to the small hidden dimension size used in this model, it uses a random initialisation.
 
  # Citation
+
+ If you use this model, please consider citing the following paper:
+
  ```bibtex
  @misc{https://doi.org/10.48550/arxiv.2209.03182,
  doi = {10.48550/ARXIV.2209.03182},