Fill-Mask
Transformers
PyTorch
Joblib
DNA
biology
genomics
custom_code
Inference Endpoints
hdallatorre committed on
Commit
8b0e72b
1 Parent(s): b813835

feat: Add model card

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -92,6 +92,9 @@ The masking procedure used is the standard one for Bert-style training:
 
 The model was trained with 8 A100 80GB GPUs on 300B tokens, with an effective batch size of 1M tokens. The sequence length used was 1000 tokens. The Adam optimizer [38] was used with a learning rate schedule and standard values for the exponential decay rates and the epsilon constant: β1 = 0.9, β2 = 0.999 and ε = 1e-8. During a first warmup period, the learning rate was increased linearly from 5e-5 to 1e-4 over 16k steps, before decreasing following a square-root decay until the end of training.
 
+ ### Architecture
+
+ The model belongs to the second generation of nucleotide transformers, with the architectural changes consisting of the use of rotary positional embeddings instead of learned ones, as well as the introduction of Gated Linear Units.
 
 ### BibTeX entry and citation info
 
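
For reference, the learning rate schedule described in the training paragraph above (linear warmup from 5e-5 to 1e-4 over 16k steps, then a square-root decay) can be sketched with a PyTorch `LambdaLR`. This is an illustrative reading of the card, not the actual training code: the stand-in model and the interpretation of the decay as an inverse square root anchored at the warmup step are assumptions.

```python
import torch

# Stand-in model; only the optimizer and schedule values below come from the card.
model = torch.nn.Linear(16, 16)

PEAK_LR = 1e-4        # learning rate reached at the end of warmup
START_LR = 5e-5       # learning rate at step 0
WARMUP_STEPS = 16_000

# Adam with the stated hyperparameters: β1 = 0.9, β2 = 0.999, ε = 1e-8.
optimizer = torch.optim.Adam(model.parameters(), lr=PEAK_LR, betas=(0.9, 0.999), eps=1e-8)

def lr_multiplier(step: int) -> float:
    """Multiplier applied to PEAK_LR at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from START_LR to PEAK_LR over 16k steps.
        lr = START_LR + (PEAK_LR - START_LR) * step / WARMUP_STEPS
    else:
        # Assumed inverse square-root decay anchored at the warmup step.
        lr = PEAK_LR * (WARMUP_STEPS / step) ** 0.5
    return lr / PEAK_LR

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_multiplier)

# Training loop sketch: step the scheduler once per optimizer update.
# for batch in dataloader:
#     ...
#     optimizer.step()
#     scheduler.step()
```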
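
The two changes named in the added Architecture section, rotary positional embeddings and Gated Linear Units, can be illustrated with a minimal PyTorch sketch. Module names, dimensions, and the choice of SiLU as the gating activation are assumptions made for illustration and do not reflect the model's actual implementation.

```python
import torch
import torch.nn as nn

class GatedFeedForward(nn.Module):
    """Gated Linear Unit feed-forward block: down(act(gate(x)) * up(x)).

    Layer names, the hidden width, and the SiLU gating activation are
    illustrative assumptions, not the model's actual implementation.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.gate(x)) * self.up(x))


def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key features by position instead of adding learned embeddings.

    x has shape (..., seq_len, head_dim) with an even head_dim.
    """
    seq_len, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]           # pair up feature dimensions
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin          # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


# Example: gated FFN over token embeddings, rotary applied to a query tensor.
tokens = torch.randn(2, 1000, 512)                    # (batch, seq_len, d_model)
ffn_out = GatedFeedForward(512, 2048)(tokens)
queries = apply_rotary(torch.randn(2, 8, 1000, 64))   # (batch, heads, seq_len, head_dim)
```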