tinyllama-proteinpretrain-quinoa

Full-model fine-tuning of TinyLLaMA-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.

Notes: continued pretraining on sequences alone leads the model to generate only protein sequences, eventually collapsing into repeats such as VVVV or KKKK.

  • This model may be replaced with mixed training (bio/chem text and protein).
  • This model might need "biotokens" to represent the amino acids instead of using the existing tokenizer.
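Since the card does not yet include usage instructions, here is a minimal sketch of loading the checkpoint with the Hugging Face `transformers` library. The repo id is taken from this card; the prompt and sampling parameters are illustrative assumptions, not settings recommended by the author:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card; loaded as a standard causal LM.
repo_id = "monsoon-nlp/tinyllama-proteinpretrain-quinoa"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Prompt with the start of an amino-acid sequence (illustrative).
# Sampling is used because, as noted above, the model tends to
# collapse into repeated V/K runs.
inputs = tokenizer("MKT", return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the checkpoint uses the stock TinyLLaMA tokenizer rather than dedicated "biotokens", each amino-acid letter may be split or merged unpredictably during tokenization.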

More details TBD

Model size: 1.1B params
Tensor type: F32
Format: Safetensors
