# Model description

This model is a BERT-based architecture with 8 layers; the detailed configuration is summarized below. The drug-like molecule BERT is inspired by ["Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction"](https://arxiv.org/abs/1908.06760), with several modifications to the training procedure.

```
from transformers import BertConfig

# vocab_size and max_seq_len come from the tokenizer and dataset preprocessing.
config = BertConfig(
    vocab_size=vocab_size,
    hidden_size=128,
    num_hidden_layers=8,
    num_attention_heads=8,
    intermediate_size=512,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=max_seq_len + 2,
    type_vocab_size=1,
    pad_token_id=0,
    position_embedding_type="absolute",
)
```
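
As a minimal sketch (assuming the Hugging Face `transformers` library, which provides `BertConfig`), the configuration above can be plugged into a masked-language-model head for pretraining:

```
from transformers import BertForMaskedLM

# Build the 8-layer model from the configuration above and report its size;
# vocab_size and max_seq_len are assumed to come from the tokenizer/dataset setup.
model = BertForMaskedLM(config)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")
```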

# Training and evaluation data

The model is trained on drug-like molecules from the PubChem database. PubChem contains more than 100 M molecules, so we filtered for drug-like molecules using the quantitative estimate of drug-likeness (QED) score; with a QED threshold of 0.7, 4.1 M molecules were retained.
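
As an illustrative sketch of this filtering step (assuming the molecules are handled as SMILES strings and using RDKit's `QED` module; this is not necessarily the exact preprocessing script):

```
from rdkit import Chem
from rdkit.Chem import QED

def is_drug_like(smiles: str, threshold: float = 0.7) -> bool:
    # Parse the molecule and keep it only if its QED score reaches the threshold.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return False
    return QED.qed(mol) >= threshold
```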

# Tokenizer

We use a character-level tokenizer. The special tokens are "[SOS]", "[EOS]", "[PAD]", and "[UNK]".
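
A minimal sketch of such a tokenizer, assuming the vocabulary is built from the characters observed in the training molecules (the exact vocabulary is not part of this card); placing "[PAD]" first keeps its id at 0, consistent with `pad_token_id=0` in the configuration above:

```
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[SOS]", "[EOS]"]

def build_vocab(strings):
    # Special tokens first so that "[PAD]" gets id 0, matching pad_token_id=0.
    chars = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + chars)}

def encode(string, vocab):
    # Character-level encoding with start/end markers and unknown fallback.
    unk = vocab["[UNK]"]
    return [vocab["[SOS]"]] + [vocab.get(ch, unk) for ch in string] + [vocab["[EOS]"]]
```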

# Training hyperparameters

The following hyperparameters were used during training (a sketch of the masking step follows the list):

- Optimizer: Adam, learning rate: 5e-4, scheduler: cosine annealing
- Batch size: 2048
- Training steps: 24 K
- Training precision: FP16
- Loss function: cross-entropy
- Training masking rate: 30 %
- Testing masking rate: 15 % (the original molecule BERT used a 15 % masking rate)
- NSP task: none
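
A hedged sketch of the 30 % masking step (not the exact training code; `mask_token_id` and `pad_token_id` are assumed to come from the tokenizer setup above):

```
import torch

def mask_tokens(input_ids, mask_token_id, pad_token_id=0, mask_rate=0.30):
    # Labels keep the original tokens; unmasked positions are ignored (-100).
    labels = input_ids.clone()
    probability_matrix = torch.full(input_ids.shape, mask_rate)
    probability_matrix[input_ids == pad_token_id] = 0.0  # never mask padding
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100
    masked_inputs = input_ids.clone()
    masked_inputs[masked_indices] = mask_token_id
    return masked_inputs, labels
```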

# Performance

- Accuracy: 94.02 %