Model Description

This model predicts receptor classes, identified by their PDB IDs, from peptide sequences. It is built on the ESM2 (Evolutionary Scale Modeling) protein language model, starting from the esm2_t12_35M_UR50D pre-trained weights, and is fine-tuned for receptor prediction on datasets from PROPEDIA and PepNN, together with novel peptides that were experimentally validated to bind their target proteins and whose binding conformations were determined with ClusPro, a protein-protein docking tool. The name pep2rec_cppp reflects the peptide-to-receptor prediction task and the training data sources: ClusPro, PROPEDIA, and PepNN. The model is intended for researchers and practitioners in bioinformatics, drug discovery, and related fields who want to understand or predict peptide-receptor interactions.

How to Use

Here is how to predict the receptor class for a peptide sequence using this model:

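import torch
from huggingface_hub import hf_hub_download
from joblib import load
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub.
MODEL_PATH = "littleworth/esm2_t12_35M_UR50D_pep2rec_cppp"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# MODEL_PATH is a Hub repo ID, not a local directory, so download the label
# encoder first (assumes label_encoder.joblib sits at the root of the repo).
LABEL_ENCODER_PATH = hf_hub_download(repo_id=MODEL_PATH, filename="label_encoder.joblib")
label_encoder = load(LABEL_ENCODER_PATH)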

# Peptide sequence to classify (one-letter amino acid codes).
input_sequence = "GNLIVVGRVIMS"

inputs = tokenizer(input_sequence, return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients and turn the logits into probabilities.
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=1)
    predicted_class_idx = probabilities.argmax(dim=1).item()

# Map the predicted class index back to its PDB ID.
predicted_class = label_encoder.inverse_transform([predicted_class_idx])[0]

# Recover the label for every class index.
class_probabilities = probabilities.squeeze().tolist()
class_labels = label_encoder.inverse_transform(list(range(len(class_probabilities))))

# Rank all classes by probability, highest first.
sorted_indices = torch.argsort(probabilities, descending=True).squeeze()
sorted_class_labels = [class_labels[i] for i in sorted_indices.tolist()]
sorted_class_probabilities = probabilities.squeeze()[sorted_indices].tolist()

print(f"Predicted Receptor Class: {predicted_class}")
print("Top 10 Class Probabilities:")
for label, prob in zip(sorted_class_labels[:10], sorted_class_probabilities[:10]):
    print(f"{label}: {prob:.4f}")

This produces the following output:

Predicted Receptor Class: 1JXP
Top 10 Class Probabilities:
1JXP: 0.9839
3KEE: 0.0001
5EAY: 0.0001
1Z9O: 0.0001
2KBM: 0.0001
2FES: 0.0001
1MWN: 0.0001
5CFC: 0.0001
6O09: 0.0001
1DKD: 0.0001
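
The post-processing in the example above (softmax over the logits, then sorting classes by probability) can also be written without PyTorch, which may help when working with saved logits. The following is a minimal sketch: the function name rank_classes and the example logits are hypothetical and are not part of the model or its repository.

```python
import math


def rank_classes(logits, labels, k=10):
    """Return the k most probable (label, probability) pairs for one sequence."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort (label, probability) pairs by probability, highest first.
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for three of the receptor classes listed above.
example = rank_classes([9.2, 0.1, -0.3], ["1JXP", "3KEE", "5EAY"], k=2)
for label, prob in example:
    print(f"{label}: {prob:.4f}")
```

With real model output, logits would be outputs.logits.squeeze().tolist() and labels would be the list recovered from the label encoder.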

Evaluation Results

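The model was evaluated on a held-out test set after 10 training epochs, reaching an accuracy of about 0.779 with an evaluation loss of about 0.714. The full run summary logged to Weights & Biases, which also includes the final training metrics, is: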

{
  "train/loss": 0.727,
  "train/grad_norm": 4.4672017097473145,
  "train/learning_rate": 2.3235385792411667e-8,
  "train/epoch": 10,
  "train/global_step": 352910,
  "_timestamp": 1712189024.5060718,
  "_runtime": 503183.0418128967,
  "_step": 716,
  "eval/loss": 0.7138708829879761,
  "eval/accuracy": 0.7794731752930051,
  "eval/runtime": 5914.5446,
  "eval/samples_per_second": 15.912,
  "eval/steps_per_second": 15.912,
  "train/train_runtime": 497231.6027,
  "train/train_samples_per_second": 5.678,
  "train/train_steps_per_second": 0.71,
  "train/total_flos": 600463318555361300,
  "train/train_loss": 0.9245198557043193,
  "_wandb": {
    "runtime": 503182
  }
}
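
A few quantities that the summary does not state directly can be estimated from it. The sketch below assumes the throughput fields follow the usual Hugging Face Trainer conventions (samples per second computed over the whole run); the results are rough estimates derived from rounded figures, not reported values.

```python
# Values copied from the run summary above.
summary = {
    "train/epoch": 10,
    "train/global_step": 352910,
    "train/train_runtime": 497231.6027,
    "train/train_samples_per_second": 5.678,
    "train/train_steps_per_second": 0.71,
    "eval/runtime": 5914.5446,
    "eval/samples_per_second": 15.912,
}

# Optimizer steps taken in each epoch.
steps_per_epoch = summary["train/global_step"] / summary["train/epoch"]

# Estimated samples per optimizer step (effective batch size).
samples_per_step = (
    summary["train/train_samples_per_second"]
    / summary["train/train_steps_per_second"]
)

# Estimated number of training samples seen per epoch.
train_samples_per_epoch = (
    summary["train/train_runtime"]
    * summary["train/train_samples_per_second"]
    / summary["train/epoch"]
)

# Estimated size of the evaluation set.
eval_samples = summary["eval/runtime"] * summary["eval/samples_per_second"]

# Training wall-clock time in days.
train_days = summary["train/train_runtime"] / 86400

print(f"steps per epoch:         {steps_per_epoch:.0f}")
print(f"samples per step:        {samples_per_step:.2f}")
print(f"train samples per epoch: {train_samples_per_epoch:.0f}")
print(f"eval samples:            {eval_samples:.0f}")
print(f"training time (days):    {train_days:.2f}")
```

Under these assumptions the run works out to roughly 8 samples per optimizer step and a little under 6 days of training.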