
Model Description

This model predicts receptor classes, identified by their PDB IDs, from peptide sequences. It is built on the ESM2 (Evolutionary Scale Modeling) protein language model, starting from the esm2_t6_8M_UR50D pre-trained weights. It is fine-tuned for receptor prediction on datasets from PROPEDIA and PepNN, together with novel peptides that were experimentally validated to bind their target proteins and whose binding conformations were determined with ClusPro, a protein-protein docking tool. The name pep2rec_cppp reflects the peptide-to-receptor prediction task and the sources of the training data: ClusPro, PROPEDIA, and PepNN. The model is intended for researchers and practitioners in bioinformatics, drug discovery, and related fields who want to understand or predict peptide-receptor interactions.

How to Use

Here is how to predict the receptor class for a peptide sequence using this model:

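import torch
from huggingface_hub import hf_hub_download
from joblib import load
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
MODEL_PATH = "littleworth/esm2_t6_8M_UR50D_pep2rec_cppp"
model = AutoModelForSequenceClassification.from_pretrained(MODEL_PATH)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# joblib cannot read from a Hub repo ID directly, so download the label encoder
# first (assumes label_encoder.joblib sits at the root of the model repo).
LABEL_ENCODER_PATH = hf_hub_download(repo_id=MODEL_PATH, filename="label_encoder.joblib")
label_encoder = load(LABEL_ENCODER_PATH)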


# Peptide sequence to classify, in one-letter amino acid codes.
input_sequence = "GNLIVVGRVIMS"

inputs = tokenizer(input_sequence, return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients and turn the logits into probabilities.
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=1)
    predicted_class_idx = probabilities.argmax(dim=1).item()

# Map the predicted class index back to its receptor label (a PDB ID).
predicted_class = label_encoder.inverse_transform([predicted_class_idx])[0]

# Recover the label of every class index so probabilities can be reported by name.
class_probabilities = probabilities.squeeze().tolist()
class_labels = label_encoder.inverse_transform(range(len(class_probabilities)))

# Rank all classes from most to least probable.
sorted_indices = torch.argsort(probabilities, descending=True).squeeze()
sorted_class_labels = [class_labels[i] for i in sorted_indices.tolist()]
sorted_class_probabilities = probabilities.squeeze()[sorted_indices].tolist()

print(f"Predicted Receptor Class: {predicted_class}")
print("Top 10 Class Probabilities:")
for label, prob in zip(sorted_class_labels[:10], sorted_class_probabilities[:10]):
    print(f"{label}: {prob:.4f}")

This gives the following output:

Predicted Receptor Class: 1JXP
Top 10 Class Probabilities:
1JXP: 0.7793
2OIN: 0.0058
1A1R: 0.0026
2QV1: 0.0025
3KEE: 0.0022
3KF2: 0.0016
5LAS: 0.0016
1QD6: 0.0014
6ME1: 0.0013
2XCF: 0.0013
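
The ranking step can also be written without torch or scikit-learn, which is handy when post-processing logits that were saved earlier. The following is a minimal sketch, not part of the released model code: the helper names softmax and top_k_predictions, the min_prob cutoff, and the toy logits are all hypothetical and used only for illustration. Real logits would come from outputs.logits[0].tolist() and real labels from the label encoder.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(
    logits: Sequence[float],
    labels: Sequence[str],
    k: int = 10,
    min_prob: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (label, probability) pairs, most probable first.

    Pairs whose probability falls below min_prob are dropped.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in ranked[:k] if p >= min_prob]


# Toy values for illustration only: three labels borrowed from the sample
# output above, paired with made-up logits.
toy_labels = ["1JXP", "2OIN", "1A1R"]
toy_logits = [4.0, 1.0, 0.5]

for label, prob in top_k_predictions(toy_logits, toy_labels, k=2):
    print(f"{label}: {prob:.4f}")
```

A non-zero min_prob is one way to hide the long tail of low-probability classes. In the sample output above, every class after the first scores below 0.01.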

Evaluation Results

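The model was evaluated on a held-out test set after 10 epochs of training, reaching an accuracy of 0.776 with an evaluation loss of 0.772. The full run summary, which also includes the training metrics, is: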

{
  "train/loss": 0.7338,
  "train/grad_norm": 4.333151340484619,
  "train/learning_rate": 2.3235385792411667e-8,
  "train/epoch": 10,
  "train/global_step": 352910,
  "_timestamp": 1711654529.5562913,
  "_runtime": 204515.04906344414,
  "_step": 715,
  "eval/loss": 0.7718502879142761,
  "eval/accuracy": 0.7761048124023759,
  "eval/runtime": 2734.4878,
  "eval/samples_per_second": 34.416,
  "eval/steps_per_second": 34.416,
  "train/train_runtime": 204505.5285,
  "train/train_samples_per_second": 13.806,
  "train/train_steps_per_second": 1.726,
  "train/total_flos": 143220103846625280,
  "train/train_loss": 1.0842229404661865,
  "_wandb": {
    "runtime": 204514
  }
}
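
The throughput figures in this summary allow a rough estimate of quantities the card does not state directly, such as the size of the evaluation set and the effective training batch size. The sketch below is back-of-the-envelope arithmetic on the logged values, not reported results. It assumes the throughput fields follow the usual convention of samples (or steps) divided by runtime, that the logged values are rounded, and that the last batch of each epoch may be partial, so the training set size is only approximate.

```python
# Values copied from the run summary above.
eval_runtime_s = 2734.4878
eval_samples_per_s = 34.416
train_samples_per_s = 13.806
train_steps_per_s = 1.726
global_step = 352910
epochs = 10

# Evaluation set size: runtime multiplied by throughput.
eval_set_size = eval_runtime_s * eval_samples_per_s

# Samples consumed per optimizer step (the effective training batch size).
effective_batch = train_samples_per_s / train_steps_per_s

# Optimizer steps per epoch, then an approximate training set size.
steps_per_epoch = global_step / epochs
train_set_size = steps_per_epoch * round(effective_batch)

print(f"Estimated evaluation samples: {eval_set_size:.0f}")
print(f"Estimated effective batch size: {effective_batch:.2f}")
print(f"Optimizer steps per epoch: {steps_per_epoch:.0f}")
print(f"Estimated training samples: {train_set_size:.0f}")
```

Under these assumptions the evaluation set holds roughly 94,000 sequences and training used an effective batch size of about 8. The identical eval/samples_per_second and eval/steps_per_second values suggest that evaluation ran with a batch size of 1.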

Model size: 8.6M params (Safetensors, tensor type F32)