PyTorch NN Classifier fMRI

This repository contains a PyTorch binary classification model for fMRI data. The architecture has configurable hidden layers, dropout regularization, and a choice of activation functions, with weights initialized from a Xavier uniform distribution. Hyperparameters were tuned with Optuna.

Model Card

Model Name: Pytorch_Classifier_fMRI
Framework: PyTorch
Task: Binary Classification (e.g., prediction based on fMRI data)
Hyperparameter Optimization: Optuna
Evaluation Metrics: Accuracy, ROC-AUC, Precision, Recall, F1-score
License: j.lacoma

Model Architecture

The classification model is composed of several hidden layers, each followed by batch normalization and an activation layer (ReLU, LeakyReLU, or Tanh). Dropout is applied for regularization. The final layer produces a single output that is passed through a sigmoid to obtain a probability, which suits binary classification; during training the sigmoid is folded into BCEWithLogitsLoss.

The architecture supports multiple weight initialization strategies, with the default being Xavier uniform initialization. The output layer is designed for binary classification tasks, predicting probabilities for each class (0 or 1).
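The description above can be sketched as a small PyTorch module. This is a minimal illustration, not the repository's actual class: the name FMRIClassifier, the constructor arguments, and the input size of 100 features are all assumptions. The sketch returns raw logits and applies the sigmoid outside the model, which is the form BCEWithLogitsLoss expects.

```python
import torch
from torch import nn

# Hypothetical names throughout: FMRIClassifier, input_dim, and the
# constructor arguments are illustrative, not the repository's actual API.
ACTIVATIONS = {"relu": nn.ReLU, "leaky_relu": nn.LeakyReLU, "tanh": nn.Tanh}


class FMRIClassifier(nn.Module):
    def __init__(self, input_dim, hidden_layers=(512, 256),
                 dropout_rate=0.37, activation="tanh"):
        super().__init__()
        layers = []
        prev_dim = input_dim
        for width in hidden_layers:
            # Linear -> BatchNorm -> activation -> Dropout for each hidden layer
            layers += [
                nn.Linear(prev_dim, width),
                nn.BatchNorm1d(width),
                ACTIVATIONS[activation](),
                nn.Dropout(dropout_rate),
            ]
            prev_dim = width
        # Single output unit that returns a raw logit (no sigmoid here)
        layers.append(nn.Linear(prev_dim, 1))
        self.net = nn.Sequential(*layers)
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        # Xavier uniform initialization for every linear layer
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x):
        return self.net(x)


model = FMRIClassifier(input_dim=100)  # 100 features is an arbitrary example
model.eval()  # disables dropout and uses BatchNorm running statistics
with torch.no_grad():
    logits = model(torch.randn(4, 100))
    probabilities = torch.sigmoid(logits)
print(probabilities.shape)
```

Keeping the sigmoid outside the module means the same forward pass serves both training with BCEWithLogitsLoss and inference, where the sigmoid is applied explicitly.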

Example Usage

You can load the model directly from Hugging Face and use it for inference. Below is an example that demonstrates how to load the model, define custom weight initialization, and perform inference.

import torch
from transformers import AutoModelForSequenceClassification

# Load model from Hugging Face
model = AutoModelForSequenceClassification.from_pretrained("JayLacoma/Pytorch_Classifier_fMRI")

# Define weight initialization function.
# Apply it with model.apply(weight_init) only when training from scratch;
# calling it after loading weights would overwrite them.
def weight_init(m):
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)

# (Optional) Load saved model weights; map_location lets this run without a GPU
model.load_state_dict(torch.load('Pytorch_Classifier_fMRI.pth', map_location='cpu'))

# Set model to evaluation mode
model.eval()

# Inference example. n_features must match the model's input dimension;
# 100 is a placeholder here, not the actual value.
n_features = 100
input_data = torch.rand(1, n_features)  # Example input data
with torch.no_grad():
    logits = model(input_data).logits
    prediction = torch.sigmoid(logits).round()  # Binary classification output: 0 or 1
    print(f"Predicted label: {prediction.item()}")

Hyperparameter Optimization with Optuna

The model has been optimized using Optuna, a hyperparameter tuning framework. The following hyperparameters were fine-tuned through trials to achieve the best results:

{
  "hidden_layers": [512, 256],
  "dropout_rate": 0.37,
  "learning_rate": 0.00667,
  "weight_decay": 1.33e-06,
  "batch_size": 32,
  "patience": 10,
  "activation_function": "tanh",
  "optimizer": "AdamW",
  "scheduler": "StepLR",
  "clip_grad": 3.67
}

Training Workflow

The training loop includes gradient clipping, learning rate scheduling, and early stopping based on the validation loss. The optimizer is AdamW and the loss function is BCEWithLogitsLoss, which combines the sigmoid and binary cross-entropy in a single, numerically stable operation. The StepLR scheduler multiplies the learning rate by a fixed factor at regular epoch intervals to help convergence.
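To make the StepLR behavior concrete, the short sketch below computes the learning rate a step schedule would produce at a few epochs. The helper name step_lr is hypothetical, and the step_size of 5 and gamma of 0.1 are borrowed from the tuning example later in this document; the values used for the final model are not stated.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` completed epochs under a StepLR-style schedule."""
    # The rate is multiplied by gamma once every step_size epochs
    return base_lr * gamma ** (epoch // step_size)


base_lr = 0.00667  # learning rate reported in the hyperparameter list
for epoch in (0, 4, 5, 10):
    # step_size=5 and gamma=0.1 are assumed values, see the note above
    print(epoch, step_lr(base_lr, epoch, step_size=5, gamma=0.1))
```

With these assumed settings the rate stays at its initial value for epochs 0 through 4, then drops by a factor of ten at epoch 5 and again at epoch 10.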

Here’s a snippet that shows how the training is managed:

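import torch

# Model Training Function
def train_model(model, train_loader, test_loader, criterion, optimizer, scheduler, max_epochs, patience, clip_grad=None, scheduler_name=None):
    # Run on whichever device already holds the model
    device = next(model.parameters()).device
    best_loss = float('inf')
    no_improvement = 0

    for epoch in range(max_epochs):
        model.train()
        running_loss = 0.0

        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()

            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()

            # Clip the gradient norm before the optimizer step
            if clip_grad is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad)

            optimizer.step()
            running_loss += loss.item()

        # Validation pass
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in test_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                val_loss += criterion(outputs, labels).item()
        val_loss /= len(test_loader)

        # Step the scheduler once per epoch; ReduceLROnPlateau needs the monitored loss
        if scheduler is not None:
            if scheduler_name == 'ReduceLROnPlateau':
                scheduler.step(val_loss)
            else:
                scheduler.step()

        # Early stopping: stop once validation loss has not improved for `patience` epochs
        if val_loss < best_loss:
            best_loss = val_loss
            no_improvement = 0
        else:
            no_improvement += 1
            if no_improvement >= patience:
                break

    return model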
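The gradient clipping step in the loop above rescales gradients when their combined L2 norm exceeds the threshold. The plain-Python sketch below illustrates that rule on a flat list of numbers; the function name clip_by_global_norm is hypothetical, and the small eps term follows the convention torch.nn.utils.clip_grad_norm_ uses to avoid division by zero.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # The scale factor is capped at 1.0, so small gradients pass through unchanged
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# Gradient [3, 4] has norm 5, which exceeds the clip value of 3.67
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=3.67)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

The direction of the gradient is preserved; only its length is shortened to roughly the clip value.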

Evaluation Metrics

After training, the model can be evaluated using various metrics:

Accuracy: Measures how often the model predicts the correct class.
AUC-ROC: Area Under the Curve of the Receiver Operating Characteristic curve. A higher AUC indicates better distinction between classes.
Precision: Proportion of true positives among all positive predictions.
Recall: Proportion of actual positives that the model correctly identifies.
F1-Score: Harmonic mean of precision and recall, balancing both metrics.

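import numpy as np
import torch
from sklearn.metrics import roc_auc_score, accuracy_score, precision_score, recall_score, f1_score

# Model Evaluation Function
def evaluate_model(model, test_loader):
    model.eval()
    device = next(model.parameters()).device  # run on whichever device holds the model
    all_probs, all_labels = [], []

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            outputs = model(inputs)
            # Assumes the model returns logits (as BCEWithLogitsLoss expects);
            # convert them to probabilities before scoring
            all_probs.append(torch.sigmoid(outputs).cpu().numpy().ravel())
            all_labels.append(labels.cpu().numpy().ravel())

    # Join the per-batch arrays into one vector each
    probs = np.concatenate(all_probs)
    labels = np.concatenate(all_labels).astype(int)
    preds = (probs >= 0.5).astype(int)  # threshold probabilities at 0.5

    # Metrics calculation: ROC-AUC uses probabilities, the rest use hard predictions
    auc_roc = roc_auc_score(labels, probs)
    accuracy = accuracy_score(labels, preds)
    precision = precision_score(labels, preds)
    recall = recall_score(labels, preds)
    f1 = f1_score(labels, preds)

    return auc_roc, accuracy, precision, recall, f1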
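To make the metric definitions concrete, the following sketch computes accuracy, precision, recall, and F1 by hand from a small set of made-up labels and predictions. The function name binary_metrics and the example data are illustrative only and are not results from this model.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this balanced example all four metrics come out equal; on imbalanced data precision and recall typically diverge, which is why F1 and ROC-AUC are reported alongside accuracy.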

Parameters

The following hyperparameters yielded the best results during Optuna trials:

Hidden Layers: [512, 256]
Dropout Rate: 0.37
Learning Rate: 0.00667
Weight Decay: 1.33e-06
Batch Size: 32
Activation Function: Tanh
Optimizer: AdamW
Scheduler: StepLR
Clip Gradients: 3.67

Dataset

The model expects fMRI data as input. The input dimension should correspond to the number of features in the dataset, typically pre-processed fMRI signals. Data is loaded using PyTorch's DataLoader, ensuring efficient mini-batch processing during training.
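As a rough sketch of the loading step, the example below wraps synthetic tensors in a TensorDataset and builds training and validation loaders. The sample count, feature count, and split sizes are arbitrary placeholders; real pre-processed fMRI features would replace the random tensors.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for pre-processed fMRI features: 200 samples x 100 features.
# Both numbers are arbitrary; replace them with your own data.
features = torch.randn(200, 100)
# BCEWithLogitsLoss expects float targets with the same shape as the model output
labels = torch.randint(0, 2, (200, 1)).float()

dataset = TensorDataset(features, labels)
# Fixed seed so the 160/40 split is reproducible
train_dataset, val_dataset = random_split(
    dataset, [160, 40], generator=torch.Generator().manual_seed(0)
)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

inputs, targets = next(iter(train_loader))
print(inputs.shape, targets.shape)
```

Shuffling is enabled only for the training loader so that validation loss is computed over the same batches each epoch.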

Hyperparameter Tuning

To tune the hyperparameters after obtaining a model from Hugging Face, you can integrate Optuna or other hyperparameter optimization frameworks into your workflow. Here’s how you can systematically tune hyperparameters for a pre-trained Hugging Face model using Optuna:

1. Install Necessary Libraries

Make sure you have installed all required dependencies:

pip install optuna torch transformers

2. Load the Pre-trained Model

First, load the pre-trained model from Hugging Face using the AutoModelForSequenceClassification class:

from transformers import AutoModelForSequenceClassification

# Load model from Hugging Face
model = AutoModelForSequenceClassification.from_pretrained("JayLacoma/Pytorch_Classifier_fMRI")

3. Define the Objective Function for Optuna

The objective function for Optuna includes loading the model, setting up the optimizer, loss function, and scheduler, and evaluating the performance on validation data.

Here’s an example objective function:

import optuna
import torch
import torch.optim as optim
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification

# Run on the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def objective(trial):
    # Define hyperparameters to tune
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    # suggest_float(..., log=True) replaces the deprecated suggest_loguniform
    lr = trial.suggest_float('lr', 1e-5, 1e-2, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-6, 1e-3, log=True)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
    optimizer_name = trial.suggest_categorical('optimizer', ['AdamW', 'SGD', 'Adam'])
    scheduler_name = trial.suggest_categorical('scheduler', ['CosineAnnealingLR', 'StepLR', 'ReduceLROnPlateau'])

    # Reload the model so every trial starts from the same weights
    model = AutoModelForSequenceClassification.from_pretrained("JayLacoma/Pytorch_Classifier_fMRI").to(device)

    # Apply dropout (if your model supports custom dropout; otherwise, it’s predefined in the architecture)
    model.dropout = torch.nn.Dropout(dropout_rate)

    # Optimizer setup
    if optimizer_name == 'AdamW':
        optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    elif optimizer_name == 'SGD':
        optimizer = optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay, momentum=0.9)
    elif optimizer_name == 'Adam':
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    # Loss function
    criterion = torch.nn.BCEWithLogitsLoss()

    # Scheduler setup
    if scheduler_name == 'CosineAnnealingLR':
        scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
    elif scheduler_name == 'StepLR':
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
    elif scheduler_name == 'ReduceLROnPlateau':
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=3)

    # Create data loaders for training and validation (replace with your dataset)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    # Training loop with a fixed epoch budget
    for epoch in range(10):  # For brevity, limiting to 10 epochs
        model.train()  # switch back to training mode after each validation pass
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()

            outputs = model(inputs).logits
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

        # Validation loop
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs).logits
                val_loss += criterion(outputs, labels).item()

        # Step the scheduler
        if scheduler_name == 'ReduceLROnPlateau':
            scheduler.step(val_loss)
        else:
            scheduler.step()

    return val_loss / len(val_loader)  # Return the average validation loss of the last epoch

4. Run the Hyperparameter Search with Optuna

You can now create an Optuna study and run the optimization process across multiple trials. Each trial will search for better hyperparameters by minimizing the validation loss.

study = optuna.create_study(direction='minimize')  # Optimize to minimize the validation loss
study.optimize(objective, n_trials=100)  # Run for 100 trials

# Retrieve the best hyperparameters found by Optuna
best_params = study.best_params
print(f"Best hyperparameters: {best_params}")

5. Use the Best Hyperparameters for Model Training

Once Optuna finishes running, you can train the model using the best hyperparameters found:

# Use the best hyperparameters to train your final model
model = AutoModelForSequenceClassification.from_pretrained("JayLacoma/Pytorch_Classifier_fMRI")

# Apply best parameters
dropout_rate = best_params['dropout_rate']
lr = best_params['lr']
weight_decay = best_params['weight_decay']
batch_size = best_params['batch_size']
optimizer_name = best_params['optimizer']
scheduler_name = best_params['scheduler']

# Continue to train using these hyperparameters as demonstrated earlier
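One way to turn the retrieved parameters back into training objects is a pair of small factory helpers, sketched below. The function names build_optimizer and build_scheduler are hypothetical, the example dictionary reuses the values reported in this document under the key names from the objective function, and a tiny linear layer stands in for the real model.

```python
import torch
from torch import nn, optim


def build_optimizer(model, params):
    """Create the optimizer named in params with its learning rate and weight decay."""
    kwargs = {"lr": params["lr"], "weight_decay": params["weight_decay"]}
    name = params["optimizer"]
    if name == "AdamW":
        return optim.AdamW(model.parameters(), **kwargs)
    if name == "Adam":
        return optim.Adam(model.parameters(), **kwargs)
    if name == "SGD":
        return optim.SGD(model.parameters(), momentum=0.9, **kwargs)
    raise ValueError(f"Unknown optimizer: {name}")


def build_scheduler(optimizer, name):
    """Create the scheduler by name, using the same settings as the objective function."""
    if name == "CosineAnnealingLR":
        return optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
    if name == "StepLR":
        return optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
    if name == "ReduceLROnPlateau":
        return optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", patience=3)
    raise ValueError(f"Unknown scheduler: {name}")


# Example values taken from the hyperparameters reported in this document
example_params = {"lr": 0.00667, "weight_decay": 1.33e-06,
                  "optimizer": "AdamW", "scheduler": "StepLR"}
stand_in_model = nn.Linear(10, 1)  # placeholder for the real classifier
optimizer = build_optimizer(stand_in_model, example_params)
scheduler = build_scheduler(optimizer, example_params["scheduler"])
print(type(optimizer).__name__, type(scheduler).__name__)
```

Raising on an unknown name keeps a typo in the parameter dictionary from silently falling through to a default.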

Summary of Steps

Load the Hugging Face model.
Define an Optuna objective function that incorporates the model, optimizer, loss, and scheduler.
Use Optuna to perform hyperparameter tuning by running multiple trials.
Retrieve the best hyperparameters and re-train the model for final use.

This process automates the tuning of hyperparameters like learning rate, batch size, dropout rate, and optimizer type, helping to achieve better model performance.