Beginner Question: usage with AutoTokenizer and AutoModel

#2
by antoninoLorenzo - opened

I'm a beginner and may not have a great understanding of the tensor architecture, but I tried to use the model and I'm unable to convert the output into a label.
I wrote the following class:

import torch
from transformers import AutoTokenizer, AutoModel

class JailbreakClassifier:
    _model_name = "jackhhao/jailbreak-classifier"
    
    def __init__(self):
        self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)
        self._model = AutoModel.from_pretrained(self._model_name)
        
    def predict(self, text: str):
        """Returns a label 'jailbreak' or 'benign'"""
        # Convert input text into tensors
        inputs = self._tokenizer(
            text,
            padding=True,
            truncation=True,
            return_tensors="pt"
        )
        
        # compute raw predictions
        with torch.no_grad():
            outputs = self._model(**inputs)
        
        # post-processing ?

outputs doesn't have the usual "logits" attribute; instead I get a BaseModelOutputWithPoolingAndCrossAttentions.

Hello!

The problem here is that you're loading the pretrained model with the base AutoModel class instead of AutoModelForSequenceClassification. The base class returns a BaseModelOutputWithPoolingAndCrossAttentions, like you mentioned, rather than a SequenceClassifierOutput with the loss and logits attributes.

You can just change it to this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
self._tokenizer = AutoTokenizer.from_pretrained(self._model_name)
self._model = AutoModelForSequenceClassification.from_pretrained(self._model_name) # use the specific downstream classification model

And that should do it.
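Once the classification head is loaded, `outputs.logits` exists and the label is just the index of the highest logit, mapped through the model's label names. Here's a minimal sketch of that post-processing step; the logits tensor below is made up to stand in for a real forward pass, and the `id2label` mapping is an assumption for illustration (in practice, read it from `self._model.config.id2label`):

```python
import torch

# Hypothetical logits standing in for `outputs.logits` from a real
# forward pass; shape is (batch_size, num_labels).
logits = torch.tensor([[-1.2, 2.3]])

# Assumed label mapping for illustration; in real code use
# self._model.config.id2label instead of hard-coding it.
id2label = {0: "benign", 1: "jailbreak"}

pred_id = logits.argmax(dim=-1).item()  # index of the highest logit
label = id2label[pred_id]
print(label)  # -> jailbreak
```

So `predict` would end with `return id2label[outputs.logits.argmax(dim=-1).item()]`.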
