
Is there any example code for running inference on the the-stack dataset with starpii, like the web API?

#4 opened by Chinglin
This comment has been hidden

Hello, you can use the token-classification pipeline directly:

```python
from transformers import pipeline

classifier = pipeline("token-classification", model="bigcode/starpii", aggregation_strategy="simple")
classifier("Hello I'm John and my IP address is 196.780.89.78")
```

```
[{'entity_group': 'NAME', 'score': 0.9997844, 'word': ' John', 'start': 9, 'end': 14},
 {'entity_group': 'IP_ADDRESS', 'score': 0.99203795, 'word': '196.780.89.', 'start': 52, 'end': 63}]
```

Check the token-classification documentation and the TokenClassificationPipeline docs for more details.
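Since the pipeline returns character offsets for each detected entity, downstream processing (e.g. masking PII before publishing data) can be done with plain string slicing. Here is a minimal sketch; the `redact_pii` helper and its `[ENTITY_GROUP]` placeholder format are my own illustration, not part of the starpii release:

```python
def redact_pii(text, entities):
    """Replace each detected span with a [ENTITY_GROUP] placeholder.

    `entities` is the list of dicts returned by the token-classification
    pipeline with aggregation_strategy="simple" (keys include
    entity_group, start, end). Spans are replaced right-to-left so
    earlier offsets stay valid while editing.
    """
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hypothetical entities in the pipeline's output shape:
example = redact_pii(
    "Hello I'm John",
    [{"entity_group": "NAME", "start": 10, "end": 14}],
)
print(example)  # Hello I'm [NAME]
```

The right-to-left replacement order matters: substituting a span changes the string length, so editing from the end keeps the remaining `start`/`end` offsets valid.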

We have also released the inference code we used to run PII detection at large scale: https://github.com/bigcode-project/bigcode-dataset/tree/pii-ner/pii/ner

Note: I suggest you delete the HF bearer token you included in your message and create a new one, since it's supposed to be a secret. (I took the liberty of hiding your post.)

christopher changed discussion status to closed

Thanks a lot

Hey, how can I get an auth token to use your model? I'm getting the following error:

```
OSError: bigcode/starpii is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
```
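As the error message suggests, access to a gated or private repo needs a Hugging Face access token. A minimal sketch of passing one to the pipeline; reading it from the `HF_TOKEN` environment variable is my own convention here, and the model-loading call at the end is left commented out so nothing is downloaded when the snippet runs:

```python
import os

# Token from the environment (set e.g. after `huggingface-cli login`,
# or exported manually); never hard-code it in source or forum posts.
token = os.environ.get("HF_TOKEN")

pipeline_kwargs = {
    "model": "bigcode/starpii",
    "aggregation_strategy": "simple",
}
if token:
    # Forwarded by transformers to the hub download machinery.
    pipeline_kwargs["use_auth_token"] = token

# Uncomment to actually load the model (requires network access):
# from transformers import pipeline
# classifier = pipeline("token-classification", **pipeline_kwargs)
```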
