protectai/deberta-v3-base-prompt-injection

@@ -1,24 +1,59 @@
 ---
-license: mit
 base_model: microsoft/deberta-v3-base
 tags:
 - generated_from_trainer
 metrics:
 - accuracy
 - recall
 - precision
 - f1
 model-index:
-- name: deberta-v3-base-prompt-injection-v1
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# deberta-v3-base-prompt-injection-v1
-This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0010
 - Accuracy: 0.9999
@@ -26,17 +61,46 @@ It achieves the following results on the evaluation set:
 - Precision: 0.9998
 - F1: 0.9998
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
@@ -67,3 +131,15 @@ The following hyperparameters were used during training:
 - Pytorch 2.1.1+cu121
 - Datasets 2.15.0
 - Tokenizers 0.15.0

 ---
+license: apache-2.0
 base_model: microsoft/deberta-v3-base
+datasets:
+- Lakera/gandalf_ignore_instructions
+- rubend18/ChatGPT-Jailbreak-Prompts
+- imoxto/prompt_injection_cleaned_dataset-v2
+- hackaprompt/hackaprompt-dataset
+- fka/awesome-chatgpt-prompts
+- teven/prompted_examples
+- Dahoas/synthetic-hh-rlhf-prompts
+- Dahoas/hh_prompt_format
+- MohamedRashad/ChatGPT-prompts
+- HuggingFaceH4/instruction-dataset
+- HuggingFaceH4/no_robots
+- HuggingFaceH4/ultrachat_200k
+language:
+- en
 tags:
+- prompt-injection
+- injection
+- security
 - generated_from_trainer
 metrics:
 - accuracy
 - recall
 - precision
 - f1
+pipeline_tag: text-classification
 model-index:
+- name: deberta-v3-base-prompt-injection
+  results:
+  - task:
+      type: text-classification
+      name: Prompt Injection Detection
+    metrics:
+      - type: precision
+        value: 0.9998
+      - type: f1
+        value: 0.9998
+      - type: accuracy
+        value: 0.9999
+      - type: recall
+        value: 0.9997
+co2_eq_emissions:
+  emissions: 0.9990662916168788
+  source: "code carbon"
+  training_type: "fine-tuning"
 ---
+# Model Card for deberta-v3-base-prompt-injection
+This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on multiple combined datasets of prompt injections and normal prompts.
+It aims to identify prompt injections, classifying inputs into two categories: `0` for no injection and `1` for injection detected.
 It achieves the following results on the evaluation set:
 - Loss: 0.0010
 - Accuracy: 0.9999
 - Recall: 0.9997
 - Precision: 0.9998
 - F1: 0.9998
+## Model details
+- **Fine-tuned by:** Laiyer.ai
+- **Model type:** deberta-v3
+- **Language(s) (NLP):** English
+- **License:** Apache License 2.0
+- **Finetuned from model:** [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base)
+## Intended Uses & Limitations
+It aims to identify prompt injections, classifying inputs into two categories: `0` for no injection and `1` for injection detected.
+The model's performance depends on the nature and quality of its training data, so it may perform poorly on text styles or topics that are not represented in the training set.
+## How to Get Started with the Model
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+import torch
+tokenizer = AutoTokenizer.from_pretrained("laiyer/deberta-v3-base-prompt-injection")
+model = AutoModelForSequenceClassification.from_pretrained("laiyer/deberta-v3-base-prompt-injection")
+text = "Your prompt injection is here"
+classifier = pipeline(
+  "text-classification",
+  model=model,
+  tokenizer=tokenizer,
+  truncation=True,
+  max_length=512,
+  device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
+)
+print(classifier(text))
+```
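The classifier call above returns a list of dictionaries, each holding a `label` and a `score`. The following is a minimal sketch of turning that output into a yes/no decision. The helper name `flag_injection`, the label string `"INJECTION"`, and the `0.5` threshold are assumptions made for illustration; the card only documents the classes as `0` and `1`, so check `model.config.id2label` for the actual label names before relying on this.

```python
def flag_injection(results, injection_label="INJECTION", threshold=0.5):
    """Map text-classification pipeline output to booleans.

    `results` is a list of {"label": str, "score": float} dicts.
    `injection_label` and `threshold` are hypothetical defaults, not values
    taken from the model card.
    """
    flags = []
    for result in results:
        # Flag only when the predicted class is the injection class
        # and its score clears the chosen threshold.
        is_injection = (
            result["label"] == injection_label and result["score"] >= threshold
        )
        flags.append(is_injection)
    return flags


# Hand-written stand-in for pipeline output, so the sketch runs without
# downloading the model.
sample_output = [
    {"label": "INJECTION", "score": 0.98},
    {"label": "SAFE", "score": 0.99},
]
print(flag_injection(sample_output))  # [True, False]
```

Raising the threshold trades recall for precision, which may suit applications where false alarms are costly.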
 ## Training and evaluation data
+The model was trained on a custom dataset assembled from multiple open-source datasets, with roughly 30% prompt injections and 70% benign prompts.
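The card does not describe how the class mix was produced. As a rough sketch of one way to reach a similar ratio, the hypothetical helper below subsamples two pools of prompts to a target injection share and attaches the `1`/`0` labels used by the model. The function name, the seed, and the sampling strategy are all assumptions, not the authors' procedure.

```python
import random


def mix_prompts(injections, benign, injection_share=0.3, seed=0):
    """Return shuffled (text, label) pairs with about `injection_share` injections.

    Label 1 marks an injection and label 0 a benign prompt.
    This is an illustrative sampling scheme, not the one used for the model.
    """
    if not 0 < injection_share < 1:
        raise ValueError("injection_share must be strictly between 0 and 1")
    rng = random.Random(seed)
    # Largest total size that both pools can supply at the requested ratio.
    total = int(
        min(len(injections) / injection_share, len(benign) / (1 - injection_share))
    )
    n_injection = min(len(injections), round(total * injection_share))
    n_benign = min(len(benign), total - n_injection)
    rows = [(text, 1) for text in rng.sample(injections, n_injection)]
    rows += [(text, 0) for text in rng.sample(benign, n_benign)]
    rng.shuffle(rows)
    return rows


# Placeholder strings standing in for real prompts.
rows = mix_prompts(
    [f"injection {i}" for i in range(30)],
    [f"benign {i}" for i in range(100)],
)
share = sum(label for _, label in rows) / len(rows)
print(len(rows), round(share, 2))
```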
 ## Training procedure
 - Pytorch 2.1.1+cu121
 - Datasets 2.15.0
 - Tokenizers 0.15.0
+## Citation
+```
+@misc{deberta-v3-base-prompt-injection,
+  author = {Laiyer.ai},
+  title = {Fine-Tuned DeBERTa-v3 for Prompt Injection Detection},
+  year = {2023},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/laiyer/deberta-v3-base-prompt-injection},
+}
+```