asofter committed on
Commit ceceaef
1 Parent(s): dbeed6b

Update README.md

Files changed (1):
  1. README.md +88 -12
README.md CHANGED
@@ -1,24 +1,59 @@
  ---
- license: mit
  base_model: microsoft/deberta-v3-base
  tags:
  - generated_from_trainer
  metrics:
  - accuracy
  - recall
  - precision
  - f1
  model-index:
- - name: deberta-v3-base-prompt-injection-v1
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # deberta-v3-base-prompt-injection-v1

- This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.0010
  - Accuracy: 0.9999
@@ -26,17 +61,46 @@ It achieves the following results on the evaluation set:
  - Precision: 0.9998
  - F1: 0.9998

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -67,3 +131,15 @@ The following hyperparameters were used during training:
  - Pytorch 2.1.1+cu121
  - Datasets 2.15.0
  - Tokenizers 0.15.0
  ---
+ license: apache-2.0
  base_model: microsoft/deberta-v3-base
+ datasets:
+ - Lakera/gandalf_ignore_instructions
+ - rubend18/ChatGPT-Jailbreak-Prompts
+ - imoxto/prompt_injection_cleaned_dataset-v2
+ - hackaprompt/hackaprompt-dataset
+ - fka/awesome-chatgpt-prompts
+ - teven/prompted_examples
+ - Dahoas/synthetic-hh-rlhf-prompts
+ - Dahoas/hh_prompt_format
+ - MohamedRashad/ChatGPT-prompts
+ - HuggingFaceH4/instruction-dataset
+ - HuggingFaceH4/no_robots
+ - HuggingFaceH4/ultrachat_200k
+ language:
+ - en
  tags:
+ - prompt-injection
+ - injection
+ - security
  - generated_from_trainer
  metrics:
  - accuracy
  - recall
  - precision
  - f1
+ pipeline_tag: text-classification
  model-index:
+ - name: deberta-v3-base-prompt-injection
+   results:
+   - task:
+       type: text-classification
+       name: Prompt Injection Detection
+     metrics:
+     - type: precision
+       value: 0.9998
+     - type: f1
+       value: 0.9998
+     - type: accuracy
+       value: 0.9999
+     - type: recall
+       value: 0.9997
+ co2_eq_emissions:
+   emissions: 0.9990662916168788
+   source: "code carbon"
+   training_type: "fine-tuning"
  ---

+ # Model Card for deberta-v3-base-prompt-injection

+ This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on a combination of multiple open-source datasets of prompt injections and normal prompts.
+
+ It aims to identify prompt injections, classifying each input into two categories: `0` for no injection and `1` for injection detected.

  It achieves the following results on the evaluation set:
  - Loss: 0.0010
  - Accuracy: 0.9999
  - Precision: 0.9998
  - F1: 0.9998

+ ## Model details
+
+ - **Fine-tuned by:** Laiyer.ai
+ - **Model type:** deberta-v3
+ - **Language(s) (NLP):** English
+ - **License:** Apache License 2.0
+ - **Finetuned from model:** [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base)
+
+ ## Intended Uses & Limitations
+
+ The intended use is prompt-injection detection: the model labels an input `0` (no injection) or `1` (injection detected).
+
+ The model's performance depends on the nature and quality of the training data; it might not perform well on text styles or topics not represented in the training set.

+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
+ import torch
+
+ # Load the tokenizer and the fine-tuned classifier from the Hub
+ tokenizer = AutoTokenizer.from_pretrained("laiyer/deberta-v3-base-prompt-injection")
+ model = AutoModelForSequenceClassification.from_pretrained("laiyer/deberta-v3-base-prompt-injection")
+
+ text = "Your prompt injection is here"
+
+ # Truncate long inputs to 512 tokens and run on GPU when available
+ classifier = pipeline(
+     "text-classification",
+     model=model,
+     tokenizer=tokenizer,
+     truncation=True,
+     max_length=512,
+     device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
+ )
+
+ print(classifier(text))
+ ```
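
In general, a Transformers `text-classification` pipeline returns one `{'label': ..., 'score': ...}` dict per input; the label strings come from the `id2label` mapping in the model's config, which correspond to the `0`/`1` classes described above.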

  ## Training and evaluation data

+ The model was trained on a custom dataset combined from multiple open-source datasets, mixed as roughly 30% prompt injections and 70% benign prompts.
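
A minimal sketch of how such a mixture could be assembled with the `datasets` library; the exact preprocessing is not published, so the two datasets picked here (from the front matter) and their column names are illustrative assumptions:

```python
from datasets import load_dataset, interleave_datasets

# Assumed column names; check each dataset's schema before running.
injections = load_dataset("Lakera/gandalf_ignore_instructions", split="train")
injections = injections.map(lambda ex: {"text": ex["text"], "label": 1},
                            remove_columns=injections.column_names)

benign = load_dataset("fka/awesome-chatgpt-prompts", split="train")
benign = benign.map(lambda ex: {"text": ex["prompt"], "label": 0},
                    remove_columns=benign.column_names)

# Sample ~30% injections / ~70% benign prompts, per the description above
train = interleave_datasets([injections, benign],
                            probabilities=[0.3, 0.7], seed=42)
```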

  ## Training procedure

  - Pytorch 2.1.1+cu121
  - Datasets 2.15.0
  - Tokenizers 0.15.0
+
+ ## Citation
+
+ ```
+ @misc{deberta-v3-base-prompt-injection,
+   author = {Laiyer.ai},
+   title = {Fine-Tuned DeBERTa-v3 for Prompt Injection Detection},
+   year = {2023},
+   publisher = {HuggingFace},
+   url = {https://huggingface.co/laiyer/deberta-v3-base-prompt-injection},
+ }
+ ```