|
--- |
|
license: cc-by-nc-4.0 |
|
pipeline_tag: fill-mask |
|
widget: |
|
- text: >- |
|
The most trusted online bulk <mask> seller in the world -Consistent 90%+ |
|
purity -All shipments straight off the brick. 250-500g orders received a |
|
portion of a stamped brick. At 1000g, full stamped bricks are shipped. -We |
|
utilize the best packaging equipment available for the highest level of |
|
stealth and security. |
|
extra_gated_prompt: >- |
|
DarkBERT is available for access upon request. Users may submit their request |
|
using the form below, which includes the **name of the user**, the **user’s |
|
institution**, the **user’s email address that matches the |
|
institution** *(we especially emphasize this part; any non-academic addresses such as |
|
gmail, tutanota, protonmail, etc. are automatically rejected, since these make it

difficult for us to verify your affiliation with the institution)*, and the
|
**purpose of usage** *(in as much detail as possible)*. By requesting and downloading DarkBERT, the user agrees to |
|
the following: the user acknowledges that the use of this model is restricted |
|
to research and/or academic purposes only. Access to the model will be granted |
|
after the request is manually reviewed. A request may be declined if it does |
|
not sufficiently describe research purposes that follow the ACM Code of Ethics |
|
(https://www.acm.org/code-of-ethics). The information provided by the |
|
requesting user will not be used in any way except for granting the user access to

the model and keeping track of request history for DarkBERT. By requesting

the model, the user agrees to our collection of the provided information. This
|
model shall only be used for non-profit research purposes and in a manner |
|
consistent with fair practice. Do not redistribute this model to others. The
|
user should indicate the source of this model (found at the bottom of the |
|
page) when using or citing the model in their research or article. |
|
extra_gated_fields: |
|
Full Name: text |
|
Affiliated Institution / Organization / University: text |
|
E-mail (must match affiliation, generic domains such as gmail not allowed): text |
|
  Position (e.g. doctoral student, professor, researcher, etc.): text
|
Purpose of Usage (Please describe the purpose of usage in as much detail as possible): text |
|
Country: text |
|
I have read the conditions and agree to use this model for ethical, non-commercial use ONLY: checkbox |
|
A request cannot be modified once submitted; I understand that requests with incomplete, insufficient, or inaccurate information will be rejected: checkbox |
|
language: |
|
- en |
|
--- |
|
|
|
# DarkBERT |
|
A BERT-like model pretrained on a Dark Web corpus, as described in "DarkBERT: A Language Model for the Dark Side of the Internet" (ACL 2023).
|
|
|
# Conditions |
|
DarkBERT is available for access upon request. Users may |
|
submit their request using the form below, which includes the **name of the |
|
user**, the **user’s institution**, the **user’s email address that matches the |
|
institution** (we especially emphasize this part; any non-academic addresses such as |
|
gmail, tutanota, protonmail, etc. are automatically rejected, since these make it

difficult for us to verify your affiliation with the institution) and the **purpose of usage**.
|
By requesting and downloading DarkBERT, the user agrees to the following: the user acknowledges that the use of this |
|
model is restricted to research and/or academic purposes only. Access to the |
|
model will be granted after the request is manually reviewed. A request may be |
|
declined if it does not sufficiently describe research purposes that follow |
|
the ACM Code of Ethics (https://www.acm.org/code-of-ethics). The information |
|
provided by the requesting user will not be used in any way except for granting

the user access to the model and keeping track of request history for DarkBERT. By

requesting the model, the user agrees to our collection of the provided
|
information. This model shall only be used for non-profit research purposes |
|
and in a manner consistent with fair practice. Do not redistribute this |
|
model to others. The user should indicate the source of this model (found at
|
the bottom of the page) when using or citing the model in their research or |
|
article. |
|
|
|
## What is included? |
|
|
|
- The preprocessed version of DarkBERT.

- Benchmark datasets in the `benchmark-dataset` directory.
|
|
|
## Sample Usage |
|
```python |
|
>>> from transformers import pipeline

>>> # Load DarkBERT for fill-mask inference (local checkpoint directory or Hub repo id)

>>> folder_dir = "DarkBERT"

>>> unmasker = pipeline('fill-mask', model=folder_dir)
|
>>> unmasker("RagnarLocker, LockBit, and REvil are types of <mask>.") |
|
|
|
[{'score': 0.4952353239059448, 'token': 25346, 'token_str': ' ransomware', 'sequence': 'RagnarLocker, LockBit, and REvil are types of ransomware.'}, |
|
{'score': 0.04661545157432556, 'token': 16886, 'token_str': ' malware', 'sequence': 'RagnarLocker, LockBit, and REvil are types of malware.'}, |
|
{'score': 0.04217657446861267, 'token': 28811, 'token_str': ' wallets', 'sequence': 'RagnarLocker, LockBit, and REvil are types of wallets.'}, |
|
{'score': 0.028982503339648247, 'token': 2196, 'token_str': ' drugs', 'sequence': 'RagnarLocker, LockBit, and REvil are types of drugs.'}, |
|
{'score': 0.020001502707600594, 'token': 11344, 'token_str': ' hackers', 'sequence': 'RagnarLocker, LockBit, and REvil are types of hackers.'}] |
|
|
|
>>> # Alternatively, run the encoder directly to get per-token hidden states

>>> from transformers import AutoModel, AutoTokenizer
|
>>> model = AutoModel.from_pretrained(folder_dir) |
|
>>> tokenizer = AutoTokenizer.from_pretrained(folder_dir) |
|
>>> text = "Recent research has suggested that there are clear differences in the language used in the Dark Web compared to that of the Surface Web." |
|
>>> encoded = tokenizer(text, return_tensors="pt") |
|
>>> output = model(**encoded) |
|
>>> output[0].shape |
|
|
|
torch.Size([1, 27, 768]) |
|
|
|
``` |
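The last hidden state above has shape `(batch, tokens, dim)`; a common way to collapse it into a single fixed-size sentence vector is masked mean pooling over the token axis. Below is a minimal sketch of that pooling step using NumPy stand-ins for the model outputs (the array shapes mirror the `torch.Size([1, 27, 768])` output above; `mean_pool` is an illustrative helper, not part of `transformers`):

```python
import numpy as np

# Hypothetical stand-ins for the model outputs: 1 sentence, 27 tokens,
# 768-dim hidden states, matching the shape printed above.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((1, 27, 768))
attention_mask = np.ones((1, 27), dtype=np.int64)  # 1 = real token, 0 = padding

def mean_pool(hidden, mask):
    """Average token vectors, ignoring padding positions."""
    mask = mask[..., None].astype(hidden.dtype)   # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)          # (batch, dim)
    counts = mask.sum(axis=1)                     # (batch, 1)
    return summed / counts

embedding = mean_pool(hidden_states, attention_mask)
print(embedding.shape)  # (1, 768)
```

With real model outputs, `hidden_states` would be `output[0].detach().numpy()` and `attention_mask` would come from `encoded["attention_mask"]`, so padded positions in a batch do not dilute the average.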
|
## Citation |
|
If you use the DarkBERT model, please cite the following paper:
|
``` |
|
Youngjin Jin, Eugene Jang, Jian Cui, Jin-Woo Chung, Yongjae Lee, and Seungwon Shin. 2023. DarkBERT: A Language Model for the Dark Side of the Internet. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7515–7533, Toronto, Canada. Association for Computational Linguistics. |
|
``` |