---
license: apache-2.0
---

This repo contains models for generating adversarial examples for hate speech detection and natural language inference (NLI). The base architecture is the [GPT-2 causal language model](https://huggingface.co/docs/transformers/model_doc/gpt2). The hate speech models are trained on the [DynaHate dataset](https://aclanthology.org/2021.acl-long.132.pdf), while the NLI models are trained on [AdversarialNLI](https://aclanthology.org/2020.acl-main.441/). Further details can be found in this [paper](https://homes.cs.washington.edu/~skgabrie/emnlp_na.pdf).

These models are intended only for testing and improving the robustness of neural classifiers.
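
Because the checkpoints are GPT-2 causal language models, they can presumably be loaded with the standard `transformers` auto classes. The following is a minimal sketch, not the authors' pipeline: the checkpoint path, the prompt format, and the sampling settings (`top_p`, `max_new_tokens`) are all assumptions, and the helper names `load_generator` and `generate_candidates` are hypothetical. See the paper for the actual input format.

```python
from typing import List


def load_generator(checkpoint: str):
    """Load a GPT-2 style causal LM and its tokenizer.

    `checkpoint` is a placeholder: a local directory or Hub id for one of
    the models in this repo (the exact layout is an assumption).
    """
    # Imported here so generate_candidates() stays usable with any
    # model/tokenizer pair that follows the same interface.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    model.eval()
    return model, tokenizer


def generate_candidates(model, tokenizer, prompt: str,
                        num_candidates: int = 3,
                        max_new_tokens: int = 40) -> List[str]:
    """Sample continuations of `prompt` as candidate adversarial examples."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                       # nucleus sampling (assumed setting)
        top_p=0.9,
        num_return_sequences=num_candidates,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
    )
    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_len = len(inputs["input_ids"][0])
    return [
        tokenizer.decode(seq[prompt_len:], skip_special_tokens=True).strip()
        for seq in outputs
    ]
```

A typical call would be `model, tokenizer = load_generator("path/to/checkpoint")` followed by `generate_candidates(model, tokenizer, prompt)`. The returned strings would then be passed to the classifier under test to check whether its predictions hold up.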