Pankaj Mathur committed on
Commit 79afbc4
1 Parent(s): 0aca5a9

Update README.md

Files changed (1)
1. README.md +2 -2
README.md CHANGED
@@ -11,7 +11,7 @@ datasets:
 ---
 # orca_mini_v2_7b
 
-An **Uncensored** LLaMA-7b model built in collaboration with [Eric Hartford](https://huggingface.co/ehartford), trained on explain-tuned datasets created using instructions and input from the WizardLM, Alpaca & Dolly-V2 datasets, applying the dataset construction approaches of the Orca Research Paper, and then filtered for any kind of refusals.
+An **Uncensored** LLaMA-7b model built in collaboration with [Eric Hartford](https://huggingface.co/ehartford), trained on explain-tuned datasets created using instructions and input from the WizardLM, Alpaca & Dolly-V2 datasets, applying the dataset construction approaches of the Orca Research Paper.
 
 Please note this model has *better code generation capabilities* compared to our original orca_mini_7b, which was trained on the base OpenLLaMA-7b model and has [empty-space issues & was found not good for code generation](https://github.com/openlm-research/open_llama#update-06072023).
 
@@ -51,7 +51,7 @@ please note num_fewshots varies for each below task as used by HuggingFaceH4 Ope
 
 # Dataset
 
-We used the [remove_refusals.py](https://huggingface.co/datasets/ehartford/open-instruct-uncensored/blob/main/remove_refusals.py) script on top of the previous explain-tuned datasets we built, which are the [WizardLM dataset ~70K](https://github.com/nlpxucan/WizardLM), [Alpaca dataset ~52K](https://crfm.stanford.edu/2023/03/13/alpaca.html) & [Dolly-V2 dataset ~15K](https://github.com/databrickslabs/dolly), created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).
+We used an uncensoring script on top of the previous explain-tuned datasets we built, which are the [WizardLM dataset ~70K](https://github.com/nlpxucan/WizardLM), [Alpaca dataset ~52K](https://crfm.stanford.edu/2023/03/13/alpaca.html) & [Dolly-V2 dataset ~15K](https://github.com/databrickslabs/dolly), created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).
 
 We leverage all 15 system instructions provided in the Orca Research Paper to generate custom datasets, in contrast to the vanilla instruction-tuning approaches used by the original datasets.
 
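The dropped reference points at refusal filtering: scanning each instruction/response record for stock refusal phrases and discarding matches. Below is a minimal sketch of that kind of filter, assuming the dataset is a JSON list of records with an `output` field; the phrase list, field names, and file paths are illustrative and not taken from the actual remove_refusals.py script.

```python
import json

# Illustrative refusal markers; the real script's phrase list may differ.
REFUSAL_PHRASES = [
    "i'm sorry, but",
    "as an ai language model",
    "i cannot fulfill",
    "i can't assist with",
]

def is_refusal(text: str) -> bool:
    """Return True if the response contains a known refusal phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in REFUSAL_PHRASES)

def filter_refusals(in_path: str, out_path: str) -> None:
    """Drop records whose 'output' field reads like a refusal."""
    with open(in_path) as f:
        records = json.load(f)  # expected: list of {"instruction", "input", "output"}
    kept = [r for r in records if not is_refusal(r.get("output", ""))]
    with open(out_path, "w") as f:
        json.dump(kept, f, indent=2)
    print(f"kept {len(kept)} of {len(records)} records")

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    filter_refusals("explain_tuned.json", "explain_tuned_uncensored.json")
```

Substring matching like this is deliberately blunt: it trades a few false positives for never letting an obvious refusal through into the training set.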
 
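The explain-tuned construction itself pairs each source instruction with one of the 15 system instructions from the Orca Research Paper, so the teacher model returns explanatory answers rather than bare completions. A rough sketch of that augmentation step follows; the system prompts are paraphrased examples (the paper lists 15 verbatim), the prompt layout is assumed, and `query_teacher` is a hypothetical stand-in for whichever completion API was actually used.

```python
import random

# Paraphrased examples of Orca-style system instructions; the Orca
# Research Paper lists 15 such prompts. These are illustrative only.
SYSTEM_INSTRUCTIONS = [
    "You are an AI assistant. Provide a detailed answer so the user "
    "doesn't need to search outside to understand it.",
    "You are a helpful assistant who always provides an explanation. "
    "Think like you are answering a five year old.",
    "You are an AI assistant that follows instructions extremely well. "
    "Think step-by-step and justify your answer.",
]

def build_explain_tuned_record(instruction: str, context: str, query_teacher) -> dict:
    """Pair one instruction with a sampled system prompt and collect the
    teacher model's explanatory response (query_teacher is hypothetical)."""
    system = random.choice(SYSTEM_INSTRUCTIONS)
    prompt = (
        f"### System:\n{system}\n\n"
        f"### User:\n{instruction}\n\n"
        f"### Input:\n{context}\n\n"
        f"### Response:\n"
    )
    return {
        "system": system,
        "instruction": instruction,
        "input": context,
        "output": query_teacher(prompt),
    }
```

Sampling a different system instruction per record is what distinguishes this from vanilla instruction tuning: the same instruction/input pair can yield several differently explained targets.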