Pankaj Mathur
committed on
Commit cf162d1
Parent(s): 3a2b00e
Update README.md
README.md
CHANGED
---
# orca_mini_v2_7b

An **Uncensored** LLaMA-7b model trained on explain tuned datasets, created using instructions and inputs from the WizardLM, Alpaca & Dolly-V2 datasets, applying the Orca Research Paper dataset construction approaches, and then filtered for any kind of refusals, thanks to [Eric Hartford](https://huggingface.co/ehartford).

Please note this model has *better code generation capabilities* compared to our original orca_mini_7b, which was trained on the base OpenLLaMA-7b model and has the [empty spaces issue & was found not good for code generation](https://github.com/openlm-research/open_llama#update-06072023).
I evaluated orca_mini_v2_7b on a wide range of tasks using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI.

Here are the zero-shot metric results.
|**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
|:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
|*hellaswag*|0|0|acc_norm|0.7394|0.0044|
|*truthfulqa_mc*|0|1|mc1|0.2938|0.0159|
|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
|*mmlu avg*|0|1|acc|0.4108|0.0153|
|*mmlu avg*|0|1|acc_norm|0.4108|0.0153|
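As a sanity check, the reported standard errors are consistent with the usual binomial formula for a proportion, `sqrt(p * (1 - p) / n)`. A minimal sketch, assuming the HellaSwag validation split of roughly 10,042 examples (an assumption about the evaluation set size, not stated in this card):

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


# hellaswag acc_norm from the table above: 0.7394 with stderr 0.0044,
# assuming n = 10042 validation examples.
print(round(binomial_stderr(0.7394, 10042), 4))  # -> 0.0044
```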
Here are the results on the metrics used by the [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

Please note num_fewshot varies for each task below, as used by the HuggingFaceH4 Open LLM Leaderboard.

|**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
|:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
|*arc_challenge*|0|0|acc|0.7386|0.0090|
|*arc_challenge*|0|0|acc_norm|0.7066|0.0093|
# Dataset

We used the [remove_refusals.py](https://huggingface.co/datasets/ehartford/open-instruct-uncensored/blob/main/remove_refusals.py) script on top of the previous explain tuned datasets we built, which are [WizardLM dataset ~70K](https://github.com/nlpxucan/WizardLM), [Alpaca dataset ~52K](https://crfm.stanford.edu/2023/03/13/alpaca.html) & [Dolly-V2 dataset ~15K](https://github.com/databrickslabs/dolly), created using approaches from the [Orca Research Paper](https://arxiv.org/abs/2306.02707).
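The refusal-filtering step described above can be sketched as a simple phrase scan over each example's output. This is an illustrative sketch only; the marker phrases below are hypothetical examples, not the actual list used by remove_refusals.py:

```python
# Illustrative refusal filter in the spirit of remove_refusals.py.
# The marker phrases here are hypothetical, not the script's real list.
REFUSAL_MARKERS = (
    "as an ai language model",
    "i'm sorry, but i cannot",
    "i cannot fulfill that request",
)


def is_refusal(response: str) -> bool:
    """True if the response text contains a known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def filter_refusals(examples: list) -> list:
    """Keep only examples whose output does not look like a refusal."""
    return [ex for ex in examples if not is_refusal(ex["output"])]


data = [
    {"instruction": "Explain photosynthesis.", "output": "Plants use sunlight to..."},
    {"instruction": "Write an insult.", "output": "As an AI language model, I cannot do that."},
]
print(len(filter_refusals(data)))  # -> 1
```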
We leverage all 15 of the system instructions provided in the Orca Research Paper to generate custom datasets, in contrast to the vanilla instruction tuning approaches used by the original datasets.
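The pairing of a system instruction with each training example can be sketched as below. Both the prompt layout and the sample system message are assumptions for illustration; they are not the paper's exact text or this model's guaranteed template:

```python
# Sketch of explain tuned prompt construction: each training example is
# paired with one of the Orca-style system instructions.
# SYSTEM_MESSAGE is a hypothetical paraphrase, not the paper's exact wording.
SYSTEM_MESSAGE = (
    "You are an AI assistant. Provide a detailed answer so the user "
    "does not need to search elsewhere to understand it."
)


def build_prompt(system: str, instruction: str, user_input: str = "") -> str:
    """Assemble a system/user/response prompt, with an optional input block."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if user_input:
        prompt += f"### Input:\n{user_input}\n\n"
    return prompt + "### Response:\n"


example = build_prompt(SYSTEM_MESSAGE, "Explain why the sky appears blue.")
print(example.startswith("### System:"))  # -> True
```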