# **Model Benchmark**

## Open Ko-LLM leaderboard & lm-evaluation-harness (zero-shot)
- Follow up at [Ko-link](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
| --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN |
| [Megastudy/M-SOLAR-10.7B-v1.1-beta](https://huggingface.co/Megastudy/M-SOLAR-10.7B-v1.1-beta) | 55.25 | 51.71 | 60.86 | 54.24 | 47.12 | 62.34 |
| [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |

- Follow up at [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness).
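
The zero-shot KoBEST numbers below come from the Korean fork of lm-evaluation-harness linked above. A minimal reproduction sketch, assuming the fork keeps the upstream `main.py` CLI and flags (the repo's own run scripts may differ):

```bash
# Hypothetical sketch -- not the exact command used for the tables below.
# Assumes the Beomi fork keeps the upstream lm-evaluation-harness `main.py` CLI.
git clone https://github.com/Beomi/ko-lm-evaluation-harness
cd ko-lm-evaluation-harness
pip install -e .   # install per the repo's own instructions if this differs

# Zero-shot run over the KoBEST tasks reported below.
python main.py \
  --model gpt2 \
  --model_args pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test \
  --tasks kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg \
  --num_fewshot 0 \
  --no_cache
```

The `gpt2 (pretrained=...)` lines below are the headers the harness prints for a Hugging Face causal LM evaluated with these settings.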

```
gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task |Version| Metric |Value | |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_sentineg | 0|acc |0.7078|± |0.0229|
| | |macro_f1|0.7071|± |0.0229|

gpt2 (pretrained=Megastudy/M-SOLAR-10.7B-v1.1-beta), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task |Version| Metric |Value | |Stderr|
|----------------|------:|--------|-----:|---|-----:|
|kobest_boolq | 0|acc |0.7137|± |0.0121|
| | |macro_f1|0.6878|± |0.0128|
|kobest_copa | 0|acc |0.7060|± |0.0144|
| | |macro_f1|0.7054|± |0.0145|
|kobest_hellaswag| 0|acc |0.4620|± |0.0223|
| | |acc_norm|0.5360|± |0.0223|
| | |macro_f1|0.4595|± |0.0223|
|kobest_sentineg | 0|acc |0.7431|± |0.0220|
| | |macro_f1|0.7295|± |0.0230|

gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| Task |Version| Metric |Value | |Stderr|
|----------------|------:|--------|-----:|---|-----:|

gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
| | |macro_f1|0.4296|± |0.0221|
|kobest_sentineg | 0|acc |0.7506|± |0.0217|
| | |macro_f1|0.7505|± |0.0217|
```

## Open EN-LLM leaderboard & lm-evaluation-harness (zero-shot)
- Follow up at [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
| [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
| [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
| [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |

- Follow up at [Eleuther/LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness).
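
A matching English-task harness run has not been published yet (placeholder below). A hypothetical sketch only, assuming the upstream `main.py` CLI and its standard task names; note the leaderboard itself uses task-specific few-shot settings rather than zero-shot:

```bash
# Hypothetical sketch -- the leaderboard numbers above were not produced by this command.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

# Zero-shot pass over a subset of the leaderboard tasks.
# MMLU is split into hendrycksTest-* subtasks in this version of the harness.
python main.py \
  --model hf-causal \
  --model_args pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test \
  --tasks arc_challenge,hellaswag,truthfulqa_mc,winogrande,gsm8k \
  --num_fewshot 0 \
  --no_cache
```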

```yaml
(will update)
```