Commit 89c2cad (parent: f61111b) by chiliu: add eval
README.md
CHANGED
@@ -177,14 +177,14 @@ The original LLaMA model was trained for 1 trillion tokens and GPT-J was trained
 
 | **Task/Metric**        | Mamba-GPT 3B | LLaMA 7B | OpenLLaMA 7B | OpenLLaMA 3B | OpenLLaMA 13B 600BT |
 | ---------------------- | ------------ | -------- | ------------ | ------------ | ------------------- |
-| anli_r1/acc            | 0.35         | 0.35     | 0.33         | 0.33         | 0.33                |
+| anli_r1/acc            | **0.35**     | 0.35     | 0.33         | 0.33         | 0.33                |
 | anli_r2/acc            | 0.33         | 0.34     | 0.36         | 0.32         | 0.35                |
 | anli_r3/acc            | 0.35         | 0.37     | 0.38         | 0.35         | 0.38                |
 | arc_challenge/acc      | 0.35         | 0.39     | 0.37         | 0.34         | 0.39                |
 | arc_challenge/acc_norm | 0.37         | 0.41     | 0.38         | 0.37         | 0.42                |
 | arc_easy/acc           | 0.71         | 0.68     | 0.72         | 0.69         | 0.74                |
 | arc_easy/acc_norm      | 0.65         | 0.52     | 0.68         | 0.65         | 0.70                |
-| boolq/acc              | 0.72         | 0.56     | 0.53         | 0.66         | 0.71                |
+| boolq/acc              | **0.72**     | 0.56     | 0.53         | 0.66         | 0.71                |
 | hellaswag/acc          | 0.49         | 0.36     | 0.63         | 0.43         | 0.54                |
 | hellaswag/acc_norm     | 0.66         | 0.73     | 0.72         | 0.67         | 0.73                |
 | openbookqa/acc         | 0.26         | 0.29     | 0.30         | 0.27         | 0.30                |
@@ -194,8 +194,8 @@ The original LLaMA model was trained for 1 trillion tokens and GPT-J was trained
 | record/em              | 0.88         | 0.91     | 0.89         | 0.88         | 0.90                |
 | record/f1              | 0.88         | 0.91     | 0.90         | 0.89         | 0.90                |
 | rte/acc                | 0.55         | 0.56     | 0.60         | 0.58         | 0.65                |
-| truthfulqa_mc/mc1      | 0.27         | 0.21     | 0.23         | 0.22         | 0.22                |
-| truthfulqa_mc/mc2      | 0.37         | 0.34     | 0.35         | 0.35         | 0.35                |
+| truthfulqa_mc/mc1      | **0.27**     | 0.21     | 0.23         | 0.22         | 0.22                |
+| truthfulqa_mc/mc2      | **0.37**     | 0.34     | 0.35         | 0.35         | 0.35                |
 | wic/acc                | 0.49         | 0.50     | 0.51         | 0.48         | 0.49                |
 | winogrande/acc         | 0.63         | 0.68     | 0.67         | 0.62         | 0.67                |
 | Average                | 0.53         | 0.53     | 0.55         | 0.52         | 0.56                |