Jae-Won Chung committed on
Commit 827162e
Parent: a2463c2

Remove duplicate information

Files changed (1)
  1. LEADERBOARD.md +0 -5
LEADERBOARD.md CHANGED
````diff
@@ -63,11 +63,6 @@ Find our benchmark script for one model [here](https://github.com/ml-energy/lead
 We randomly sampled around 3000 prompts from the [cleaned ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered).
 See [here](https://github.com/ml-energy/leaderboard/tree/master/sharegpt) for more detail on how we created the benchmark dataset.
 
-We used identical system prompts for all models (while respecting their own *role* tokens):
-```
-A chat between a human user (prompter) and an artificial intelligence (AI) assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
-```
-
 ## NLP evaluation metrics
 
 - `arc`: [AI2 Reasoning Challenge](https://allenai.org/data/arc)'s `challenge` dataset, measures capability to do grade-school level question answering, 25 shot
````
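The removed lines describe using one shared system prompt across all models while respecting each model's own *role* tokens. A minimal sketch of what that wrapping could look like, assuming illustrative role names (`USER`/`ASSISTANT`) and a hypothetical `build_prompt` helper not taken from the repository:

```python
# Shared system prompt text quoted from the removed LEADERBOARD.md lines.
SYSTEM_PROMPT = (
    "A chat between a human user (prompter) and an artificial intelligence "
    "(AI) assistant. The assistant gives helpful, detailed, and polite "
    "answers to the user's questions."
)

def build_prompt(
    user_message: str,
    roles: tuple[str, str] = ("USER", "ASSISTANT"),  # per-model role tokens
    sep: str = "\n",
) -> str:
    """Wrap a sampled ShareGPT user message with the shared system prompt,
    using the model's own role tokens for the conversation turns."""
    user_role, assistant_role = roles
    return sep.join([
        SYSTEM_PROMPT,
        f"{user_role}: {user_message}",
        f"{assistant_role}:",  # generation continues from here
    ])
```

Each model family would pass its own role tokens (and separator) here, so the system prompt stays identical while the surrounding chat template varies per model.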