Jae-Won Chung committed on
Commit
0787166
1 Parent(s): 069d87a

Add a section in Limitations

  1. LEADERBOARD.md +5 -0
LEADERBOARD.md CHANGED
@@ -61,6 +61,11 @@ See [here](https://github.com/ml-energy/leaderboard/tree/master/sharegpt) for mo
  - `hellaswag`: [HellaSwag dataset](https://allenai.org/data/hellaswag), measuring grounded commonsense, 10 shot
  - `truthfulqa`: [TruthfulQA dataset](https://arxiv.org/abs/2109.07958), measuring truthfulness against questions that elicit common falsehoods, 0 shot
 
+ ## Limitations
+
+ Currently, inference is run with essentially plain PyTorch at batch size 1, which is unrealistic for a production serving scenario.
+ Hence, the absolute latency, throughput, and energy numbers should not be used to estimate figures for real production settings, although relative comparisons between models still make some sense.
+
  ## Upcoming
 
  - Within the summer, we'll add an LLM Arena for energy consumption!
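
The added Limitations text suggests reading the leaderboard numbers relatively rather than absolutely. A minimal sketch of what that could look like is to express each model's measurement as a multiple of a chosen baseline. The helper `relative_to_baseline`, the model names, and the joule values below are all hypothetical placeholders for illustration, not leaderboard data or code from the repository.

```python
def relative_to_baseline(energy_joules: dict[str, float], baseline: str) -> dict[str, float]:
    """Return each model's energy as a multiple of the baseline model's energy."""
    base = energy_joules[baseline]
    if base <= 0:
        raise ValueError("baseline energy must be positive")
    return {name: value / base for name, value in energy_joules.items()}


# Hypothetical per-response energy measurements in joules (made-up numbers,
# standing in for batch-size-1 results).
measured = {"model-a": 400.0, "model-b": 600.0, "model-c": 1000.0}

# Normalize to model-a and rank models from least to most energy.
ratios = relative_to_baseline(measured, baseline="model-a")
ranking = sorted(ratios, key=ratios.get)
```

Even these ratios are only a rough guide, since batching and serving optimizations in production may not affect every model equally.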