Question about the overall scores for the BBH and GPQA datasets

#842

by Amigozyq - opened Jul 15

Amigozyq

Jul 15

Hi! Thank you very much for your helpful and outstanding work!

The BBH and GPQA datasets both have multiple subsets, but the open-llm-leaderboard displays only a single overall score per model for each of them. How are these overall scores obtained? Are they simply the average of the scores the model achieves on each subset, or are they computed over the concatenation of all subsets?
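To make the two options concrete, here is a minimal sketch of how they can differ. The subset names and counts are invented for illustration (they are not real BBH or GPQA results), and this is not meant to describe how the leaderboard actually computes its scores.

```python
# Hypothetical per-subset results: (correct answers, total questions).
# Names and counts are made up purely for illustration.
subsets = {
    "subset_a": (80, 100),
    "subset_b": (30, 50),
    "subset_c": (45, 250),
}


def macro_average(results):
    """Average of per-subset accuracies: every subset gets equal weight."""
    accuracies = [correct / total for correct, total in results.values()]
    return sum(accuracies) / len(accuracies)


def pooled_accuracy(results):
    """Accuracy over the concatenation of all subsets: every question gets equal weight."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


print(f"average of subset scores:   {macro_average(subsets):.4f}")    # 0.5267
print(f"score on concatenated data: {pooled_accuracy(subsets):.4f}")  # 0.3875
```

The two values are guaranteed to agree when all subsets have the same size; with unevenly sized subsets, as in this toy example, they can be quite far apart.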

Thank you very much!

alozowski

Open LLM Leaderboard org Jul 16

Hi @Amigozyq,

Here is the new page about score normalization in our documentation; I think it will be helpful.

I'm closing this discussion. Please open a new one if you have any further questions!

alozowski changed discussion status to closed Jul 16
