MMLU-Pro benchmark?

#25
by lightenup - opened

Are MMLU-Pro benchmark results available anywhere, including the numbers for the individual categories? I'd be interested in a benchmark of the officially released model (not quantized).

Feel free to check out the performance of gemma-2-9b and gemma-2-9b-it on MMLU-Pro at https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro.

Thanks!

The prompt itself and all other inference parameters matter a lot for the MMLU-Pro score. So it would be great if we could see somewhere which prompt and inference parameters were used.
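To make concrete what I mean by "prompt and inference parameters", here is a minimal Python sketch of the kind of record that would make a score reproducible. Everything in it is an assumption for illustration: the `EvalConfig` fields, the `format_question` helper, the template wording, and all the values are placeholders of mine, not the settings actually used for the leaderboard numbers.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical record of an evaluation setup. The field names and values are
# illustrative placeholders, not the settings behind any published score.
@dataclass
class EvalConfig:
    model: str
    num_shots: int
    temperature: float
    top_p: float
    max_new_tokens: int
    prompt_template: str


def format_question(template: str, question: str, options: list[str]) -> str:
    """Fill the template with a question and lettered answer options."""
    letters = "ABCDEFGHIJ"  # MMLU-Pro questions have up to ten options
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return template.format(question=question, options="\n".join(lines))


config = EvalConfig(
    model="gemma-2-9b-it",
    num_shots=5,          # assumed value
    temperature=0.0,      # assumed value (greedy decoding)
    top_p=1.0,            # assumed value
    max_new_tokens=2048,  # assumed value
    prompt_template=(
        "Question: {question}\nOptions:\n{options}\n"
        "Answer: Let's think step by step."
    ),
)

# Render one toy question and dump the full setup as JSON for reporting.
prompt = format_question(config.prompt_template, "What is 2 + 2?", ["3", "4", "5"])
print(prompt)
print(json.dumps(asdict(config), indent=2))
```

Publishing something like the JSON dump next to each score would let others rerun the benchmark under the same conditions.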

I am also even more eager to see the gemma-2-27b MMLU-Pro results, since gemma-2-9b already scores so high!
