Add common benchmark like MMLU/HumanEval

#12
by Amadeusystem - opened

Please report MMLU/HumanEval/TriviaQA scores compared to the original model, so we can evaluate the loss/gain against a common baseline.

TAIDE org

Hi,

I'm not sure what problem you are referring to.
The models can be freely downloaded and tested with any tool.

Best regards.

What I meant is: yes, we can run these benchmarks ourselves, but it is recommended, and very common in the community, to publish scores on these standard baselines. That lets people evaluate the model quickly and decide whether it is worth trying. It is in your own interest, since it makes the model's capabilities clearer.
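For reference, anyone can reproduce these numbers with EleutherAI's lm-evaluation-harness. Below is a minimal sketch of building the evaluation command; the model id is a placeholder, and the task names (`mmlu`, `humaneval`, `triviaqa`) are assumed harness task names that should be checked against `lm_eval --tasks list` for your installed version.

```python
import shlex

# Placeholder model id -- substitute the actual Hugging Face checkpoint to test.
MODEL_ID = "org/model-name"

# The three benchmarks requested in this thread, by their assumed
# lm-evaluation-harness task names (verify with `lm_eval --tasks list`).
TASKS = ["mmlu", "humaneval", "triviaqa"]

# Assemble the harness CLI invocation as an argument list.
cmd = [
    "lm_eval",
    "--model", "hf",
    "--model_args", f"pretrained={MODEL_ID}",
    "--tasks", ",".join(TASKS),
    "--batch_size", "8",
]

# Print the shell-quoted command instead of running it here,
# since actual execution needs `pip install lm-eval`, the model
# weights, and a GPU.
print(shlex.join(cmd))
# import subprocess; subprocess.run(cmd, check=True)
```

Running the printed command produces per-task accuracy tables that can be pasted directly into the model card for comparison against the base model.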

TAIDE org

Hi,

Thank you very much for your valuable suggestions.
We have noted them and forwarded them to the manager.
If there is anything else you would like to add, please let us know.

Best regards,
