Harness Evaluation

#56
by VityaVitalich - opened

Dear maintainers,
Thank you for your work.

I would like to evaluate your model with the LM Evaluation Harness framework, but I am struggling to run it the usual way. Is there a recipe for this?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

What difficulty are you running into? I ask because I have not used this framework.

This framework is one of the most popular for evaluating LLMs on various benchmarks (here is the link: https://github.com/EleutherAI/lm-evaluation-harness). However, it does not work well with GLM models, because they are not loaded and run for inference properly. The models are seq2seq, but the framework does not detect them as such. Later in the pipeline, the model has to be treated as a CausalLM to work properly.

In case other users face the same problem, here are the fixes I made to get this framework to evaluate GLM properly. The fixes are imperfect, but I hope they are helpful. https://github.com/VityaVitalich/LLM_Compression/blob/main/lm-evaluation-harness/lm_eval/models/huggingface.py
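
For anyone who wants the general idea without reading the whole file, here is a minimal sketch of the kind of override involved. It is not copied from the linked fix or from the harness itself: `pick_backend`, the two registries, and the `"glm"` model-type string are all hypothetical names used only for illustration. The assumption is that the backend is inferred from the model type, and that an explicit override is needed when the type is not recognized.

```python
from typing import Optional

# Hypothetical, abbreviated registries for illustration only.
# A real framework would have its own, much longer lists.
SEQ2SEQ_MODEL_TYPES = {"t5", "bart"}
CAUSAL_MODEL_TYPES = {"gpt2", "llama"}


def pick_backend(model_type: str, override: Optional[str] = None) -> str:
    """Return "causal" or "seq2seq" for a model type.

    An explicit override wins over auto-detection, which is what an
    unregistered model type (here the placeholder "glm") would need.
    """
    if override is not None:
        if override not in ("causal", "seq2seq"):
            raise ValueError(f"unknown backend override: {override!r}")
        return override
    if model_type in SEQ2SEQ_MODEL_TYPES:
        return "seq2seq"
    if model_type in CAUSAL_MODEL_TYPES:
        return "causal"
    # Fail loudly instead of silently guessing a backend.
    raise ValueError(
        f"cannot infer backend for model type {model_type!r}; "
        "pass override explicitly"
    )


# Auto-detection fails for the unregistered type, so the backend is forced.
backend = pick_backend("glm", override="causal")
```

Failing loudly on an unknown model type, instead of defaulting to one backend, makes this kind of mismatch easier to notice.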
