Re-Evaluate models with old Llama 3 generation config

#815
by Dampfinchen - opened

Hello,

Some models, like Neural-Daredevil, still have the old generation_config.json that specifies only 128001 (<|end_of_text|>) as the EOS token, when it should also include 128009 (<|eot_id|>). For Llama 3 Instruct this is set correctly (see here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/generation_config.json#L3)

The old generation_config.json that models like Neural Daredevil use basically leaves the model unable to stop itself during evaluation, which results in unexpectedly low scores:

[Screenshot: neuraldaredevil.png]

Here's an example for IFEval.

For models like Neural-Daredevil-abliterated, the generation_config.json has to be replaced with the one linked above for proper evaluation. NDD got special attention from me because I really like it, so I have opened a PR that fixes this (https://huggingface.co/mlabonne/NeuralDaredevil-8B-abliterated/discussions/8/files), but there might be more L3 models out there with the old generation file; a quick way to check is sketched below.
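For anyone who wants to check whether another L3 model still carries the stale config, here is a minimal sketch, assuming the transformers library is installed; the repo id is just the one discussed in this thread:

```python
# Minimal sketch: detect a stale Llama 3 generation_config.json.
# Llama 3 Instruct uses eos_token_id [128001, 128009]
# (<|end_of_text|> and <|eot_id|>); old configs only list 128001.
from transformers import GenerationConfig

EXPECTED_EOS = {128001, 128009}

gen_config = GenerationConfig.from_pretrained("mlabonne/NeuralDaredevil-8B-abliterated")

eos = gen_config.eos_token_id
eos_set = {eos} if isinstance(eos, int) else set(eos)

if eos_set == EXPECTED_EOS:
    print("generation_config.json looks up to date.")
else:
    print(f"Stale config: eos_token_id={eos}, expected {sorted(EXPECTED_EOS)}")
```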

Open LLM Leaderboard org

Hi @Dampfinchen,
Once this model is fixed with the new token management, feel free to resubmit it (and select the new commit) and it will get re-evaluated.
However, it would be good if people could be careful with their submissions, as it's costly to re-run badly submitted models.

clefourrier changed discussion status to closed

Hello @clefourrier

mlabonne/NeuralDaredevil-8B-abliterated

The model has been fixed. Would you be so kind as to flush the old test result so I can resubmit it? As I'm not the model creator, I cannot create a new commit.

Thank you!

Open LLM Leaderboard org

If it's been merged, you can simply take the hash of the merge commit and submit with it.
(We don't delete previous run results.)
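If you want to sanity-check that a specific commit actually carries the fixed config before resubmitting, transformers lets you pin a revision; the hash below is a placeholder for the real merge commit:

```python
# Sketch: load the generation config at a pinned revision to confirm the fix.
# Replace the placeholder with the actual merge commit hash.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained(
    "mlabonne/NeuralDaredevil-8B-abliterated",
    revision="MERGE_COMMIT_HASH",  # placeholder, not a real hash
)
print(gen_config.eos_token_id)  # should print [128001, 128009] after the fix
```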

Good to know, thanks @clefourrier and @Dampfinchen
