Re-Evaluate models with old Llama 3 generation config

#815
by Dampfinchen - opened

Hello,

Some models, like Neural-Daredevil, still have the old generation_config.json that specifies only 128001 (<|end_of_text|>) as the EOS token, when it should also include 128009 (<|eot_id|>). For Llama 3 Instruct this is set correctly (see here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/generation_config.json#L3)

The old generation_config.json that models like Neural Daredevil use basically leaves the model unable to stop itself during evaluation, which results in unexpectedly low scores:

[Screenshot: neuraldaredevil.png]

Here's an example for IFEval.

For models like Neural-Daredevil-abliterated, the generation_config.json has to be replaced with the one linked above for proper evaluation. NDD got special attention from me because I really like it, so I have opened a PR that fixes this (https://huggingface.co/mlabonne/NeuralDaredevil-8B-abliterated/discussions/8/files), but there might be more L3 models out there with the old generation file; a quick way to check is sketched below.
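For anyone who wants to check whether another L3 model still carries the stale config, here is a minimal sketch, assuming the transformers library is installed; the repo id is just the one discussed in this thread:

```python
# Minimal sketch: detect a stale Llama 3 generation_config.json.
# Llama 3 Instruct uses eos_token_id [128001, 128009]
# (<|end_of_text|> and <|eot_id|>); old configs only list 128001.
from transformers import GenerationConfig

EXPECTED_EOS = {128001, 128009}

gen_config = GenerationConfig.from_pretrained("mlabonne/NeuralDaredevil-8B-abliterated")

eos = gen_config.eos_token_id
eos_set = {eos} if isinstance(eos, int) else set(eos)

if eos_set == EXPECTED_EOS:
    print("generation_config.json looks up to date.")
else:
    print(f"Stale config: eos_token_id={eos}, expected {sorted(EXPECTED_EOS)}")
```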

Open LLM Leaderboard org

Hi @Dampfinchen,
Once this model is fixed with the new token management, feel free to resubmit it (and select the new commit) and it will get re-evaluated.
However, it would be good if people could be careful with their submissions, as it's costly to re-run badly submitted models.

clefourrier changed discussion status to closed

Hello @clefourrier

mlabonne/NeuralDaredevil-8B-abliterated

The model has been fixed. Would you be so kind as to flush the old test result so I can resubmit it? As I'm not the model creator, I cannot create a new commit.

Thank you!

Open LLM Leaderboard org

If it's been merged, you can simply take the hash of the merge commit and submit with it.
(We don't delete previous run results.)
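If you want to sanity-check that a specific commit actually carries the fixed config before resubmitting, transformers lets you pin a revision; the hash below is a placeholder for the real merge commit:

```python
# Sketch: load the generation config at a pinned revision to confirm the fix.
# Replace the placeholder with the actual merge commit hash.
from transformers import GenerationConfig

gen_config = GenerationConfig.from_pretrained(
    "mlabonne/NeuralDaredevil-8B-abliterated",
    revision="MERGE_COMMIT_HASH",  # placeholder, not a real hash
)
print(gen_config.eos_token_id)  # should print [128001, 128009] after the fix
```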

Good to know, thanks @clefourrier and @Dampfinchen
