Update README.md

#51

by stellaathena - opened Jul 18, 2022

base: refs/heads/main ← from: refs/pr/51

stellaathena

BigScience Workshop org Jul 18, 2022

No description provided.

Update README.md (5985a179)

thomwolf

BigScience Workshop org Jul 18, 2022

So you think we should remove them for now @stellaathena ?

stellaathena

BigScience Workshop org Jul 18, 2022

Yes, that was the conclusion we reached on today’s Eval WG call.

Muennighoff

BigScience Workshop org Jul 19, 2022

I don't think we should remove them.
I added a disclaimer above the results saying that these are not final, as your working group is working on visualizations and different ways to represent the data. As far as I followed your working group's call, that was an acceptable solution. So I'd suggest replacing the table in the PR that adds a better visualization of the evaluation results 😊

stellaathena

BigScience Workshop org Jul 19, 2022

@Muennighoff We’ve tried really hard to be polite, but since that’s not working I’ll try being blunt instead: these evaluation results should have never been released. They are untrustworthy, unverified, and actively misleading. They have already caused substantial confusion, and will continue to do so. The evaluation WG in no way supports them, and their release is a violation of BigScience’s guiding principles.

Additionally, the disclaimer you added (“WARNING: These are intermediate results”) is false. The problem is not that these results were done on intermediate checkpoints. A more appropriate disclaimer would be:

WARNING: these evaluation results were carried out by people unfamiliar with the evaluation code. Some of them are known to be incorrect, and the rest are largely invalidated. They were released without the approval or consent of the Evaluation WG. The Evaluation WG disowns them and wishes that they had never been released in the first place.

TimeRobber

BigScience Workshop org Jul 19, 2022

edited Jul 19, 2022

Hey @stellaathena! I don't think @Muennighoff meant any harm at all, as he wasn't there at the end of the meeting. I'm okay with removing them and letting you guys handle the evaluation. I think we should keep the original dump though (I think some of the ongoing work is being done on that) and the human eval evaluation done by @loubnabnl on a separate codebase. Does that work for you?

Nit: They did run on the final checkpoint.

stellaathena

BigScience Workshop org Jul 20, 2022

I spoke with @TimeRobber one-on-one and we agreed to go ahead and remove the evaluation results. I'm not sure who has the permissions to merge this PR, but please do so ASAP.

TimeRobber

BigScience Workshop org Jul 20, 2022

Still think we should keep human eval and training/validation loss/perplexity. If you can update the PR I can merge it.

stellaathena changed pull request status to closed Aug 30, 2022
