Cancel running float32 model (kevinpro/Hydra-LLaMA3-8B-0513-preview)

#6
by CombinHorizon - opened

https://huggingface.co/datasets/OALL/requests/blob/main/kevinpro/Hydra-LLaMA3-8B-0513-preview_eval_request_False_float32_Original.json

has been running for more than a month (since 2024-05-31), and because the run was set to a higher precision, it appears to be stuck in the running state.
Also, the owner (not me) appears to have made that model unavailable; it is no longer accessible, so it seems sensible to cancel this run to save compute and memory-bandwidth resources (see the rough estimate below). The leaderboard is already running evaluations in float16 and bfloat16, so not much would be lost, but is the float32 result significant? Is it worth keeping?
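
For rough scale, here is a back-of-the-envelope sketch (my own illustrative numbers, weights only; the exact footprint depends on the eval harness):

```python
# Rough parameter-memory estimate for an ~8B model (weights only,
# ignoring activations and KV cache) -- illustrative, not measured.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

params = 8e9  # ~8B parameters, e.g. a LLaMA3-8B variant
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {params * nbytes / 2**30:.0f} GiB")
# float32: 30 GiB; float16/bfloat16: 15 GiB each --
# float32 roughly doubles the memory and bandwidth of a run.
```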
Has anyone else noticed that some models submitted just before or around the same time are also stuck in the running state? Are they on the same cluster?

(see https://huggingface.co/datasets/OALL/requests/discussions/14)
(see https://huggingface.co/datasets/OALL/requests/discussions/13)

(edit: spelling and grammar)

CombinHorizon changed discussion title from Cancel running model to Cancel certain running float32 model
CombinHorizon changed discussion title from Cancel certain running float32 model to Cancel running float32 model (kevinpro/Hydra-LLaMA3-8B-0513-preview)
Open Arabic LLM Leaderboard org

Dear @CombinHorizon, thank you so much for your valuable comments and insights.
We will take care of removing any stuck models from the leaderboard, starting with the ones you mentioned.
Also, the original idea was to give the community the option to submit small models (<7B) in full precision (float32) as well as in other precisions, which could help researchers study the effect of precision on Arabic models. Unfortunately, we don't see the community using the full-precision tag that way, so we are now discussing removing it permanently in the coming days.
This discussion will remain open, so please feel free to use it to report any weird behaviour you may notice, and thank you again for your valuable input 🤗

alielfilali01 changed discussion status to closed

Another float32 model is frozen in the running state; it might make sense to cancel it and re-run with bfloat16, which is the closest format to float32 that won't overflow since it keeps the same exponent range (see the sketch below the link):
https://huggingface.co/datasets/OALL/requests/blob/main/Ashmal/MBZUAI-ORYX-new2_eval_request_False_float32_Original.json (submitted_time: 2024-06-11T11…)
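
A minimal sketch of why bfloat16 is the safer downcast (assumes PyTorch; the printed values come from casting the same float32 inputs):

```python
import torch

# float16 tops out at 65504, so larger float32 values overflow to inf.
# bfloat16 keeps float32's 8-bit exponent, trading mantissa bits for range.
x = torch.tensor([60000.0, 70000.0])          # float32 source values
print(x.to(torch.float16))    # tensor([60000., inf], dtype=torch.float16)
print(x.to(torch.bfloat16))   # tensor([59904., 70144.], dtype=torch.bfloat16)
```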

The other models submitted around 2024-05-30/31 have finished running, except for the (now deleted) model below:

https://huggingface.co/datasets/OALL/requests/blob/main/kevinpro/Hydra-LLaMA3-8B-0531-preview_eval_request_False_float16_Original.json (submitted_time: 2024-05-31T18…)
https://huggingface.co/datasets/OALL/requests/blob/main/kevinpro/Hydra-LLaMA3-8B-0531-preview_eval_request_False_bfloat16_Original.json (submitted_time: 2024-05-31T16…)

Given past history, it does not seem like a good idea to run the following (currently pending) in float32; you may want to cancel them and rerun in standard precision (bfloat16). For the 70B models, float32 means roughly 280 GB of weights alone, double their native bfloat16 footprint. (I didn't submit these, in case you were wondering.)
https://huggingface.co/datasets/OALL/requests/blob/main/sambanovasystems/SambaLingo-Arabic-Chat-70B_eval_request_False_float32_Original.json (natively bfloat16, and a bfloat16 run is already submitted)
https://huggingface.co/datasets/OALL/requests/blob/main/sambanovasystems/SambaLingo-Arabic-Base-70B_eval_request_False_float32_Original.json (natively bfloat16)
https://huggingface.co/datasets/OALL/requests/blob/main/CohereForAI/aya-101_eval_request_False_float32_Original.json (13B float32)

The score difference between float16 and bfloat16 is typically a few tenths of a percentage point; is that worth separate runs, especially in a field where newer models keep coming out and standards are still changing and improving? Maybe a float32 test on an even smaller model would be fine, provided it does not clog the leaderboard as it did previously.
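
For reference, the two half-precision formats trade resolution against range, which is one reason the score deltas between them tend to be small; a quick check with PyTorch's torch.finfo:

```python
import torch

# Compare machine epsilon (step size near 1.0) and representable maximum.
for dt in (torch.float16, torch.bfloat16):
    fi = torch.finfo(dt)
    print(dt, "eps:", fi.eps, "max:", fi.max)
# torch.float16   eps: ~9.77e-04   max: 65504.0
# torch.bfloat16  eps: ~7.81e-03   max: ~3.39e+38
```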
