Behnamm committed
Commit 0aced5f
1 Parent(s): 37e3c44

Update src/about.py

Files changed (1): src/about.py (+13 -3)
src/about.py CHANGED
````diff
@@ -37,7 +37,7 @@ TITLE = f"""
 INTRODUCTION_TEXT = """
 Persian LLM Leaderboard is designed to be a challenging benchmark and provide a reliable evaluation of LLMs in the Persian language.
 
-Note: This is a demo version of the leaderboard. We introduce two new benchmarks *PeKA* and *PersBETS* that challenge the native knowledge of the models along with
+Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PersBETS*, challenging the native knowledge of the models along with
 linguistic skills and their level of bias, ethics, and trustworthiness. **These datasets are not yet public, but they will be uploaded onto huggingface along with a detailed paper
 explaining the data and performance of relevant models.**
 
@@ -54,7 +54,13 @@ To reproduce our results, here are the commands you can run:
 """
 
 EVALUATION_QUEUE_TEXT = """
-## Some good practices before submitting a model
+
+Right now, the models added **are not automatically evaluated**. We may support automatic evaluation in the future on our own clusters.
+An evaluation framework will be available in the future to help reproduce the results.
+
+## Don't forget to read the FAQ and the About tabs for more information!
+
+## First steps before submitting a model
 
 ### 1) Make sure you can load your model and tokenizer using AutoClasses:
 ```python
@@ -79,7 +85,11 @@ When we add extra information about models to the leaderboard, it will be automa
 ## In case of model failure
 If your model is displayed in the `FAILED` category, its execution stopped.
 Make sure you have followed the above steps first.
-If everything is done, check you can launch the EleutherAIHarness on your model locally, using the above command without modifications (you can add `--limit` to limit the number of examples per task).
+
+### 5) Select the correct precision
+Not all models are converted properly from `float16` to `bfloat16`, and selecting the wrong precision can sometimes cause evaluation errors (as loading a `bf16` model in `fp16` can sometimes generate NaNs, depending on the weight range).
+
+
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
````
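The ```python block that step 1 opens is cut off at the hunk boundary above. For reference, a minimal sketch of the load check that step describes, using the standard transformers AutoClasses; the model id and revision below are placeholders, not values taken from this diff:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Placeholders: substitute your own model repo and the exact revision
# (branch or commit hash) you want evaluated.
model_id = "your-username/your-model"
revision = "main"

config = AutoConfig.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
```

If any of these three calls fails, an evaluation harness that loads models the same way will fail too.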
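The new step 5 can be sanity-checked locally before submitting. A sketch, assuming a causal LM on the Hub: it loads the checkpoint in the precision you intend to select and flags non-finite weights (the model id is again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the precision you plan to select at submission time.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-model",   # placeholder
    torch_dtype=torch.float16,    # or torch.bfloat16, to match the submission
)

# bf16 weights larger in magnitude than fp16's maximum (~65504) overflow
# to inf when cast down, and then propagate as NaN during evaluation.
for name, param in model.named_parameters():
    if not torch.isfinite(param).all():
        print(f"non-finite values in {name}")
```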
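The line removed from the last hunk recommended launching the EleutherAI harness locally with `--limit`. The exact command it refers to is not shown in this diff; as a hedged illustration, recent lm-evaluation-harness releases (v0.4+) expose the same functionality through the Python entry point `lm_eval.simple_evaluate`, where the task name and limit below are placeholders:

```python
import lm_eval

# Illustrative only: the leaderboard's actual task list is not in this diff.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/your-model,revision=main",
    tasks=["hellaswag"],  # placeholder task name
    limit=10,             # cap the number of examples per task for a quick check
)
print(results["results"])
```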