Behnamm committed
Commit 0aced5f
1 Parent(s): 37e3c44

Update src/about.py

Files changed (1): src/about.py (+13 -3)
src/about.py CHANGED
````diff
@@ -37,7 +37,7 @@ TITLE = f"""
 INTRODUCTION_TEXT = """
 Persian LLM Leaderboard is designed to be a challenging benchmark and provide a reliable evaluation of LLMs in the Persian language.
 
-Note: This is a demo version of the leaderboard. We introduce two new benchmarks *PeKA* and *PersBETS* that challenge the native knowledge of the models along with
+Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PersBETS*, challenging the native knowledge of the models along with
 linguistic skills and their level of bias, ethics, and trustworthiness. **These datasets are not yet public, but they will be uploaded onto huggingface along with a detailed paper
 explaining the data and performance of relevant models.**
 
@@ -54,7 +54,13 @@ To reproduce our results, here are the commands you can run:
 """
 
 EVALUATION_QUEUE_TEXT = """
-## Some good practices before submitting a model
+
+Right now, the models added **are not automatically evaluated**. We may support automatic evaluation in the future on our own clusters.
+An evaluation framework will be available in the future to help reproduce the results.
+
+## Don't forget to read the FAQ and the About tabs for more information!
+
+## First steps before submitting a model
 
 ### 1) Make sure you can load your model and tokenizer using AutoClasses:
 ```python
@@ -79,7 +85,11 @@ When we add extra information about models to the leaderboard, it will be automa
 ## In case of model failure
 If your model is displayed in the `FAILED` category, its execution stopped.
 Make sure you have followed the above steps first.
-If everything is done, check you can launch the EleutherAIHarness on your model locally, using the above command without modifications (you can add `--limit` to limit the number of examples per task).
+
+### 5) Select the correct precision
+Not all models are converted properly from `float16` to `bfloat16`, and selecting the wrong precision can sometimes cause evaluation errors (as loading a `bf16` model in `fp16` can sometimes generate NaNs, depending on the weight range).
+
+
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
````
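The ```python block that step 1 opens is cut off at the hunk boundary above. For reference, a minimal sketch of the load check that step describes, using the standard transformers AutoClasses; the model id and revision below are placeholders, not values taken from this diff:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Placeholders: substitute your own model repo and the exact revision
# (branch or commit hash) you want evaluated.
model_id = "your-username/your-model"
revision = "main"

config = AutoConfig.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
```

If any of these three calls fails, an evaluation harness that loads models the same way will fail too.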
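The new step 5 can be sanity-checked locally before submitting. A sketch, assuming a causal LM on the Hub: it loads the checkpoint in the precision you intend to select and flags non-finite weights (the model id is again a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the precision you plan to select at submission time.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-model",   # placeholder
    torch_dtype=torch.float16,    # or torch.bfloat16, to match the submission
)

# bf16 weights larger in magnitude than fp16's maximum (~65504) overflow
# to inf when cast down, and then propagate as NaN during evaluation.
for name, param in model.named_parameters():
    if not torch.isfinite(param).all():
        print(f"non-finite values in {name}")
```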
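The line removed from the last hunk recommended launching the EleutherAI harness locally with `--limit`. The exact command it refers to is not shown in this diff; as a hedged illustration, recent lm-evaluation-harness releases (v0.4+) expose the same functionality through the Python entry point `lm_eval.simple_evaluate`, where the task name and limit below are placeholders:

```python
import lm_eval

# Illustrative only: the leaderboard's actual task list is not in this diff.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-username/your-model,revision=main",
    tasks=["hellaswag"],  # placeholder task name
    limit=10,             # cap the number of examples per task for a quick check
)
print(results["results"])
```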