Behnamm committed on
Commit ae06c37
1 Parent(s): ba78a38

Update src/about.py

Files changed (1): src/about.py (+2, -2)
src/about.py CHANGED
@@ -37,7 +37,7 @@ TITLE = f"""
  INTRODUCTION_TEXT = """
  Persian LLM Leaderboard is designed to be a challenging benchmark and provide a reliable evaluation of LLMs in the Persian language.
 
- Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PersBETS*, challenging the native knowledge of the models along with
+ Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PK-BETS*, challenging the native knowledge of the models along with
  linguistic skills and their level of bias, ethics, and trustworthiness. **These datasets are not yet public, but they will be uploaded onto huggingface along with a detailed paper
  explaining the data and performance of relevant models.**
 
@@ -59,7 +59,7 @@ This benchmark can also be used by multilingual researchers to measure how well
  We use our own framework to evaluate the models on the following benchmarks (TO BE RELEASED SOON).
  ### Tasks
  - PeKA: Persian Knowledge Assessment (0-shot) - a set of multiple-choice questions that tests the level of native knowledge in the Persian language across more than 15 domains and categories: from art to history and geography, cinema, TV, sports, law, medicine, and much more.
- - PersBETS: Persian Bias Ethics Toxicity and Skills (0-shot) - a test of the model's capability in linguistic skills such as grammar and paraphrasing, as well as questions examining the bias, ethics, and toxicity of the model.
+ - PK-BETS: Persian Bias Ethics Toxicity and Skills (0-shot) - a test of the model's knowledge in Persian and its capability in linguistic skills such as grammar and paraphrasing, as well as questions examining the bias, ethics, and toxicity of the model.
  - <a href="https://arxiv.org/abs/2404.06644" target="_blank">Khayyam Challenge (Persian MMLU)</a> (0-shot) - comprising 20,805 four-choice questions (of which we use 20,776, removing questions longer than 200 words) sourced from 38 diverse tasks extracted from Persian examinations, spanning a wide spectrum of subjects, complexities, and ages.
  - <a href="https://arxiv.org/abs/2012.06154" target="_blank">ParsiNLU MCQA</a> (0-shot) - a series of multiple-choice questions in the domains of *literature*, *math & logic*, and *common knowledge*.
  - <a href="https://arxiv.org/abs/2012.06154" target="_blank">ParsiNLU NLI</a> (max[0,3,5,10]-shot) - a 3-way classification to determine whether a hypothesis sentence entails, contradicts, or is neutral with respect to a given premise sentence.
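For context on the k-shot, multiple-choice evaluation style the task list above refers to, here is a minimal illustrative sketch in Python. It is not the leaderboard's own framework (which the text says is yet to be released), and `MCQAItem`, `build_prompt`, and `accuracy` are hypothetical names, fields, and formats assumed purely for illustration.

```python
# Hypothetical sketch only: the leaderboard's evaluation framework is unreleased
# and PeKA / PK-BETS are not public, so every name and format here is assumed.
from dataclasses import dataclass
from collections.abc import Sequence


@dataclass
class MCQAItem:
    """One assumed multiple-choice item (e.g. a four-option PeKA-style question)."""
    question: str
    choices: list[str]   # answer options
    answer: int          # index of the correct choice


def build_prompt(item: MCQAItem, shots: Sequence[MCQAItem] = ()) -> str:
    """Format a k-shot prompt; k = len(shots), so an empty `shots` gives 0-shot."""
    letters = "ABCD"
    blocks = []
    for ex in [*shots, item]:
        lines = [f"Question: {ex.question}"]
        lines += [f"{letters[i]}) {c}" for i, c in enumerate(ex.choices)]
        # Reveal the answer only for in-context examples, not for the query item.
        lines.append(f"Answer: {letters[ex.answer]}" if ex is not item else "Answer:")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def accuracy(predicted: Sequence[int], items: Sequence[MCQAItem]) -> float:
    """Exact-match accuracy over predicted choice indices."""
    return sum(p == it.answer for p, it in zip(predicted, items)) / len(items)
```

Under this reading, the "max[0,3,5,10]-shot" setting listed for ParsiNLU NLI presumably amounts to running the same evaluation with k in {0, 3, 5, 10} in-context examples and reporting the best of the four scores.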