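# Static display content (logos, titles, "About" text, and citation snippet)
# for the BiGGen-Bench leaderboard Space UI.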
LOGO = '<img src="https://raw.githubusercontent.com/prometheus-eval/leaderboard/main/logo.png">'
TITLE = """<h1 align="center" id="space-title">🤗 BiGGen-Bench Leaderboard 🏋️</h1>"""
BGB_LOGO = '<img src="https://raw.githubusercontent.com/prometheus-eval/leaderboard/main/logo.png" alt="Logo" style="width: 30%; display: block; margin: auto;">'
BGB_TITLE = """<h1 align="center">BiGGen-Bench Leaderboard</h1>"""
ABOUT = """
## 📝 About
### BiGGen-Bench Leaderboard
Welcome to the 🌟 BiGGen-Bench Leaderboard 🚀, a benchmarking platform designed to evaluate the nuanced capabilities of Generative Language Models (GLMs) across a diverse set of complex tasks. Built on the methodology of [BiGGen-Bench](https://github.com/prometheus-eval/prometheus-eval), the leaderboard offers a comprehensive assessment framework that mirrors human-like discernment and precision in evaluating language models.
#### Evaluation Details
- **Evaluation Scope**: Covers nine key capabilities of GLMs across 77 tasks, with 765 unique instances tailored to test specific aspects of model performance.
- **Scoring System**: Each response is scored on a 1-to-5 rubric defined per instance, so the scoring criteria reflect the specific requirements of that task (an illustrative rubric is sketched after this list).
- **Hardware and Setup**: Benchmarks are conducted using a controlled setup to ensure consistent and fair comparison across different models.
- **Transparency and Openness**: All code, data, and detailed evaluation results are publicly available to foster transparency and enable community-driven verification and improvement.
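
For illustration, an instance-specific score rubric might look like the sketch below. The field names (`criteria`, `score1_description` through `score5_description`) are illustrative and may not match the exact schema used in the benchmark data.

```python
# Illustrative only: a sketch of what a per-instance 1-to-5 rubric could contain.
example_rubric = {
    "criteria": "Does the response follow the multi-step instructions in order?",
    "score1_description": "The response ignores the instructions entirely.",
    "score2_description": "The response addresses only one of the required steps.",
    "score3_description": "The response covers most steps but misses key details.",
    "score4_description": "The response covers all steps with minor omissions.",
    "score5_description": "The response completes every step accurately and in order.",
}
```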
#### Benchmarking Script
All benchmarks are executed with the evaluation [code](https://github.com/prometheus-eval/prometheus-eval/blob/main/BiGGen-Bench) provided in the prometheus-eval repository. Running every model through the same script keeps evaluation conditions identical, so results are reliable and reproducible.
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON = r"""TBA
"""