Some suggestions for evaluation priority voting mechanism

#801
by zhiminy - opened

Introducing a community-driven voting system to prioritize model evaluations is an innovative approach to managing resource constraints and budgets effectively :) Thanks for your efforts!

However, without a mechanism to periodically increase the priority of less popular models, there is a risk that some models might never be evaluated, especially considering the high volume of daily submissions.
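One way to avoid starvation is priority aging, where a model's effective priority grows with time spent in the queue. A minimal sketch (the class, the `age_rate_per_day` parameter, and the scoring rule are all illustrative assumptions, not the leaderboard's actual mechanism):

```python
import time

class AgingQueue:
    """Pick the next model by votes plus a bonus that grows with queue age."""

    def __init__(self, age_rate_per_day=1.0):
        # Hypothetical knob: how many "extra votes" a day of waiting is worth.
        self.age_rate = age_rate_per_day
        self.items = []  # list of (submit_time, model_name, votes)

    def add(self, model, votes, submitted=None):
        self.items.append((submitted if submitted is not None else time.time(),
                           model, votes))

    def pop_next(self, now=None):
        now = now if now is not None else time.time()

        def score(item):
            submitted, _, votes = item
            age_days = (now - submitted) / 86400
            return votes + self.age_rate * age_days

        best = max(self.items, key=score)
        self.items.remove(best)
        return best[1]
```

With this scheme, a zero-vote model submitted five days ago outranks a fresh model with three votes, so everything is eventually evaluated.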

Open LLM Leaderboard org

Hi!
Thanks for your interest in the leaderboard!

  1. That is precisely the point, though: we are compute-constrained and needed a fair way to evaluate the models most relevant to the community first, so less popular models may indeed be evaluated much later than others.
  2. The model dropdown can act as a search bar, so I'm not sure what else you would want; can you specify?


Hey @clefourrier
First of all, thank you for your hard work.

I understand the constraints and the need to prioritize models that are most relevant to the community. I appreciate your efforts to ensure fairness in the evaluation process.

I have a suggestion that might help streamline things: would it be possible to offer a paid option for model evaluations? This way, those who are less concerned with votes or popularity and more eager to get their models evaluated quickly could opt for this route.

It could also help support the resources needed for running the evaluations.
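The suggestion above amounts to a two-tier priority: community votes by default, with paid submissions jumping the queue. A minimal sketch of such a scoring rule (the function, the `paid_boost` value, and the model names are hypothetical, not anything the leaderboard implements):

```python
def eval_priority(votes, paid=False, paid_boost=100):
    # Hypothetical scoring: a paid flag adds a fixed boost so paid
    # submissions rank ahead of the vote-ordered queue. The boost size
    # is an illustrative assumption, not a real leaderboard parameter.
    return votes + (paid_boost if paid else 0)

# (model name, community votes, paid?)
queue = [("model-a", 12, False), ("model-b", 2, True), ("model-c", 40, False)]
ordered = sorted(queue, key=lambda m: eval_priority(m[1], paid=m[2]),
                 reverse=True)
```

Here `model-b` scores 102 and is evaluated first despite having the fewest votes; unpaid models still compete on votes among themselves.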


Prioritizing models that are either highly voted or paid for evaluation is quite a brilliant idea, tbh.

Open LLM Leaderboard org

At the moment, we have no easy way to do that, but we've been thinking about something using user tokens + inference endpoints.
However, to give you an order of magnitude: evaluating a 7B model currently takes 2 to 3 hours on 8 H100 80GB GPUs, and a 70B takes an order of magnitude more. It would be quite a budget ^^
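Those figures translate into rough GPU-hour costs as follows (the dollar rate is a placeholder assumption for illustration, not an actual quote):

```python
GPUS = 8
HOURS_7B = 2.5        # midpoint of the 2-3h figure quoted above
RATE = 4.0            # hypothetical $/H100-hour; placeholder, not a real price

gpu_hours_7b = GPUS * HOURS_7B      # GPU-hours per 7B evaluation
gpu_hours_70b = gpu_hours_7b * 10   # "an order of magnitude more" for a 70B
cost_7b = gpu_hours_7b * RATE
cost_70b = gpu_hours_70b * RATE
```

So a single 7B run is on the order of 20 GPU-hours, and a 70B run on the order of 200, which is why pricing a paid lane is not trivial.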
