EvalPlus

university

https://evalplus.github.io/

evalplus

AI & ML interests

Evaluation of Language Models on Code.

Organization Card


EvalPlus: Rigorous Evaluation of LLMs for Code Generation

💻 GitHub Repo: evalplus/evalplus
🏆 Leaderboard: evalplus.github.io
📜 NeurIPS Paper: OpenReview
🐍 Python Package: PyPI

@inproceedings{evalplus,
  title = {Is Your Code Generated by Chat{GPT} Really Correct? Rigorous Evaluation of Large Language Models for Code Generation},
  author = {Liu, Jiawei and Xia, Chunqiu Steven and Wang, Yuyao and Zhang, Lingming},
  booktitle = {Thirty-seventh Conference on Neural Information Processing Systems},
  year = {2023},
  url = {https://openreview.net/forum?id=1qvx610Cu7},
}

models

None public yet

datasets 2

evalplus/humanevalplus

Updated May 1 • 164 rows • 39.1k downloads • 5 likes

evalplus/mbppplus

Updated Apr 17 • 378 rows • 48k downloads • 6 likes