CodeEditorBench: Evaluating Code Editing Capability of Large Language Models Paper • 2404.03543 • Published Apr 4
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published about 1 month ago
SciCode: A Research Coding Benchmark Curated by Scientists Paper • 2407.13168 • Published 5 days ago