Updated on 2025/07/26: Since C-Eval was released in 2023, we have kept the test set private to avoid leakage, so users had to upload their predictions to obtain test scores. We have now decided to stop maintaining this leaderboard and to release the test set publicly, so that users can evaluate on the C-Eval test set directly. You can download the C-Eval test set from Hugging Face. We will no longer update the leaderboard below, and we will remove the leaderboard section from the website at some point in the future. We appreciate you using C-Eval in your evaluation :)
Results for individual subjects and the average test results are shown below. The results come from either zero-shot or few-shot prompting (model details, including the prompting format, can be viewed by clicking on each model).
(Note: * indicates that the model was evaluated by the C-Eval team; all other results were obtained from predictions submitted by users.)
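With the test set now public, per-subject and average scores of the kind listed below can be computed locally. The following is a minimal sketch, not the C-Eval team's scoring script: the function name `score`, the `(subject, question_id)` key layout, and the choice of an unweighted mean over subjects for the average are all assumptions made for illustration.

```python
from collections import defaultdict


def score(predictions, answers):
    """Compute per-subject accuracy and an unweighted average over subjects.

    Both arguments are dicts mapping a hypothetical (subject, question_id)
    key to an answer letter such as "A", "B", "C" or "D".
    A question with no prediction counts as incorrect.
    """
    if not answers:
        raise ValueError("answers must not be empty")

    correct = defaultdict(int)
    total = defaultdict(int)
    for key, gold in answers.items():
        subject = key[0]
        total[subject] += 1
        if predictions.get(key) == gold:
            correct[subject] += 1

    per_subject = {s: correct[s] / total[s] for s in total}
    # Assumption: every subject carries equal weight in the average.
    average = sum(per_subject.values()) / len(per_subject)
    return per_subject, average


if __name__ == "__main__":
    # Toy data with made-up subject names, only to show the call shape.
    gold = {("subject_a", 0): "A", ("subject_a", 1): "B", ("subject_b", 0): "C"}
    pred = {("subject_a", 0): "A", ("subject_a", 1): "C", ("subject_b", 0): "C"}
    print(score(pred, gold))
```

If a question-weighted average is wanted instead, divide the total number of correct answers by the total number of questions rather than averaging the per-subject accuracies.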
# | Model | Creator | Access | Submission Date | Avg | Avg(Hard) | STEM | Social Science | Humanities | Others |