Winograd Schema Challe […]
SuperGLUE基准(SuperGLUE […]
GLUE基准(General Languag […]
HumanEval是由OpenAI在2021 […]
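HumanEval results are usually reported as pass@k: the probability that at least one of k generated programs passes the problem's unit tests. The paper that introduced HumanEval (Chen et al., 2021) samples n ≥ k programs per problem, counts the c that pass, and uses the unbiased estimator pass@k = 1 − C(n−c, k) / C(n, k), averaged over problems. The sketch below computes that estimator. The function names `pass_at_k` and `mean_pass_at_k` are illustrative, and running the generated code against the tests is assumed to have happened already.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of sampled programs, c: how many passed the unit tests,
    k: the k in pass@k (must satisfy 1 <= k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures); exact integer binomials, then one division.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Estimating pass@k from n > k samples, rather than drawing exactly k, lowers the variance of the estimate. Python's arbitrary-precision integers keep the binomial coefficients exact before the final division.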
GSM8K(Grade School Mat […]
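GSM8K is typically scored by exact-match accuracy on the final numeric answer. Each reference solution ends with a line of the form `#### <answer>`. The sketch below shows one common scoring recipe. It assumes that the last number in the model's output is its final answer. This is a heuristic chosen for illustration, not an official GSM8K rule. The helper names are hypothetical.

```python
import re
from typing import List, Optional

# Either a comma-grouped number (1,234.5) or a plain one (1234.5), optionally negative.
_NUMBER = re.compile(r"-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?")


def reference_answer(solution: str) -> float:
    # GSM8K reference solutions put the final answer after a "####" marker.
    return float(solution.split("####")[-1].strip().replace(",", ""))


def extract_prediction(output: str) -> Optional[float]:
    # Assumption (heuristic): the last number in the model output is its answer.
    matches = _NUMBER.findall(output)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(outputs: List[str], solutions: List[str]) -> float:
    if not solutions or len(outputs) != len(solutions):
        raise ValueError("need one model output per reference solution")
    correct = 0
    for out, sol in zip(outputs, solutions):
        pred = extract_prediction(out)
        # Compare numerically so that "18" and "18.0" count as the same answer.
        if pred is not None and abs(pred - reference_answer(sol)) < 1e-6:
            correct += 1
    return correct / len(solutions)
```

Prompting the model to end with a fixed marker (for example, mirroring the `####` format) makes extraction less fragile than the last-number heuristic.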
MMLU(Massive Multitask […]
模型能力评估基准是指用于系统衡量人工智能模型 […]
人类评估(Human Evaluation) […]
CIDEr分数(Consensus-base […]
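CIDEr represents each caption as TF-IDF-weighted n-gram vectors. The IDF for an n-gram is log(|I| / number of images whose reference captions contain it). For each n, CIDEr averages the cosine similarity between the candidate vector and each reference vector, then combines n = 1..4 with uniform weights (Vedantam et al., 2015). The sketch below implements this plain CIDEr formulation under two assumptions: whitespace tokenization, and document frequency floored at 1 for n-grams that never appear in references. It omits the length penalty, count clipping and rescaling of the CIDEr-D variant used by the COCO caption evaluation code, so its numbers are not directly comparable to published CIDEr-D scores.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts: Counter, doc_freq: Counter, num_images: int) -> Dict[NGram, float]:
    total = sum(counts.values())
    # tf = count / total n-grams; idf = log(|I| / df), with df floored at 1.
    return {g: (c / total) * math.log(num_images / max(1, doc_freq[g]))
            for g, c in counts.items()}


def cosine(a: Dict[NGram, float], b: Dict[NGram, float]) -> float:
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def cider(candidates: List[str], references: List[List[str]], max_n: int = 4) -> List[float]:
    """Per-image CIDEr; candidates[i] is scored against references[i]."""
    num_images = len(references)
    cands = [c.split() for c in candidates]
    refs = [[r.split() for r in rs] for rs in references]
    scores = [0.0] * num_images
    for n in range(1, max_n + 1):
        # Document frequency: in how many images' reference sets each n-gram occurs.
        doc_freq: Counter = Counter()
        for rs in refs:
            seen = set()
            for r in rs:
                seen.update(ngram_counts(r, n))
            doc_freq.update(seen)
        for i in range(num_images):
            cand_vec = tfidf(ngram_counts(cands[i], n), doc_freq, num_images)
            sims = [cosine(cand_vec, tfidf(ngram_counts(r, n), doc_freq, num_images))
                    for r in refs[i]]
            # Uniform weight 1/max_n over n-gram orders.
            scores[i] += (sum(sims) / len(sims)) / max_n
    return scores
```

Because the IDF is computed over the whole reference corpus, n-grams that appear in every image's references (such as "a" or "the" in small examples) receive zero weight. With a single image, every reference n-gram gets zero weight, so CIDEr only makes sense at corpus level.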
METEOR分数(Metric for Ev […]
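METEOR aligns candidate and reference unigrams and combines precision P and recall R into a recall-weighted harmonic mean, F_mean = 10PR / (R + 9P). It then applies a fragmentation penalty, 0.5 × (chunks / matches)^3, where a chunk is a maximal run of matches that are contiguous and in the same order in both sentences. The final score is F_mean × (1 − penalty), using the constants of the original formulation (Banerjee and Lavie, 2005). The sketch below is a simplified, exact-match-only version. It omits stemming and synonym matching, and it aligns greedily left to right rather than searching for the alignment with the fewest crossings. Its scores will therefore differ from full METEOR implementations, and the function name is hypothetical.

```python
from typing import List, Tuple


def meteor_exact(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    used = [False] * len(ref)
    # Greedy exact-match alignment: each candidate word takes the first unused
    # identical reference word (a simplification of METEOR's alignment search).
    alignment: List[Tuple[int, int]] = []
    for i, word in enumerate(cand):
        for j, ref_word in enumerate(ref):
            if not used[j] and ref_word == word:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # Count chunks: a new chunk starts whenever the next match is not adjacent
    # in both the candidate and the reference.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

Even a perfect match scores slightly below 1, because a single chunk still incurs a small penalty (0.5 × (1/6)^3 for a six-word sentence). Scrambling word order raises the chunk count and lowers the score.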