HumanEval, by OpenAI in 2021 […]
GSM8K (Grade School Mat […]
MMLU (Massive Multitask […]
Model capability evaluation benchmarks are used to systematically measure artificial intelligence models […]
Human evaluation […]
CIDEr score (Consensus-base […]
METEOR score (Metric for Ev […]
ROUGE score (Recall-Oriente […]
BLEU score (Bilingual Evalu […]
Perplexity, in natural language processing, […]
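As a rough illustration of the perplexity entry, the sketch below uses the standard definition: the exponential of the mean negative log-probability a model assigns to each token. The function name `perplexity` and the sample probabilities are hypothetical, chosen only for this example; they do not come from the truncated entry above.

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the given tokens.

    `token_probs` is a hypothetical list of the probabilities a model
    assigned to each observed token; every value must be in (0, 1].
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token.
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Illustrative values only: a model that is equally unsure among four
# choices at every step.
uniform_probs = [0.25, 0.25, 0.25, 0.25]
print(perplexity(uniform_probs))
```

Under this definition, a model that assigns probability 1/k to every token has a perplexity of about k, which is why the value is often read as an effective branching factor.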