Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models
- URL: http://arxiv.org/abs/2402.01349v2
- Date: Thu, 30 May 2024 01:57:14 GMT
- Title: Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models
- Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Nuwa Xi, Bing Qin, Ting Liu
- Abstract summary: This study investigates the limitations of Multiple Choice Question Answering (MCQA) as an evaluation method for Large Language Models (LLMs).
We propose MCQA+, a dataset-augmenting method for Multiple-Choice Questions (MCQs) that more accurately reflects model performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of natural language processing (NLP), Large Language Models (LLMs) have precipitated a paradigm shift, markedly enhancing performance on natural language generation tasks. Despite these advances, the comprehensive evaluation of LLMs remains an open challenge for the community. Recently, Multiple Choice Question Answering (MCQA) has gained considerable traction as a benchmark for LLMs. This study first investigates the limitations of MCQA as an evaluation method for LLMs and then analyzes their fundamental cause: while an LLM may select the correct answer, it may simultaneously recognize wrong options as correct. Finally, we propose MCQA+, a dataset-augmenting method for Multiple-Choice Questions (MCQs) that more accurately reflects model performance, underscoring the need for more robust evaluation mechanisms in assessing LLMs.
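The abstract's key diagnosis is that a model can place the highest probability on the correct option while still assigning comparably high probability to wrong ones, so argmax accuracy overstates understanding. Below is a minimal sketch of how one might probe this, assuming a HuggingFace-style causal LM; the model name, prompt format, and scoring scheme are illustrative assumptions, not the paper's protocol.

```python
# Hedged sketch: score each MCQ option independently with a causal LM to see
# whether several options receive similarly high likelihood. Assumes the
# HuggingFace transformers API; "gpt2" is an illustrative placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def option_logprob(question: str, option: str) -> float:
    """Mean log-probability the model assigns to `option` given `question`."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]
    # Next-token log-probs for every position except the last.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lps = log_probs[torch.arange(targets.shape[0]), targets]
    # Keep only the option's tokens. Caveat: tokenizers can merge tokens across
    # the prompt/option boundary, so this length estimate is approximate.
    option_len = full_ids.shape[1] - prompt_ids.shape[1]
    return token_lps[-option_len:].mean().item()

question = "Q: What is the capital of France? A:"
options = ["Paris", "London", "Berlin", "Madrid"]
scores = {o: option_logprob(question, o) for o in options}
print(scores)
# If several options score close to the argmax, plain MCQA accuracy is
# misleading in exactly the sense the abstract describes.
```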
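The abstract does not spell out how MCQA+ augments a dataset. As one hedged illustration of option-level augmentation in that spirit (not the paper's exact construction), the sketch below expands a single question into every permutation of its options, so a model that merely prefers a position or letter label cannot score well by chance.

```python
# Hypothetical augmentation sketch: evaluate each MCQ under all option
# orderings. The exact MCQA+ construction is described in the paper itself.
from itertools import permutations

LABELS = ["A", "B", "C", "D"]

def augment_mcq(question: str, options: list[str], answer_idx: int):
    """Yield (prompt, correct_label) pairs, one per option ordering."""
    for perm in permutations(range(len(options))):
        lines = [question]
        for label, idx in zip(LABELS, perm):
            lines.append(f"{label}. {options[idx]}")
        # The correct letter moves with the permuted position of the answer.
        correct_label = LABELS[perm.index(answer_idx)]
        yield "\n".join(lines), correct_label

q = "What is the capital of France?"
opts = ["Paris", "London", "Berlin", "Madrid"]
variants = list(augment_mcq(q, opts, answer_idx=0))
print(len(variants))  # 24 variants for a 4-option question
```

Aggregating accuracy over all variants (e.g. counting a question correct only if every permutation is answered correctly) gives a stricter signal than single-ordering accuracy.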