Leveraging Large Language Models for Multiple Choice Question Answering
- URL: http://arxiv.org/abs/2210.12353v1
- Date: Sat, 22 Oct 2022 05:04:54 GMT
- Title: Leveraging Large Language Models for Multiple Choice Question Answering
- Authors: Joshua Robinson, Christopher Michael Rytting, David Wingate
- Abstract summary: We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach.
- Score: 6.198523595657983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) like GPT-3 have achieved impressive
results on multiple choice question answering (MCQA) tasks in the zero, one,
and few-shot settings, they generally lag behind the MCQA state of the art
(SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks.
An LLM is conditioned on a question (without the associated answer options) and
its chosen option is the one assigned the highest probability after
normalization (for length, etc.). A more natural prompting approach is to
present the question and answer options to the LLM jointly and have it output
the symbol (e.g., "A") associated with its chosen answer option. This approach
allows the model to explicitly compare answer options, reduces computational
costs, and mitigates the effects of tokenization scheme and answer option
representations on answer selection. For the natural approach to be effective
the LLM it is used with must be able to associate answer options with the
symbols that represent them. The LLM needs what we term multiple choice symbol
binding (MCSB) ability. This ability varies greatly by model. We show that a
model with high MCSB ability performs much better with the natural approach
than with the traditional approach across 20 diverse datasets and largely
closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been
previously underestimated.
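To make the contrast between the two prompting strategies concrete, here is a minimal sketch (not the authors' code) that answers the same question both ways, using a small open model from Hugging Face as a stand-in for the GPT-3-class LLMs studied in the paper; the question, options, prompt wording, and model choice are all illustrative assumptions.

```python
# A minimal sketch, assuming a Hugging Face causal LM as a stand-in for GPT-3.
# Not the authors' implementation; prompts and data are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

question = "What is the capital of France?"
options = ["Berlin", "Paris", "Madrid", "Rome"]


def cloze_score(question: str, option: str) -> float:
    """Traditional 'cloze' scoring: length-normalized log-probability of the
    option text conditioned on the question alone (one pass per option)."""
    prompt_ids = tok(f"Question: {question}\nAnswer:", return_tensors="pt").input_ids
    option_ids = tok(" " + option, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Positions prompt_len-1 ... end-1 predict the option's tokens.
    logprobs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1 : -1], dim=-1)
    token_lp = logprobs.gather(1, option_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()


def mcp_choice(question: str, options: list[str]) -> str:
    """'Natural' multiple choice prompting: show all options with symbols and
    compare the model's scores for emitting each symbol (one pass total)."""
    symbols = ["A", "B", "C", "D"][: len(options)]
    listing = "\n".join(f"{s}. {o}" for s, o in zip(symbols, options))
    prompt = f"Question: {question}\n{listing}\nAnswer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        last_logits = model(ids).logits[0, -1]
    scores = [last_logits[tok.encode(" " + s)[0]].item() for s in symbols]
    return options[scores.index(max(scores))]


print("cloze choice:", max(options, key=lambda o: cloze_score(question, o)))
print("MCP choice  :", mcp_choice(question, options))
```

The cloze scorer needs one forward pass per option and depends on how each option tokenizes and is normalized, while the multiple choice prompt needs a single pass and only compares the symbol tokens, which is why symbol binding (MCSB) ability becomes the deciding factor.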
Related papers
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, the other answer choices can provide valuable clues for choosing the right one.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
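As a speculative sketch (an interpretation, not the DCQA architecture): one way to "eliminate commonality" is to subtract the mean choice embedding, i.e., what all options share, before comparing each choice to the question. The encoder and example below are arbitrary stand-ins.

```python
# A speculative sketch, not the DCQA model: remove what the choices have in
# common (their mean embedding) before comparing each choice to the question.
# sentence-transformers is used only as a convenient off-the-shelf encoder.
import numpy as np
from sentence_transformers import SentenceTransformer

enc = SentenceTransformer("all-MiniLM-L6-v2")

question = "Which of these animals is a mammal?"
choices = ["dolphin", "shark", "salmon", "tuna"]

q = enc.encode([question])[0]            # question embedding
C = enc.encode(choices)                  # [num_choices, dim] choice embeddings
C_distinct = C - C.mean(axis=0)          # subtract the choices' shared component

scores = C_distinct @ q                  # score each choice's distinctive part
print(choices[int(np.argmax(scores))])
```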
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions [103.20281438405111]
Multiple-choice question answering (MCQA) is a key competence of performant transformer language models.
We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information.
We show that prediction of a specific answer symbol is causally attributed to a single middle layer, and specifically its multi-head self-attention mechanism.
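A hedged sketch of a vocabulary-projection ("logit lens") style analysis like the one described above, with GPT-2 used only as a convenient stand-in model: each layer's hidden state at the final prompt position is projected through the unembedding matrix to see at which depth an answer symbol becomes the top prediction.

```python
# A hedged sketch of vocabulary projection ("logit lens"); GPT-2 is only a
# stand-in model and the prompt is a toy example, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Question: 2 + 2 = ?\nA. 3\nB. 4\nC. 5\nAnswer:"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

ln_f, unembed = model.transformer.ln_f, model.lm_head
for layer, h in enumerate(out.hidden_states):
    # Project this layer's hidden state at the last position into vocab space.
    layer_logits = unembed(ln_f(h[0, -1]))
    top_token = tok.decode([int(layer_logits.argmax())])
    print(f"layer {layer:2d}: top predicted token = {top_token!r}")
```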
arXiv Detail & Related papers (2024-07-21T00:10:23Z)
- Is Your Large Language Model Knowledgeable or a Choices-Only Cheater? [16.384333600053342]
Recent work shows that large language models (LLMs) can answer multiple-choice questions using only the choices.
We use a contrast set that probes if LLMs over-rely on choices-only shortcuts in MCQA.
After validating our contrast set, we test 12 LLMs, finding that these models do not exhibit reliance on choices-only shortcuts when given both the question and choices.
arXiv Detail & Related papers (2024-07-02T07:06:53Z)
- Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
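The aggregation step can be illustrated with a small sketch (assumed, not taken from the paper): each explorer agent returns an answer, and a single final answer is produced by majority vote or by confidence-weighted voting.

```python
# A minimal sketch (assumed, not the paper's code) of aggregating several
# agents' independent answers into one final answer per query.
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the answer given by the most agents (ties broken arbitrarily)."""
    return Counter(answers).most_common(1)[0][0]


def confidence_weighted_vote(answers: list[str], confidences: list[float]) -> str:
    """Weight each agent's vote by its self-reported confidence."""
    totals: dict[str, float] = {}
    for answer, confidence in zip(answers, confidences):
        totals[answer] = totals.get(answer, 0.0) + confidence
    return max(totals, key=totals.get)


# Three hypothetical explorer agents answering "Where are the house keys?"
print(majority_vote(["kitchen", "kitchen", "hallway"]))                              # kitchen
print(confidence_weighted_vote(["kitchen", "hallway", "hallway"], [0.9, 0.4, 0.3]))  # kitchen
```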
arXiv Detail & Related papers (2024-06-16T12:46:40Z)
- Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena [23.264049073539663]
Multiple-choice questions (MCQs) are frequently used to assess large language models (LLMs).
LLMs may favor certain answer choice IDs, such as A/B/C/D, due to inherent biases arising from unbalanced prior probabilities.
This work aims to tackle these significant difficulties, and establish a new LLM evaluation benchmark through entirely open-style questions.
arXiv Detail & Related papers (2024-06-11T17:59:47Z)
- Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? [15.308093827770474]
We probe if large language models (LLMs) can perform multiple-choice question answering (MCQA) with choices-only prompts.
This prompt bests a majority baseline in 11/12 cases, with up to 0.33 accuracy gain.
We conduct an in-depth, black-box analysis on memorization, choice dynamics, and question inference.
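For illustration, a hedged sketch of what a choices-only probe and the majority baseline it is compared against might look like; the prompt format and toy data are invented for the example.

```python
# A hedged sketch: build a "choices-only" prompt (no question shown) and the
# majority baseline it is compared against. Prompt format and data are invented.
from collections import Counter


def choices_only_prompt(options: list[str]) -> str:
    """Format only the answer options; the question text is deliberately withheld."""
    lines = "\n".join(f"{s}. {o}" for s, o in zip("ABCD", options))
    return f"{lines}\nAnswer:"


def majority_baseline_accuracy(gold_labels: list[str]) -> float:
    """Accuracy of always predicting the dataset's most frequent gold label."""
    most_common_count = Counter(gold_labels).most_common(1)[0][1]
    return most_common_count / len(gold_labels)


print(choices_only_prompt(["a mammal", "a reptile", "a bird", "a fish"]))
print(majority_baseline_accuracy(["A", "B", "B", "C", "B"]))  # 0.6
```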
arXiv Detail & Related papers (2024-02-19T19:38:58Z)
- Beyond the Answers: Reviewing the Rationality of Multiple Choice Question Answering for the Evaluation of Large Language Models [29.202758753639078]
This study investigates the limitations of Multiple Choice Question Answering (MCQA) as an evaluation method for Large Language Models (LLMs).
We propose a dataset augmentation method for Multiple-Choice Questions (MCQs), MCQA+, that can more accurately reflect model performance.
arXiv Detail & Related papers (2024-02-02T12:07:00Z)
- Enhancing Answer Selection in Community Question Answering with Pre-trained and Large Language Models [0.9065034043031668]
We first propose the Question-Answer cross attention networks (QAN) with pre-trained models for answer selection.
We then utilize a large language model (LLM) to perform answer selection with knowledge augmentation.
Experiments show that the QAN model achieves state-of-the-art performance on two datasets, SemEval2015 and SemEval2017.
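One plausible reading of question-answer cross attention for answer selection is sketched below; the encoder choice, pooling, head count, and scoring head are assumptions for illustration, not the authors' specification.

```python
# A rough sketch of question-answer cross attention for answer selection.
# The encoder choice, pooling, and scoring head are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class QACrossAttentionScorer(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-uncased", dim: int = 768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.scorer = nn.Linear(2 * dim, 1)

    def forward(self, q_inputs, a_inputs):
        q = self.encoder(**q_inputs).last_hidden_state   # [B, Lq, dim]
        a = self.encoder(**a_inputs).last_hidden_state   # [B, La, dim]
        q_over_a, _ = self.cross_attn(q, a, a)            # question attends to answer
        a_over_q, _ = self.cross_attn(a, q, q)            # answer attends to question
        pooled = torch.cat([q_over_a.mean(dim=1), a_over_q.mean(dim=1)], dim=-1)
        return self.scorer(pooled).squeeze(-1)            # relevance score per QA pair


tok = AutoTokenizer.from_pretrained("bert-base-uncased")
scorer = QACrossAttentionScorer()
q = tok("How do I reset my password?", return_tensors="pt")
a = tok("Click 'Forgot password' on the login page.", return_tensors="pt")
print(scorer(q, a))  # unnormalized score (weights untrained in this sketch)
```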
arXiv Detail & Related papers (2023-11-29T10:24:50Z)
- Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
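The gist of such debiasing can be sketched as follows (a rough illustration with made-up numbers, not the paper's exact PriDe procedure): estimate the model's prior preference for each option ID from its predictions under permutations of the option contents, then divide that prior out of the observed distribution.

```python
# A rough, made-up illustration of prior-debiasing for option IDs; this is the
# gist, not the paper's exact PriDe procedure, and all numbers are invented.
import numpy as np


def debias(observed: np.ndarray, permuted: np.ndarray) -> np.ndarray:
    """observed: P(option ID) for the original ordering, shape [K].
    permuted: P(option ID) under several permutations of option contents, [P, K]."""
    prior = permuted.mean(axis=0)            # content effects average out; ID bias remains
    debiased = observed / (prior + 1e-9)     # divide the ID prior out
    return debiased / debiased.sum()         # renormalize to a distribution


observed = np.array([0.42, 0.30, 0.18, 0.10])   # raw preference over A/B/C/D (toy)
permuted = np.array([                           # same question, option contents rotated
    [0.40, 0.25, 0.20, 0.15],
    [0.38, 0.27, 0.21, 0.14],
    [0.41, 0.26, 0.19, 0.14],
])
print(debias(observed, permuted))               # scores with the ID bias reduced
```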
arXiv Detail & Related papers (2023-09-07T17:44:56Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Question Answering as Programming for Solving Time-Sensitive Questions [84.07553016489769]
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world.
Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering, yet they still struggle with time-sensitive questions.
This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics.
We propose a novel approach where we reframe the Question Answering task as Programming.
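A toy, hedged example of the reframing for a time-sensitive question: extracted facts and the query become structured data, and the answer is selected by explicit code rather than by surface-level text matching (the entities and dates are invented).

```python
# A toy, hedged example of answering a time-sensitive question with explicit
# code over structured facts; entities and dates here are invented.
from datetime import date

# Facts an LLM might extract from retrieved text: (person, term start, term end).
tenures = [
    ("Alice", date(2005, 1, 1), date(2012, 6, 30)),
    ("Bob",   date(2012, 7, 1), date(2019, 3, 31)),
    ("Carol", date(2019, 4, 1), date(2024, 1, 1)),
]


def who_held_office_on(query_date: date) -> str:
    """Answer 'who held the office on <date>?' by checking date ranges in code."""
    for name, start, end in tenures:
        if start <= query_date <= end:
            return name
    return "unknown"


print(who_held_office_on(date(2015, 5, 20)))  # Bob
```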
arXiv Detail & Related papers (2023-05-23T16:35:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.