Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions
- URL: http://arxiv.org/abs/2407.15018v1
- Date: Sun, 21 Jul 2024 00:10:23 GMT
- Title: Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions
- Authors: Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal
- Abstract summary: Multiple-choice question answering (MCQA) is a key competence of performant transformer language models.
We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information.
We show that prediction of a specific answer symbol is causally attributed to a single middle layer, and specifically its multi-head self-attention mechanism.
- Score: 103.20281438405111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have quite a range of performance, particularly when the task format is diversified slightly (such as by shuffling answer choice order). In this work we ask: how do successful models perform formatted MCQA? We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information for predicting the correct answer. We find that prediction of a specific answer symbol is causally attributed to a single middle layer, and specifically its multi-head self-attention mechanism. We show that subsequent layers increase the probability of the predicted answer symbol in vocabulary space, and that this probability increase is associated with a sparse set of attention heads with unique roles. We additionally uncover differences in how different models adjust to alternative symbols. Finally, we demonstrate that a synthetic task can disentangle sources of model error to pinpoint when a model has learned formatted MCQA, and show that an inability to separate answer symbol tokens in vocabulary space is a property of models unable to perform formatted MCQA tasks.
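The two localization techniques named in the abstract, vocabulary projection (the "logit lens") and activation patching, are standard interpretability tools, so a compact sketch may help make them concrete. The snippet below is a minimal illustration, not the authors' released code: the checkpoint (gpt2), the layer index, the answer symbol, and the helper names are all illustrative assumptions, and the paper's models and prompts differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in; the paper studies other checkpoints
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def symbol_prob_at_layer(prompt: str, symbol: str = " A", layer: int = 6) -> float:
    """Vocabulary projection ("logit lens"): decode the residual stream at an
    intermediate layer through the final layer norm and the unembedding matrix,
    and read off the probability of one answer-symbol token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states
    h = model.transformer.ln_f(hidden[layer][0, -1])  # last-position hidden state
    probs = model.lm_head(h).softmax(-1)
    return probs[tok.encode(symbol)[0]].item()

def patch_attention_output(clean: str, corrupt: str, layer: int = 6):
    """Activation patching: run the corrupted prompt, but splice in the clean
    run's attention-block output at one layer. Both prompts must tokenize to
    the same length (e.g., the same question with answer choices shuffled)."""
    attn = model.transformer.h[layer].attn
    cache = {}

    def save(module, inputs, output):
        cache["clean"] = output[0].detach()  # attention-block output tensor

    def patch(module, inputs, output):
        return (cache["clean"],) + output[1:]  # overwrite with the clean run

    handle = attn.register_forward_hook(save)
    with torch.no_grad():
        model(tok(clean, return_tensors="pt").input_ids)
    handle.remove()

    handle = attn.register_forward_hook(patch)
    with torch.no_grad():
        logits = model(tok(corrupt, return_tensors="pt").input_ids).logits
    handle.remove()
    return logits[0, -1].softmax(-1)  # next-token distribution after patching
```

In this framing, "causally attributed to a single middle layer" means that patching that one layer's attention output largely restores the clean prediction on the corrupted prompt; sweeping `layer` across the network, and tracking the symbol probability with `symbol_prob_at_layer`, localizes where the answer-symbol information enters and grows in the residual stream.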
Related papers
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, comparing the answer choices against one another can provide valuable clues for choosing the right answer.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering [32.436530949623155]
We propose a unique scaling strategy to capture latent semantic center features of queries.
These features are seamlessly integrated into traditional query and passage embeddings.
Our approach reduces sensitivity to variations in text format and improves the model's ability to pinpoint accurate answers.
arXiv Detail & Related papers (2024-04-30T07:34:42Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought (CoT) improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, like humans, benefit from explanations: they learn from less data, reaching the same performance with just 40% of the training data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering [20.35687327831644]
We study the robustness of Visual Question Answering (VQA) models from a novel perspective: visual context.
SwapMix perturbs the visual context by swapping the features of irrelevant context objects with features from other objects in the dataset (a minimal sketch of this perturbation appears after this list).
We also train the models with perfect sight and find that over-reliance on context depends strongly on the quality of the visual representations.
arXiv Detail & Related papers (2022-04-05T15:32:25Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network-based question answering (QA) models are neither robust nor explainable in many cases; they can answer correctly by exploiting shortcut correlations rather than genuine comprehension.
In this paper, we inspect this spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then use crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
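For the SwapMix entry above, a minimal sketch of the described perturbation may help: object features judged irrelevant to the question are swapped with features drawn from other images. Everything here (tensor shapes, the `relevant_mask`, the feature pool) is an illustrative assumption, not the authors' implementation.

```python
import torch

def swapmix_perturb(obj_feats, relevant_mask, feature_pool):
    """obj_feats: (num_objects, dim) visual features for one image.
    relevant_mask: (num_objects,) bool, True for question-relevant objects.
    feature_pool: (pool_size, dim) features sampled from other images."""
    perturbed = obj_feats.clone()
    irrelevant = (~relevant_mask).nonzero(as_tuple=True)[0]
    # Replace each irrelevant object's feature with a random pool feature.
    idx = torch.randint(len(feature_pool), (len(irrelevant),))
    perturbed[irrelevant] = feature_pool[idx]
    return perturbed

# Toy usage with random stand-ins for real region features.
feats = torch.randn(36, 2048)           # e.g., 36 region features per image
mask = torch.zeros(36, dtype=torch.bool)
mask[:5] = True                         # pretend the first 5 objects are relevant
pool = torch.randn(1000, 2048)          # features pooled from other images
perturbed = swapmix_perturb(feats, mask, pool)
```

A model that over-relies on visual context will change its answer under this swap even though the question-relevant objects are untouched, which is exactly the diagnostic the entry describes.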