Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions
- URL: http://arxiv.org/abs/2407.15018v1
- Date: Sun, 21 Jul 2024 00:10:23 GMT
- Title: Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions
- Authors: Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal
- Abstract summary: Multiple-choice question answering (MCQA) is a key competence of performant transformer language models.
We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information.
We show that prediction of a specific answer symbol is causally attributed to a single middle layer, and specifically its multi-head self-attention mechanism.
- Score: 103.20281438405111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have quite a range of performance, particularly when the task format is diversified slightly (such as by shuffling answer choice order). In this work we ask: how do successful models perform formatted MCQA? We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information for predicting the correct answer. We find that prediction of a specific answer symbol is causally attributed to a single middle layer, and specifically its multi-head self-attention mechanism. We show that subsequent layers increase the probability of the predicted answer symbol in vocabulary space, and that this probability increase is associated with a sparse set of attention heads with unique roles. We additionally uncover differences in how different models adjust to alternative symbols. Finally, we demonstrate that a synthetic task can disentangle sources of model error to pinpoint when a model has learned formatted MCQA, and show that an inability to separate answer symbol tokens in vocabulary space is a property of models unable to perform formatted MCQA tasks.
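The two localization techniques named in the abstract, vocabulary projection (the "logit lens") and activation patching, are standard interpretability tools, so a compact sketch may help make them concrete. The snippet below is a minimal illustration, not the authors' released code: the checkpoint (gpt2), the layer index, the answer symbol, and the helper names are all illustrative assumptions, and the paper's models and prompts differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in; the paper studies other checkpoints
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def symbol_prob_at_layer(prompt: str, symbol: str = " A", layer: int = 6) -> float:
    """Vocabulary projection ("logit lens"): decode the residual stream at an
    intermediate layer through the final layer norm and the unembedding matrix,
    and read off the probability of one answer-symbol token."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states
    h = model.transformer.ln_f(hidden[layer][0, -1])  # last-position hidden state
    probs = model.lm_head(h).softmax(-1)
    return probs[tok.encode(symbol)[0]].item()

def patch_attention_output(clean: str, corrupt: str, layer: int = 6):
    """Activation patching: run the corrupted prompt, but splice in the clean
    run's attention-block output at one layer. Both prompts must tokenize to
    the same length (e.g., the same question with answer choices shuffled)."""
    attn = model.transformer.h[layer].attn
    cache = {}

    def save(module, inputs, output):
        cache["clean"] = output[0].detach()  # attention-block output tensor

    def patch(module, inputs, output):
        return (cache["clean"],) + output[1:]  # overwrite with the clean run

    handle = attn.register_forward_hook(save)
    with torch.no_grad():
        model(tok(clean, return_tensors="pt").input_ids)
    handle.remove()

    handle = attn.register_forward_hook(patch)
    with torch.no_grad():
        logits = model(tok(corrupt, return_tensors="pt").input_ids).logits
    handle.remove()
    return logits[0, -1].softmax(-1)  # next-token distribution after patching
```

In this framing, "causally attributed to a single middle layer" means that patching that one layer's attention output largely restores the clean prediction on the corrupted prompt; sweeping `layer` across the network, and tracking the symbol probability with `symbol_prob_at_layer`, localizes where the answer-symbol information enters and grows in the residual stream.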
Related papers
- Differentiating Choices via Commonality for Multiple-Choice Question Answering [54.04315943420376]
In multiple-choice question answering, comparing the answer choices against one another can provide valuable clues for choosing the right answer.
Existing models often rank each choice separately, overlooking the context provided by other choices.
We propose a novel model by differentiating choices through identifying and eliminating their commonality, called DCQA.
arXiv Detail & Related papers (2024-08-21T12:05:21Z)
- QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering [32.436530949623155]
We propose a unique scaling strategy to capture latent semantic center features of queries.
These features are seamlessly integrated into traditional query and passage embeddings.
Our approach reduces sensitivity to variations in text format and improves the model's ability to pinpoint accurate answers.
arXiv Detail & Related papers (2024-04-30T07:34:42Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that generating lectures and explanations as a chain of thought (CoT) improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, like humans, benefit from explanations: they learn from less data, reaching the same performance with just 40% of the training data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering [20.35687327831644]
We study the robustness of Visual Question Answering (VQA) models from a novel perspective: visual context.
SwapMix perturbs the visual context by swapping the features of irrelevant context objects with features from other objects in the dataset (a minimal sketch of this perturbation appears after this list).
We also train the models with perfect sight and find that over-reliance on context depends strongly on the quality of the visual representations.
arXiv Detail & Related papers (2022-04-05T15:32:25Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network-based question answering (QA) models are neither robust nor explainable in many cases; they can answer correctly by exploiting shortcut correlations rather than genuine comprehension.
In this paper, we inspect this spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then use crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
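For the SwapMix entry above, a minimal sketch of the described perturbation may help: object features judged irrelevant to the question are swapped with features drawn from other images. Everything here (tensor shapes, the `relevant_mask`, the feature pool) is an illustrative assumption, not the authors' implementation.

```python
import torch

def swapmix_perturb(obj_feats, relevant_mask, feature_pool):
    """obj_feats: (num_objects, dim) visual features for one image.
    relevant_mask: (num_objects,) bool, True for question-relevant objects.
    feature_pool: (pool_size, dim) features sampled from other images."""
    perturbed = obj_feats.clone()
    irrelevant = (~relevant_mask).nonzero(as_tuple=True)[0]
    # Replace each irrelevant object's feature with a random pool feature.
    idx = torch.randint(len(feature_pool), (len(irrelevant),))
    perturbed[irrelevant] = feature_pool[idx]
    return perturbed

# Toy usage with random stand-ins for real region features.
feats = torch.randn(36, 2048)           # e.g., 36 region features per image
mask = torch.zeros(36, dtype=torch.bool)
mask[:5] = True                         # pretend the first 5 objects are relevant
pool = torch.randn(1000, 2048)          # features pooled from other images
perturbed = swapmix_perturb(feats, mask, pool)
```

A model that over-relies on visual context will change its answer under this swap even though the question-relevant objects are untouched, which is exactly the diagnostic the entry describes.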