On Mechanistic Circuits for Extractive Question-Answering
- URL: http://arxiv.org/abs/2502.08059v1
- Date: Wed, 12 Feb 2025 01:54:21 GMT
- Title: On Mechanistic Circuits for Extractive Question-Answering
- Authors: Samyadeep Basu, Vlad Morariu, Zichao Wang, Ryan Rossi, Cherry Zhao, Soheil Feizi, Varun Manjunatha
- Abstract summary: Large language models are increasingly used to process documents and facilitate question-answering on them.
In our paper, we extract mechanistic circuits for this real-world language modeling task.
We show the potential benefits of circuits for downstream applications such as data attribution to context information.
- Score: 47.167393805165325
- Abstract: Large language models are increasingly used to process documents and facilitate question-answering on them. In our paper, we extract mechanistic circuits for a real-world language modeling task, context-augmented language modeling for extractive question-answering (QA), and study the potential benefits of circuits for downstream applications such as data attribution to context information. We extract circuits as a function of internal model components (e.g., attention heads, MLPs) using causal mediation analysis techniques. Leveraging the extracted circuits, we first examine the interplay between the model's use of parametric memory and retrieved context, towards a better mechanistic understanding of context-augmented language models. We then identify a small set of attention heads in our circuit that perform reliable data attribution by default, thereby obtaining attribution for free in just the model's forward pass. Using this insight, we introduce ATTNATTRIB, a fast data attribution algorithm that obtains state-of-the-art attribution results across various extractive QA benchmarks. Finally, we show that it is possible to steer the language model towards answering from the context, rather than from its parametric memory, by using the attribution from ATTNATTRIB as an additional signal during the forward pass. Beyond mechanistic understanding, our paper provides tangible applications of circuits in the form of reliable data attribution and model steering.
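The abstract's central algorithmic claim is that reliable data attribution can be read off a small set of attention heads during a single forward pass (ATTNATTRIB). A minimal sketch of that idea is below, assuming a Hugging Face causal LM; the specific layer/head indices, the context-span bookkeeping, and the top-k aggregation are illustrative assumptions, not the paper's actual procedure, which identifies attribution heads via causal mediation analysis.

```python
# Hypothetical sketch: attribute the model's next prediction to context tokens
# using the attention weights of a single "attribution head" observed during
# the forward pass. The layer/head indices below are made up for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

context = "The Eiffel Tower was completed in 1889 and is located in Paris."
question = "When was the Eiffel Tower completed?"
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
# Tokens belonging to the context span (approximate: BPE boundaries may shift).
ctx_len = tokenizer(f"Context: {context}", return_tensors="pt")["input_ids"].shape[1]

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

layer, head = 8, 3                      # hypothetical attribution head
attn = out.attentions[layer][0, head]   # (seq_len, seq_len)
scores = attn[-1, :ctx_len]             # attention from the final position onto the context

# Rank context tokens by attention mass as a cheap attribution signal.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
top = torch.topk(scores, k=5)
for idx, weight in zip(top.indices.tolist(), top.values.tolist()):
    print(f"{tokens[idx]!r}: {weight:.3f}")
```

In the same spirit, the steering result described in the abstract could use such per-token scores as an additional signal during generation (e.g., to bias the model towards answering from the context); the paper's concrete mechanism should be taken from the paper itself rather than this sketch.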
Related papers
- Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - LatentQA: Teaching LLMs to Decode Activations Into Natural Language [72.87064562349742]
We introduce LatentQA, the task of answering open-ended questions about model activations in natural language.
We propose Latent Interpretation Tuning (LIT), which finetunes a decoder LLM on a dataset of activations and associated question-answer pairs.
Our decoder also specifies a differentiable loss that we use to control models, such as debiasing models on stereotyped sentences and controlling the sentiment of generations.
arXiv Detail & Related papers (2024-12-11T18:59:33Z) - Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models [5.274653527674298]
Retrieval-Augmented Generation (RAG) models retrieve information before generating responses.
We use causal mediation analysis and controlled experiments to examine how internal representations influence information processing (a toy sketch of this style of intervention appears after this list).
arXiv Detail & Related papers (2024-10-07T16:14:47Z) - From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries [6.382667978271587]
Retrieval Augmented Generation (RAG) enriches the ability of language models to reason using external context to augment responses for a given user prompt.
This approach has risen in popularity due to practical applications of language models in search, question answering, and chatbots.
In this paper, we mechanistically examine the RAG pipeline to highlight that language models take a shortcut and have a strong bias towards utilizing only the context information to answer the question, while relying minimally on their parametric memory.
arXiv Detail & Related papers (2024-06-18T17:46:08Z) - Fine-tuning and aligning question answering models for complex information extraction tasks [0.8392546351624164]
Extractive language models such as question answering (QA) or passage retrieval models guarantee that query results are found within the boundaries of a corresponding context document.
We show that fine-tuning existing German QA models boosts performance for tailored extraction tasks of complex linguistic features.
We derive a combined metric from Levenshtein distance, F1 score, Exact Match, and ROUGE-L to mimic the assessment criteria of human experts.
arXiv Detail & Related papers (2023-09-26T10:02:21Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - Language Models Implement Simple Word2Vec-style Vector Arithmetic [32.2976613483151]
A primary criticism of language models (LMs) is their inscrutability.
This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks.
arXiv Detail & Related papers (2023-05-25T15:04:01Z) - A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z) - Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval [12.514666775853598]
We propose a novel framework to leverage the advantages of interactive and non-interactive models.
We introduce a semi-interactive mechanism, which builds our model upon a non-interactive architecture but encodes each document together with its associated multilingual queries.
Our methods significantly boost retrieval accuracy while maintaining computational efficiency.
arXiv Detail & Related papers (2021-11-03T03:03:19Z) - Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
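Both the main paper and several entries above rely on causal mediation analysis: patching activations from one run into another and measuring how much a given component mediates the output. The toy sketch below illustrates the mechanic under assumed details (GPT-2, a single MLP output, a one-token answer, prompts of equal length); the actual papers intervene on many components over controlled datasets.

```python
# Toy activation-patching (causal mediation) sketch: cache one layer's MLP
# output on a "clean" prompt, patch it into a "corrupted" prompt, and measure
# how much the probability of the correct answer token is restored.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Prompts are chosen to tokenize to the same length so shapes match when patching.
clean = "Context: The capital of France is Paris. Question: What is the capital of France? Answer:"
corrupt = "Context: The capital of France is Rome. Question: What is the capital of France? Answer:"
target_id = tok(" Paris")["input_ids"][0]   # answer token to track

def answer_prob(prompt: str) -> float:
    ids = tok(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)[target_id].item()

layer = 6            # hypothetical component under test
mlp = model.transformer.h[layer].mlp
cache = {}

def save_hook(module, inputs, output):
    cache["clean_mlp"] = output.detach()

def patch_hook(module, inputs, output):
    return cache["clean_mlp"]   # overwrite with the cached clean activation

handle = mlp.register_forward_hook(save_hook)
p_clean = answer_prob(clean)
handle.remove()

p_corrupt = answer_prob(corrupt)

handle = mlp.register_forward_hook(patch_hook)
p_patched = answer_prob(corrupt)
handle.remove()

# The gap between p_patched and p_corrupt indicates how strongly this MLP
# mediates the model's use of the context for the answer.
print(f"clean={p_clean:.4f} corrupt={p_corrupt:.4f} patched={p_patched:.4f}")
```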
This list is automatically generated from the titles and abstracts of the papers on this site.