Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
- URL: http://arxiv.org/abs/2301.06758v1
- Date: Tue, 17 Jan 2023 08:46:50 GMT
- Title: Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
- Authors: Yuta Matsumoto, Benjamin Heinzerling, Masashi Yoshikawa, Kentaro Inui
- Abstract summary: How language models process complex input that requires multiple steps of inference is not well understood.
Previous research has shown that information about intermediate values of these inputs can be extracted from the activations of the models.
We introduce a method for analyzing how a Transformer model processes these inputs by focusing on simple arithmetic problems and their intermediate values.
- Score: 29.957075459315384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How language models process complex input that requires multiple steps of
inference is not well understood. Previous research has shown that information
about intermediate values of these inputs can be extracted from the activations
of the models, but it is unclear where that information is encoded and whether
that information is indeed used during inference. We introduce a method for
analyzing how a Transformer model processes these inputs by focusing on simple
arithmetic problems and their intermediate values. To trace where information
about intermediate values is encoded, we measure the correlation between
intermediate values and the activations of the model using principal component
analysis (PCA). Then, we perform a causal intervention by manipulating model
weights. This intervention shows that the weights identified via tracing are
not merely correlated with intermediate values, but causally related to model
predictions. Our findings show that the model has a locality to certain
intermediate values, and this is useful for enhancing the interpretability of
the models.
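The tracing step described in the abstract can be illustrated with a minimal, self-contained sketch. The paper itself analyzes a trained Transformer; here the model activations are replaced by synthetic vectors with a planted linear encoding of an intermediate value (e.g. the partial sum `a + b` in `a + b + c`), since no model is available in this listing. All names and the synthetic setup are assumptions for illustration, not the authors' code: we run PCA on the (stand-in) activations and measure which principal component correlates with the intermediate value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: each "problem" a + b + c has the intermediate value (a + b).
n_problems, hidden_dim = 200, 64
a = rng.integers(0, 10, n_problems)
b = rng.integers(0, 10, n_problems)
intermediate = a + b  # the quantity we try to trace

# Stand-in for hidden-state activations at one layer/position:
# random noise plus one direction that linearly encodes the intermediate value.
noise = rng.normal(size=(n_problems, hidden_dim))
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
activations = noise + 3.0 * np.outer(intermediate - intermediate.mean(), direction)

# PCA via SVD on the centered activations.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = centered @ vt.T  # projections onto principal components

# Correlate each of the top principal components with the intermediate value.
corrs = [abs(np.corrcoef(components[:, k], intermediate)[0, 1]) for k in range(10)]
best = int(np.argmax(corrs))
print(f"PC {best} correlates with the intermediate value (|r| = {corrs[best]:.2f})")
```

Because the encoding direction is planted with high variance, the top principal component recovers it almost perfectly. In the paper's setting, a strong correlation like this only locates where the information is encoded; the subsequent causal intervention (manipulating the identified weights and checking the effect on predictions) is what establishes that the model actually uses it.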
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - A Mechanistic Interpretation of Arithmetic Reasoning in Language Models
using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z) - Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust interventional-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z) - Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Influence Tuning: Demoting Spurious Correlations via Instance
Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z) - Shared Interest: Large-Scale Visual Analysis of Model Behavior by
Measuring Human-AI Alignment [15.993648423884466]
Saliency is a technique for identifying the influence of input features on a model's output.
We present Shared Interest: a set of metrics for comparing saliency with human annotated ground truths.
We show how Shared Interest can be used to rapidly develop or lose trust in a model's reliability.
arXiv Detail & Related papers (2021-07-20T02:44:39Z) - Triplot: model agnostic measures and visualisations for variable
importance in predictive models that take into account the hierarchical
correlation structure [3.0036519884678894]
We propose new methods to support model analysis by exploiting the information about the correlation between variables.
We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically.
We also present the new type of model visualisation, triplot, which exploits a hierarchical structure of variable grouping to produce a high information density model visualisation.
arXiv Detail & Related papers (2021-04-07T21:29:03Z) - Paired Examples as Indirect Supervision in Latent Decision Models [109.76417071249945]
We introduce a way to leverage paired examples that provide stronger cues for learning latent decisions.
We apply our method to improve compositional question answering using neural module networks on the DROP dataset.
arXiv Detail & Related papers (2021-04-05T03:58:30Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.