Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
- URL: http://arxiv.org/abs/2301.06758v1
- Date: Tue, 17 Jan 2023 08:46:50 GMT
- Title: Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
- Authors: Yuta Matsumoto, Benjamin Heinzerling, Masashi Yoshikawa, Kentaro Inui
- Abstract summary: How language models process complex input that requires multiple steps of inference is not well understood.
Previous research has shown that information about intermediate values of these inputs can be extracted from the activations of the models.
We introduce a method for analyzing how a Transformer model processes these inputs by focusing on simple arithmetic problems and their intermediate values.
- Score: 29.957075459315384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How language models process complex input that requires multiple steps of
inference is not well understood. Previous research has shown that information
about intermediate values of these inputs can be extracted from the activations
of the models, but it is unclear where that information is encoded and whether
that information is indeed used during inference. We introduce a method for
analyzing how a Transformer model processes these inputs by focusing on simple
arithmetic problems and their intermediate values. To trace where information
about intermediate values is encoded, we measure the correlation between
intermediate values and the activations of the model using principal component
analysis (PCA). Then, we perform a causal intervention by manipulating model
weights. This intervention shows that the weights identified via tracing are
not merely correlated with intermediate values, but causally related to model
predictions. Our findings show that the model has a locality to certain
intermediate values, and this is useful for enhancing the interpretability of
the models.
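The tracing step described in the abstract can be illustrated with a minimal, self-contained sketch. The paper itself analyzes a trained Transformer; here the model activations are replaced by synthetic vectors with a planted linear encoding of an intermediate value (e.g. the partial sum `a + b` in `a + b + c`), since no model is available in this listing. All names and the synthetic setup are assumptions for illustration, not the authors' code: we run PCA on the (stand-in) activations and measure which principal component correlates with the intermediate value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: each "problem" a + b + c has the intermediate value (a + b).
n_problems, hidden_dim = 200, 64
a = rng.integers(0, 10, n_problems)
b = rng.integers(0, 10, n_problems)
intermediate = a + b  # the quantity we try to trace

# Stand-in for hidden-state activations at one layer/position:
# random noise plus one direction that linearly encodes the intermediate value.
noise = rng.normal(size=(n_problems, hidden_dim))
direction = rng.normal(size=hidden_dim)
direction /= np.linalg.norm(direction)
activations = noise + 3.0 * np.outer(intermediate - intermediate.mean(), direction)

# PCA via SVD on the centered activations.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = centered @ vt.T  # projections onto principal components

# Correlate each of the top principal components with the intermediate value.
corrs = [abs(np.corrcoef(components[:, k], intermediate)[0, 1]) for k in range(10)]
best = int(np.argmax(corrs))
print(f"PC {best} correlates with the intermediate value (|r| = {corrs[best]:.2f})")
```

Because the encoding direction is planted with high variance, the top principal component recovers it almost perfectly. In the paper's setting, a strong correlation like this only locates where the information is encoded; the subsequent causal intervention (manipulating the identified weights and checking the effect on predictions) is what establishes that the model actually uses it.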
Related papers
- A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - A Mechanistic Interpretation of Arithmetic Reasoning in Language Models
using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z) - Causal Analysis for Robust Interpretability of Neural Networks [0.2519906683279152]
We develop a robust interventional-based method to capture cause-effect mechanisms in pre-trained neural networks.
We apply our method to vision models trained on classification tasks.
arXiv Detail & Related papers (2023-05-15T18:37:24Z) - Correlation Information Bottleneck: Towards Adapting Pretrained
Multimodal Models for Robust Visual Question Answering [63.87200781247364]
Correlation Information Bottleneck (CIB) seeks a tradeoff between compression and redundancy in representations.
We derive a tight theoretical upper bound for the mutual information between multimodal inputs and representations.
arXiv Detail & Related papers (2022-09-14T22:04:10Z) - Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z) - Influence Tuning: Demoting Spurious Correlations via Instance
Attribution and Instance-Driven Updates [26.527311287924995]
We show that, in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z) - Shared Interest: Large-Scale Visual Analysis of Model Behavior by
Measuring Human-AI Alignment [15.993648423884466]
Saliency is a technique for identifying the influence of input features on a model's output.
We present Shared Interest: a set of metrics for comparing saliency with human annotated ground truths.
We show how Shared Interest can be used to rapidly develop or lose trust in a model's reliability.
arXiv Detail & Related papers (2021-07-20T02:44:39Z) - Triplot: model agnostic measures and visualisations for variable
importance in predictive models that take into account the hierarchical
correlation structure [3.0036519884678894]
We propose new methods to support model analysis by exploiting the information about the correlation between variables.
We show how to analyze groups of variables (aspects) both when they are proposed by the user and when they should be determined automatically.
We also present the new type of model visualisation, triplot, which exploits a hierarchical structure of variable grouping to produce a high information density model visualisation.
arXiv Detail & Related papers (2021-04-07T21:29:03Z) - Paired Examples as Indirect Supervision in Latent Decision Models [109.76417071249945]
We introduce a way to leverage paired examples that provide stronger cues for learning latent decisions.
We apply our method to improve compositional question answering using neural module networks on the DROP dataset.
arXiv Detail & Related papers (2021-04-05T03:58:30Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.