Optimal ablation for interpretability
- URL: http://arxiv.org/abs/2409.09951v1
- Date: Mon, 16 Sep 2024 02:45:54 GMT
- Title: Optimal ablation for interpretability
- Authors: Maximilian Li, Lucas Janson
- Abstract summary: Interpretability studies often involve tracing the flow of information through machine learning models.
Prior work quantifies the importance of a model component on a particular task by measuring the impact of performing ablation on that component, or simulating model inference with the component disabled.
We propose a new method, optimal ablation (OA), and show that OA-based component importance has theoretical and empirical advantages over measuring importance via other ablation methods.
- Score: 5.108909395876561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interpretability studies often involve tracing the flow of information through machine learning models to identify specific model components that perform relevant computations for tasks of interest. Prior work quantifies the importance of a model component on a particular task by measuring the impact of performing ablation on that component, or simulating model inference with the component disabled. We propose a new method, optimal ablation (OA), and show that OA-based component importance has theoretical and empirical advantages over measuring importance via other ablation methods. We also show that OA-based component importance can benefit several downstream interpretability tasks, including circuit discovery, localization of factual recall, and latent prediction.
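For intuition, here is a minimal, hedged sketch (PyTorch; the toy model, data, and all names are illustrative assumptions, not the authors' code) of how ablation-based component importance can be measured: a component's activation is replaced by a fixed value (zero, its mean, or a constant optimized to minimize the task loss, in the spirit of optimal ablation), and the importance score is the resulting change in loss.
```python
# Hedged sketch (not the authors' implementation): ablation-based component
# importance on a toy two-layer network. The "component" is the hidden
# activation vector; importance is the loss change when that activation is
# replaced by a fixed value across the whole batch.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 8)
y = (X.sum(dim=1, keepdim=True) > 0).float()

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()

# Briefly train the toy model so the ablation comparisons are meaningful.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(300):
    optimizer.zero_grad()
    loss_fn(model(X), y).backward()
    optimizer.step()

def loss_with_ablation(replacement=None):
    """Run the model, optionally replacing the hidden activation with a fixed value."""
    h = torch.relu(model[0](X))        # component of interest: hidden activations
    if replacement is not None:
        h = replacement.expand_as(h)   # ablate: broadcast one fixed vector over the batch
    return loss_fn(model[2](h), y)

base = loss_with_ablation()                                              # no ablation
zero_abl = loss_with_ablation(torch.zeros(1, 16))                        # zero ablation
mean_h = torch.relu(model[0](X)).mean(dim=0, keepdim=True).detach()
mean_abl = loss_with_ablation(mean_h)                                    # mean ablation

# Optimal-ablation-style variant: optimize the constant replacement value itself
# to minimize the task loss, rather than fixing it to zero or the mean.
c = mean_h.clone().requires_grad_()
opt = torch.optim.Adam([c], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss_with_ablation(c).backward()
    opt.step()
opt_abl = loss_with_ablation(c.detach())

for name, ablated in [("zero", zero_abl), ("mean", mean_abl), ("optimal", opt_abl)]:
    print(f"{name}-ablation importance: {(ablated - base).item():.4f}")
```
Zero and mean ablation fix the replacement value in advance, while the optimal-ablation-style variant searches for the constant the model tolerates best; the abstract argues that importance measured via OA has theoretical and empirical advantages over other ablation methods.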
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Sparse Relational Reasoning with Object-Centric Representations [78.83747601814669]
We investigate the composability of soft-rules learned by relational neural architectures when operating over object-centric representations.
We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations.
arXiv Detail & Related papers (2022-07-15T14:57:33Z)
- Ultra-marginal Feature Importance: Learning from Data with Causal Guarantees [1.2289361708127877]
Marginal contribution feature importance (MCI) was developed to quantify relationships in data.
We introduce ultra-marginal feature importance (UMFI), which uses dependence removal techniques from the AI fairness literature as its foundation.
We show on real and simulated data that UMFI performs better than MCI, especially in the presence of correlated interactions and unrelated features.
arXiv Detail & Related papers (2022-04-21T07:54:58Z)
- A Probit Tensor Factorization Model For Relational Learning [31.613211987639296]
We propose a binary tensor factorization model with a probit link, which inherits the computational efficiency of the classic tensor factorization model.
Our proposed probit tensor factorization (PTF) model shows advantages in both the prediction accuracy and interpretability.
arXiv Detail & Related papers (2021-11-06T19:23:07Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Understanding Global Feature Contributions With Additive Importance Measures [14.50261153230204]
We explore the perspective of defining feature importance through the predictive power associated with each feature.
We introduce two notions of predictive power (model-based and universal) and formalize this approach with a framework of additive importance measures.
We then propose SAGE, a model-agnostic method that quantifies predictive power while accounting for feature interactions.
arXiv Detail & Related papers (2020-04-01T19:17:58Z)
- Feature Importance Estimation with Self-Attention Networks [0.0]
Black-box neural network models are widely used in industry and science, yet are hard to understand and interpret.
Recently, the attention mechanism was introduced, offering insights into the inner workings of neural language models.
This paper explores the use of attention-based neural network mechanisms for estimating feature importance, as a means of explaining models learned from propositional (tabular) data.
arXiv Detail & Related papers (2020-02-11T15:15:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.