CausaLM: Causal Model Explanation Through Counterfactual Language Models
- URL: http://arxiv.org/abs/2005.13407v5
- Date: Sat, 12 Nov 2022 15:23:25 GMT
- Title: CausaLM: Causal Model Explanation Through Counterfactual Language Models
- Authors: Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart
- Abstract summary: CausaLM is a framework for producing causal model explanations using counterfactual language representation models.
We show that language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest.
A byproduct of our method is a language representation model that is unaffected by the tested concept.
- Score: 33.29636213961804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding predictions made by deep neural networks is notoriously
difficult, but also crucial to their dissemination. Like all machine-learning-based
methods, they are only as good as their training data, and can also capture
unwanted biases. While there are tools that can help understand whether such
biases exist, they do not distinguish between correlation and causation, and
might be ill-suited for text-based models and for reasoning about high-level
language concepts. A key problem of estimating the causal effect of a concept
of interest on a given model is that this estimation requires the generation of
counterfactual examples, which is challenging with existing generation
technology. To bridge that gap, we propose CausaLM, a framework for producing
causal model explanations using counterfactual language representation models.
Our approach is based on fine-tuning of deep contextualized embedding models
with auxiliary adversarial tasks derived from the causal graph of the problem.
Concretely, we show that by carefully choosing auxiliary adversarial
pre-training tasks, language representation models such as BERT can effectively
learn a counterfactual representation for a given concept of interest, and be
used to estimate its true causal effect on model performance. A byproduct of
our method is a language representation model that is unaffected by the tested
concept, which can be useful in mitigating unwanted bias ingrained in the data.
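To make the described procedure concrete, here is a minimal sketch of the core idea: fine-tune an encoder with an adversarial concept-prediction head placed behind a gradient reversal layer (so the representation is pushed to "forget" the treated concept) while a standard head preserves a control concept. This is an illustrative toy under stated assumptions, not the authors' released implementation: the tiny EmbeddingBag encoder stands in for BERT, and `concept_labels` / `control_labels` are hypothetical per-example annotations.

```python
# Sketch of CausaLM-style counterfactual representation learning (illustrative only).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class CounterfactualEncoder(nn.Module):
    """Toy encoder with an adversarial head for the treated concept and a
    standard head for a control concept (stand-in for BERT fine-tuning)."""
    def __init__(self, vocab_size=1000, dim=64, n_concept=2, n_control=2):
        super().__init__()
        self.encoder = nn.EmbeddingBag(vocab_size, dim)   # stand-in for a contextual encoder
        self.adv_head = nn.Linear(dim, n_concept)         # treated concept, behind gradient reversal
        self.ctrl_head = nn.Linear(dim, n_control)        # control concept, trained normally

    def forward(self, token_ids):
        h = self.encoder(token_ids)
        return h, self.adv_head(grad_reverse(h)), self.ctrl_head(h)

# One adversarial fine-tuning step on random toy data.
enc = CounterfactualEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
token_ids = torch.randint(0, 1000, (8, 16))               # 8 "sentences" of 16 token ids
concept_labels = torch.randint(0, 2, (8,))                # hypothetical treated-concept labels
control_labels = torch.randint(0, 2, (8,))                # hypothetical control-concept labels

h, adv_logits, ctrl_logits = enc(token_ids)
loss = (nn.functional.cross_entropy(adv_logits, concept_labels)
        + nn.functional.cross_entropy(ctrl_logits, control_labels))
opt.zero_grad()
loss.backward()   # gradient reversal pushes h to drop treated-concept information
opt.step()
```

In the paper, the concept's causal effect is then estimated by comparing a downstream classifier's predictions when fed the original representation versus the counterfactual one trained in this way.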
Related papers
- Counterfactual Generation from Language Models [64.55296662926919]
We show that counterfactual reasoning is conceptually distinct from interventions.
We propose a framework for generating true string counterfactuals.
Our experiments demonstrate that the approach produces meaningful counterfactuals.
arXiv Detail & Related papers (2024-11-11T17:57:30Z)
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Specify Robust Causal Representation from Mixed Observations [35.387451486213344]
Learning representations purely from observations addresses the problem of learning a low-dimensional, compact representation that is beneficial to prediction models.
We develop a learning method to learn such representation from observational data by regularizing the learning procedure with mutual information measures.
We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts.
arXiv Detail & Related papers (2023-10-21T02:18:35Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
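(A minimal sketch of this kind of embedding projection appears after the related papers list below.)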
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z)
- Interpretable Data-Based Explanations for Fairness Debugging [7.266116143672294]
Gopher is a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior.
We introduce the concept of causal responsibility that quantifies the extent to which intervening on training data by removing or updating subsets of it can resolve the bias.
Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias.
arXiv Detail & Related papers (2021-12-17T20:10:00Z)
- iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability [0.0]
Causality knowledge is vital to building robust AI systems.
We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions.
arXiv Detail & Related papers (2021-06-25T02:56:34Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- Causal Inference with Deep Causal Graphs [0.0]
Parametric causal modelling techniques rarely provide functionality for counterfactual estimation.
Deep Causal Graphs is an abstract specification of the required functionality for a neural network to model causal distributions.
We demonstrate its expressive power in modelling complex interactions and showcase applications to machine learning explainability and fairness.
arXiv Detail & Related papers (2020-06-15T13:03:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
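One of the entries above, "Debiasing Vision-Language Models via Biased Prompts", debiases by projecting out known bias directions from the text embedding. The snippet below is a generic sketch of that projection idea under stated assumptions (random placeholder embeddings and bias directions; not the paper's calibrated projection matrix):

```python
# Generic projection-based debiasing sketch (placeholder data, not the paper's method).
import numpy as np

rng = np.random.default_rng(0)
dim = 512
text_embeddings = rng.normal(size=(100, dim))   # placeholder for CLIP-style text embeddings
bias_dirs = rng.normal(size=(dim, 2))           # placeholder bias directions (e.g., prompt differences)

# Orthogonal projector onto the complement of the bias subspace: P = I - B (B^T B)^{-1} B^T
B = bias_dirs
P = np.eye(dim) - B @ np.linalg.inv(B.T @ B) @ B.T
debiased = text_embeddings @ P                  # removes components along the bias directions

# Sanity check: debiased embeddings carry (numerically) no component along any bias direction.
assert np.allclose(debiased @ B, 0.0, atol=1e-6)
```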