Generative Counterfactuals for Neural Networks via Attribute-Informed
Perturbation
- URL: http://arxiv.org/abs/2101.06930v1
- Date: Mon, 18 Jan 2021 08:37:13 GMT
- Title: Generative Counterfactuals for Neural Networks via Attribute-Informed
Perturbation
- Authors: Fan Yang, Ninghao Liu, Mengnan Du, Xia Hu
- Abstract summary: We design a framework to generate counterfactuals for raw data instances with the proposed Attribute-Informed Perturbation (AIP)
By utilizing generative models conditioned with different attributes, counterfactuals with desired labels can be obtained effectively and efficiently.
Experimental results on real-world texts and images demonstrate the effectiveness, sample quality as well as efficiency of our designed framework.
- Score: 51.29486247405601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the wide use of deep neural networks (DNN), model interpretability has
become a critical concern, since explainable decisions are preferred in
high-stake scenarios. Current interpretation techniques mainly focus on the
feature attribution perspective, which are limited in indicating why and how
particular explanations are related to the prediction. To this end, an
intriguing class of explanations, named counterfactuals, has been developed to
further explore the "what-if" circumstances for interpretation, and enables the
reasoning capability on black-box models. However, generating counterfactuals
for raw data instances (i.e., text and image) is still in the early stage due
to its challenges on high data dimensionality and unsemantic raw features. In
this paper, we design a framework to generate counterfactuals specifically for
raw data instances with the proposed Attribute-Informed Perturbation (AIP). By
utilizing generative models conditioned with different attributes,
counterfactuals with desired labels can be obtained effectively and
efficiently. Instead of directly modifying instances in the data space, we
iteratively optimize the constructed attribute-informed latent space, where
features are more robust and semantic. Experimental results on real-world texts
and images demonstrate the effectiveness, sample quality as well as efficiency
of our designed framework, and show the superiority over other alternatives.
Besides, we also introduce some practical applications based on our framework,
indicating its potential beyond the model interpretability aspect.
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increase the model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - Enhancing AI-based Generation of Software Exploits with Contextual Information [9.327315119028809]
The study employs a dataset comprising real shellcodes to evaluate the models across various scenarios.
The experiments are designed to assess the models' resilience against incomplete descriptions, their proficiency in leveraging context for enhanced accuracy, and their ability to discern irrelevant information.
The models demonstrate an ability to filter out unnecessary context, maintaining high levels of accuracy in the generation of offensive security code.
arXiv Detail & Related papers (2024-08-05T11:52:34Z) - Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z) - Prospector Heads: Generalized Feature Attribution for Large Models & Data [82.02696069543454]
We introduce prospector heads, an efficient and interpretable alternative to explanation-based attribution methods.
We demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in input data.
arXiv Detail & Related papers (2024-02-18T23:01:28Z) - LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z) - Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization [73.89239820192894]
We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
arXiv Detail & Related papers (2022-05-20T15:02:53Z) - Preference Enhanced Social Influence Modeling for Network-Aware Cascade
Prediction [59.221668173521884]
We propose a novel framework to promote cascade size prediction by enhancing the user preference modeling.
Our end-to-end method makes the user activating process of information diffusion more adaptive and accurate.
arXiv Detail & Related papers (2022-04-18T09:25:06Z) - Information-Theoretic Odometry Learning [83.36195426897768]
We propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation.
The proposed framework provides an elegant tool for performance evaluation and understanding in information-theoretic language.
arXiv Detail & Related papers (2022-03-11T02:37:35Z) - Combining Discrete Choice Models and Neural Networks through Embeddings:
Formulation, Interpretability and Performance [10.57079240576682]
This study proposes a novel approach that combines theory and data-driven choice models using Artificial Neural Networks (ANNs)
In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables.
Our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.
arXiv Detail & Related papers (2021-09-24T15:55:31Z) - A Framework to Learn with Interpretation [2.3741312212138896]
We present a novel framework to jointly learn a predictive model and its associated interpretation model.
We seek for a small-size dictionary of high level attribute functions that take as inputs the outputs of selected hidden layers.
A detailed pipeline to visualize the learnt features is also developed.
arXiv Detail & Related papers (2020-10-19T09:26:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.