Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- URL: http://arxiv.org/abs/2205.10232v1
- Date: Fri, 20 May 2022 15:02:53 GMT
- Title: Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- Authors: Javier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez,
Francisco Herrera, Andreas Holzinger
- Abstract summary: We argue that automated counterfactual generation should take into account several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
- Score: 73.89239820192894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a broad consensus on the importance of deep learning models in tasks
involving complex data. Often, an adequate understanding of these models is
required when focusing on the transparency of decisions in human-critical
applications. Among other explainability techniques, trustworthiness can be
achieved through counterfactuals, much as a human becomes familiar with
an unknown process: by understanding the hypothetical circumstances under which
the output changes. In this work we argue that automated counterfactual
generation should take into account several aspects of the produced adversarial instances,
not only their adversarial capability. To this end, we present a novel
framework for the generation of counterfactual examples which formulates its
goal as a multi-objective optimization problem balancing three different
objectives: 1) plausibility, i.e., the likelihood that the counterfactual is
possible under the distribution of the input data; 2) the intensity of the
changes to the original input; and 3) adversarial power, namely, the
variability of the model's output induced by the counterfactual. The framework
starts from the target model to be audited and uses a Generative Adversarial
Network to model the distribution of input data, together with a
multi-objective solver for the discovery of counterfactuals balancing among
these objectives. The utility of the framework is showcased over six
classification tasks comprising image and three-dimensional data. The
experiments verify that the framework unveils counterfactuals that comply with
intuition, increasing users' trust in the model and yielding further
insights, such as the detection of bias and data misrepresentation.
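
To make the search described in the abstract concrete, below is a minimal, self-contained sketch of the three-objective counterfactual search (not the authors' implementation). A toy logistic classifier stands in for the audited model, a Gaussian log-density stands in for the GAN-based plausibility score, random perturbations replace the GAN latent-space search, and plain Pareto filtering replaces a full multi-objective solver such as NSGA-II; all function names (`audited_model`, `plausibility`, `pareto_front`) are hypothetical.

```python
# Sketch of a three-objective counterfactual search (hypothetical stand-ins,
# not the paper's code): toy classifier, Gaussian plausibility proxy,
# random candidates instead of a GAN, Pareto filtering instead of NSGA-II.
import numpy as np

rng = np.random.default_rng(0)

# --- Target model to be audited: a toy 2-D logistic classifier -----------
w, b = np.array([1.5, -2.0]), 0.3

def audited_model(x):
    """Probability of class 1 for a 2-D input."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# --- Plausibility proxy: log-density of an assumed data distribution -----
mu, cov_inv = np.zeros(2), np.eye(2)

def plausibility(x):
    d = x - mu
    return -0.5 * d @ cov_inv @ d  # higher = more plausible

# --- The three objectives for a counterfactual x_cf of an original x -----
# All expressed as costs to minimize.
def objectives(x, x_cf):
    return np.array([
        -plausibility(x_cf),                           # implausibility
        np.linalg.norm(x_cf - x),                      # change intensity
        -abs(audited_model(x_cf) - audited_model(x)),  # neg. adversarial power
    ])

def pareto_front(costs):
    """Indices of non-dominated rows (all objectives minimized)."""
    keep = []
    for i, c in enumerate(costs):
        dominated = any(
            np.all(costs[j] <= c) and np.any(costs[j] < c)
            for j in range(len(costs)) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# --- Search: random candidate counterfactuals around the original --------
x_orig = np.array([0.5, 0.5])
candidates = x_orig + rng.normal(scale=0.8, size=(500, 2))
costs = np.array([objectives(x_orig, c) for c in candidates])

front = pareto_front(costs)
print(f"{len(front)} non-dominated counterfactuals out of {len(candidates)}")
for i in front[:5]:
    print("x_cf:", np.round(candidates[i], 3), "costs:", np.round(costs[i], 3))
```

Each non-dominated candidate represents one trade-off among plausibility, change intensity, and adversarial power; in the paper's setting the same filtering would be applied to samples decoded by the GAN generator rather than to raw perturbations.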
Related papers
- Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Representations of epistemic uncertainty and awareness in data-driven strategies [0.0]
We present a theoretical model for uncertainty in knowledge representation and its transfer mediated by agents.
We look at inequivalent knowledge representations in terms of inferences, preference relations, and information measures.
We discuss some implications of the proposed model for data-driven strategies.
arXiv Detail & Related papers (2021-10-21T21:18:21Z)
- FAIR: Fair Adversarial Instance Re-weighting [0.7829352305480285]
We propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions.
To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.
arXiv Detail & Related papers (2020-11-15T10:48:56Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish plausible attacks on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.