Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- URL: http://arxiv.org/abs/2205.10232v1
- Date: Fri, 20 May 2022 15:02:53 GMT
- Title: Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- Authors: Javier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez,
Francisco Herrera, Andreas Holzinger
- Abstract summary: We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
- Score: 73.89239820192894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a broad consensus on the importance of deep learning models in tasks
involving complex data. Often, an adequate understanding of these models is
required when focusing on the transparency of decisions in human-critical
applications. Among other explainability techniques, trustworthiness can be
achieved by using counterfactuals, much as a human becomes familiar with an
unknown process: by understanding the hypothetical circumstances under which
the output changes. In this work we argue that automated counterfactual
generation should regard several aspects of the produced adversarial instances,
not only their adversarial capability. To this end, we present a novel
framework for the generation of counterfactual examples which formulates its
goal as a multi-objective optimization problem balancing three different
objectives: 1) plausibility, i.e., the likelihood that the counterfactual is
possible under the distribution of the input data; 2) intensity of the
changes to the original input; and 3) adversarial power, namely, the
variability of the model's output induced by the counterfactual. The framework
starts from a target model to be audited and uses a Generative Adversarial
Network to model the distribution of input data, together with a
multi-objective solver for the discovery of counterfactuals balancing among
these objectives. The utility of the framework is showcased over six
classification tasks comprising image and three-dimensional data. The
experiments verify that the framework unveils counterfactuals that comply with
intuition, increasing the user's trust, and leading to further insights, such
as the detection of bias and data misrepresentation.
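As a concrete illustration of the three-objective formulation above, here is a minimal sketch in plain Python. It is an assumption-heavy toy, not the authors' implementation: a Mahalanobis-distance score stands in for the paper's GAN-based plausibility model, and naive random search followed by Pareto filtering stands in for the multi-objective solver; all function names (plausibility, change_intensity, adversarial_power, pareto_front) are illustrative.

```python
# Toy sketch of the abstract's three objectives. Assumptions: a Mahalanobis
# proxy replaces the GAN plausibility score; random search + Pareto filtering
# replaces the dedicated multi-objective solver.
import numpy as np

def plausibility(x_cf, data_mean, data_cov_inv):
    # Proxy for likeliness under the input distribution (to be maximized).
    d = x_cf - data_mean
    return -float(d @ data_cov_inv @ d)

def change_intensity(x, x_cf):
    # Magnitude of the change applied to the original input (to be minimized).
    return float(np.linalg.norm(x_cf - x))

def adversarial_power(model, x, x_cf):
    # Variability induced in the audited model's output (to be maximized).
    return float(abs(model(x_cf) - model(x)))

def pareto_front(scores):
    # Indices of candidates not dominated on the maximization triple
    # (plausibility, -intensity, power).
    front = []
    for i, p in enumerate(scores):
        dominated = any(
            all(q[k] >= p[k] for k in range(3)) and q != p
            for j, q in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                     # stand-in training data
mean, cov_inv = X.mean(axis=0), np.linalg.inv(np.cov(X.T))
model = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))  # toy audited classifier
x0 = X[0]                                         # instance to explain

candidates = [x0 + rng.normal(scale=0.5, size=5) for _ in range(200)]
scores = [(plausibility(c, mean, cov_inv),
           -change_intensity(x0, c),
           adversarial_power(model, x0, c)) for c in candidates]
print(f"{len(pareto_front(scores))} non-dominated counterfactuals of {len(candidates)}")
```

Each point on the resulting Pareto front is a counterfactual trading off the three criteria; the paper's framework explores the same trade-off with a GAN-backed plausibility model instead of the density proxy used here.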
Related papers
- Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self-correction and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through its responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for the retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Representations of epistemic uncertainty and awareness in data-driven strategies [0.0]
We present a theoretical model for uncertainty in knowledge representation and its transfer mediated by agents.
We look at inequivalent knowledge representations in terms of inferences, preference relations, and information measures.
We discuss some implications of the proposed model for data-driven strategies.
arXiv Detail & Related papers (2021-10-21T21:18:21Z)
- FAIR: Fair Adversarial Instance Re-weighting [0.7829352305480285]
We propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions.
To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.
arXiv Detail & Related papers (2020-11-15T10:48:56Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.