Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- URL: http://arxiv.org/abs/2205.10232v1
- Date: Fri, 20 May 2022 15:02:53 GMT
- Title: Exploring the Trade-off between Plausibility, Change Intensity and
Adversarial Power in Counterfactual Explanations using Multi-objective
Optimization
- Authors: Javier Del Ser, Alejandro Barredo-Arrieta, Natalia Díaz-Rodríguez,
Francisco Herrera, Andreas Holzinger
- Abstract summary: We argue that automated counterfactual generation should regard several aspects of the produced adversarial instances.
We present a novel framework for the generation of counterfactual examples.
- Score: 73.89239820192894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a broad consensus on the importance of deep learning models in tasks
involving complex data. Often, an adequate understanding of these models is
required when focusing on the transparency of decisions in human-critical
applications. Among other explainability techniques, trustworthiness can be
achieved by using counterfactuals, much as a human becomes familiar with an
unknown process: by understanding the hypothetical circumstances under which
the output changes. In this work we argue that automated counterfactual
generation should regard several aspects of the produced adversarial instances,
not only their adversarial capability. To this end, we present a novel
framework for the generation of counterfactual examples which formulates its
goal as a multi-objective optimization problem balancing three different
objectives: 1) plausibility, i.e., the likelihood that the counterfactual is
possible under the distribution of the input data; 2) intensity of the
changes to the original input; and 3) adversarial power, namely, the
variability of the model's output induced by the counterfactual. The framework
starts from a target model to be audited and uses a Generative Adversarial
Network to model the distribution of input data, together with a
multi-objective solver for the discovery of counterfactuals balancing among
these objectives. The utility of the framework is showcased over six
classification tasks comprising image and three-dimensional data. The
experiments verify that the framework unveils counterfactuals that comply with
intuition, increasing the user's trust, and leading to further insights, such
as the detection of bias and data misrepresentation.
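As a concrete illustration of the three-objective formulation above, here is a minimal sketch in plain Python. It is an assumption-heavy toy, not the authors' implementation: a Mahalanobis-distance score stands in for the paper's GAN-based plausibility model, and naive random search followed by Pareto filtering stands in for the multi-objective solver; all function names (plausibility, change_intensity, adversarial_power, pareto_front) are illustrative.

```python
# Toy sketch of the abstract's three objectives. Assumptions: a Mahalanobis
# proxy replaces the GAN plausibility score; random search + Pareto filtering
# replaces the dedicated multi-objective solver.
import numpy as np

def plausibility(x_cf, data_mean, data_cov_inv):
    # Proxy for likeliness under the input distribution (to be maximized).
    d = x_cf - data_mean
    return -float(d @ data_cov_inv @ d)

def change_intensity(x, x_cf):
    # Magnitude of the change applied to the original input (to be minimized).
    return float(np.linalg.norm(x_cf - x))

def adversarial_power(model, x, x_cf):
    # Variability induced in the audited model's output (to be maximized).
    return float(abs(model(x_cf) - model(x)))

def pareto_front(scores):
    # Indices of candidates not dominated on the maximization triple
    # (plausibility, -intensity, power).
    front = []
    for i, p in enumerate(scores):
        dominated = any(
            all(q[k] >= p[k] for k in range(3)) and q != p
            for j, q in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                     # stand-in training data
mean, cov_inv = X.mean(axis=0), np.linalg.inv(np.cov(X.T))
model = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))  # toy audited classifier
x0 = X[0]                                         # instance to explain

candidates = [x0 + rng.normal(scale=0.5, size=5) for _ in range(200)]
scores = [(plausibility(c, mean, cov_inv),
           -change_intensity(x0, c),
           adversarial_power(model, x0, c)) for c in candidates]
print(f"{len(pareto_front(scores))} non-dominated counterfactuals of {len(candidates)}")
```

Each point on the resulting Pareto front is a counterfactual trading off the three criteria; the paper's framework explores the same trade-off with a GAN-backed plausibility model instead of the density proxy used here.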
Related papers
- Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self-correction and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
- On the Fairness, Diversity and Reliability of Text-to-Image Generative Models [49.60774626839712]
Multimodal generative models have sparked critical discussions on their fairness, reliability, and potential for misuse.
We propose an evaluation framework designed to assess model reliability through its responses to perturbations in the embedding space.
Our method lays the groundwork for detecting unreliable, bias-injected models and for the retrieval of bias provenance.
arXiv Detail & Related papers (2024-11-21T09:46:55Z)
- Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z)
- Advancing Counterfactual Inference through Nonlinear Quantile Regression [77.28323341329461]
We propose a framework for efficient and effective counterfactual inference implemented with neural networks.
The proposed approach enhances the capacity to generalize estimated counterfactual outcomes to unseen data.
Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
arXiv Detail & Related papers (2023-06-09T08:30:51Z)
- Representations of epistemic uncertainty and awareness in data-driven strategies [0.0]
We present a theoretical model for uncertainty in knowledge representation and its transfer mediated by agents.
We look at inequivalent knowledge representations in terms of inferences, preference relations, and information measures.
We discuss some implications of the proposed model for data-driven strategies.
arXiv Detail & Related papers (2021-10-21T21:18:21Z)
- FAIR: Fair Adversarial Instance Re-weighting [0.7829352305480285]
We propose a Fair Adversarial Instance Re-weighting (FAIR) method, which uses adversarial training to learn an instance weighting function that ensures fair predictions.
To the best of our knowledge, this is the first model that merges reweighting and adversarial approaches by means of a weighting function that can provide interpretable information about fairness of individual instances.
arXiv Detail & Related papers (2020-11-15T10:48:56Z)
- Plausible Counterfactuals: Auditing Deep Learning Classifiers with Realistic Adversarial Examples [84.8370546614042]
The black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
A Generative Adversarial Network (GAN) and multi-objective optimization are used to furnish a plausible attack on the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.