Global Counterfactual Directions
- URL: http://arxiv.org/abs/2404.12488v2
- Date: Tue, 23 Jul 2024 11:58:53 GMT
- Title: Global Counterfactual Directions
- Authors: Bartłomiej Sobieski, Przemysław Biecek
- Abstract summary: We show that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions.
We propose a novel proxy-based approach that discovers two types of these directions using only a single image in an entirely black-box manner.
We show that GCDs can be naturally combined with Latent Integrated Gradients resulting in a new black-box attribution method.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite increasing progress in the development of methods for generating visual counterfactual explanations, especially with the recent rise of Denoising Diffusion Probabilistic Models, previous works consider them an entirely local technique. In this work, we take the first step toward globalizing them. Specifically, we discover that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions. We propose a novel proxy-based approach that discovers two types of these directions using only a single image in an entirely black-box manner. Precisely, g-directions allow for flipping the decision of a given classifier on an entire dataset of images, while h-directions further increase the diversity of explanations. We refer to them collectively as Global Counterfactual Directions (GCDs). Moreover, we show that GCDs can be naturally combined with Latent Integrated Gradients, resulting in a new black-box attribution method while simultaneously enhancing the understanding of counterfactual explanations. We validate our approach on existing benchmarks and show that it generalizes to real-world use cases.
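To make the idea of a global direction concrete, below is a minimal sketch of how one candidate g-direction could be swept over a set of images to flip a black-box classifier's decision. It is an illustration of the concept, not the authors' implementation; `encode`, `decode`, and `classifier` are hypothetical placeholders for the Diffusion Autoencoder's semantic encoder, its decoder, and the classifier under explanation.

```python
import numpy as np

def flip_with_global_direction(images, direction, encode, decode, classifier,
                               alphas=np.linspace(0.0, 2.0, 21)):
    """Sweep one candidate global direction over a set of images and record,
    for each image, the smallest step size that flips the black-box decision.

    `encode`, `decode`, and `classifier` are hypothetical placeholders; this
    is a sketch of the concept, not the paper's method.
    """
    flip_steps = []
    for image in images:
        z = encode(image)                               # semantic latent code
        original = classifier(decode(z)) >= 0.5         # black-box decision
        flipped_at = None
        for alpha in alphas:                            # traverse the direction
            counterfactual = decode(z + alpha * direction)
            if (classifier(counterfactual) >= 0.5) != original:
                flipped_at = alpha                      # decision flipped here
                break
        flip_steps.append(flipped_at)                   # None = no flip found
    return flip_steps
```

A direction that flips most images with a small step behaves like a g-direction in the sense described above; the same sweep could be used to check whether different h-directions yield visually distinct counterfactuals.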
Related papers
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- Decompose the model: Mechanistic interpretability in image models with Generalized Integrated Gradients (GIG) [24.02036048242832]
This paper introduces a novel approach to trace the entire pathway from input through all intermediate layers to the final output across the whole dataset.
We utilize Pointwise Feature Vectors (PFVs) and Effective Receptive Fields (ERFs) to decompose model embeddings into interpretable Concept Vectors.
Then, we calculate the relevance between concept vectors with our Generalized Integrated Gradients (GIG), enabling a comprehensive, dataset-wide analysis of model behavior.
arXiv Detail & Related papers (2024-09-03T05:19:35Z)
- One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion [12.644979036930383]
Knowledge Graph Completion (KGC) has garnered massive research interest recently.
Most existing methods are designed following a transductive setting where all entities are observed during training.
Inductive KGC, which aims to deduce missing links among unseen entities, has become a new trend.
arXiv Detail & Related papers (2024-04-24T11:12:08Z)
- Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [63.1637853118899]
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
We employ a shift control module that works on the h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself (a toy sketch of such a shift appears after this entry).
By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions.
arXiv Detail & Related papers (2023-10-15T18:44:30Z)
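For intuition only, the following toy module sketches what such a shift in h-space might look like: a bank of learnable directions, one of which is added with a chosen scale to the U-Net bottleneck activations. The class name, shapes, and training setup are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class HSpaceShift(nn.Module):
    """Toy shift module: holds a bank of learnable h-space directions and adds
    a scaled, unit-norm direction to the bottleneck activations. This is an
    illustrative sketch of the shift-control idea, not the paper's code."""

    def __init__(self, num_directions: int, h_dim: int):
        super().__init__()
        self.directions = nn.Parameter(0.01 * torch.randn(num_directions, h_dim))

    def forward(self, h: torch.Tensor, index: int, scale: float = 1.0) -> torch.Tensor:
        # h: bottleneck features of shape (batch, h_dim); spatial feature maps
        # would additionally require broadcasting over height and width.
        d = self.directions[index]
        d = d / d.norm()                  # keep the edit direction unit-norm
        return h + scale * d              # shifted version of the sample's h
```

Jointly optimizing such directions together with the frozen diffusion model, as the abstract describes, is what would encourage each index to correspond to a distinct, interpretable change.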
- Generalized Schrödinger Bridge Matching [54.171931505066]
The Generalized Schrödinger Bridge (GSB) problem setup is prevalent in many scientific areas, both within and beyond machine learning.
We propose Generalized Schrödinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances.
We show that such a generalization can be cast as solving conditional optimal control, for which variational approximations can be used.
arXiv Detail & Related papers (2023-10-03T17:42:11Z)
- Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
Diffusion pretraining achieves extraordinary domain generalization results for semantic segmentation.
We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
- Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models [21.173910627285338]
Denoising Diffusion Models (DDMs) have emerged as strong competitors to Generative Adversarial Networks (GANs).
In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it.
Our approaches are applicable without requiring architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.
arXiv Detail & Related papers (2023-03-20T12:59:32Z)
- Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing [69.80851569594924]
Generalizable face anti-spoofing (FAS) has drawn growing attention.
In this work, we separate the complete representation into content and style components.
A novel Shuffled Style Assembly Network (SSAN) is proposed to extract and reassemble different content and style features.
arXiv Detail & Related papers (2022-03-10T12:44:05Z)
- Best of both worlds: local and global explanations with human-understandable concepts [10.155485106226754]
Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction or a class of predictions.
We show that our method improves global explanations over TCAV when compared to ground truth, and provides useful insights.
arXiv Detail & Related papers (2021-06-16T09:05:25Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of Generative Adversarial Networks (GANs) trained for image synthesis.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights (a minimal sketch of this idea appears after this entry).
arXiv Detail & Related papers (2020-07-13T18:05:36Z)
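For intuition, here is a minimal sketch of a closed-form factorization in this spirit: take the weight matrix of the generator's first affine transformation and use the top eigenvectors of AᵀA as candidate semantic directions. The matrix shapes and the random placeholder weights are assumptions; a real analysis would load the pre-trained weights from the model checkpoint.

```python
import numpy as np

# Hypothetical weights of the first affine transformation of a pre-trained
# generator, mapping a 512-d latent code to an m-dimensional feature vector.
latent_dim, feature_dim = 512, 8192
A = np.random.randn(feature_dim, latent_dim).astype(np.float32)

# Closed-form factorization in the spirit of the abstract: latent directions
# that maximally perturb the output of this layer are the eigenvectors of
# A^T A, ranked by eigenvalue.
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]          # most significant directions first
directions = eigvecs[:, order].T           # each row is a unit-norm latent direction

# Editing a sample: move a latent code along the k-th discovered direction.
z = np.random.randn(latent_dim).astype(np.float32)
k, alpha = 0, 3.0                          # hypothetical direction index and step size
z_edited = z + alpha * directions[k]
```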