Cycle-Consistent Counterfactuals by Latent Transformations
- URL: http://arxiv.org/abs/2203.15064v1
- Date: Mon, 28 Mar 2022 20:10:09 GMT
- Title: Cycle-Consistent Counterfactuals by Latent Transformations
- Authors: Saeed Khorram, Li Fuxin
- Abstract summary: Cycle-Consistent Counterfactuals by Latent Transformations (C3LT) learns a latent transformation that automatically generates visual counterfactual (CF) examples by steering in the latent space of generative models.
C3LT can be easily plugged into any state-of-the-art pretrained generative network.
In addition to several established metrics for evaluating CF explanations, we introduce a novel metric tailored to assess the quality of the generated CF examples.
- Score: 5.254093731341154
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: CounterFactual (CF) visual explanations try to find images similar to the
query image that change the decision of a vision system to a specified outcome.
Existing methods either require inference-time optimization or joint training
with a generative adversarial model which makes them time-consuming and
difficult to use in practice. We propose a novel approach, Cycle-Consistent
Counterfactuals by Latent Transformations (C3LT), which learns a latent
transformation that automatically generates visual CFs by steering in the
latent space of generative models. Our method uses cycle consistency between
the query and CF latent representations which helps our training to find better
solutions. C3LT can be easily plugged into any state-of-the-art pretrained
generative network. This enables our method to generate high-quality and
interpretable CF images at high resolution such as those in ImageNet. In
addition to several established metrics for evaluating CF explanations, we
introduce a novel metric tailored to assess the quality of the generated CF
examples and validate the effectiveness of our method on an extensive set of
experiments.
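The cycle-consistency idea in the abstract can be illustrated with a small sketch: a forward map sends a query latent toward the counterfactual class, a backward map sends it back, and the training objective penalizes the round-trip error. The linear maps `g` and `h` below are hypothetical stand-ins for C3LT's learned transformations (a NumPy toy, not the paper's actual networks or losses):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # latent dimensionality (illustrative)

# Hypothetical latent transformations: G steers query latents toward the
# counterfactual class, H steers counterfactual latents back.
G = np.eye(d) + 0.1 * rng.normal(size=(d, d))
H = np.linalg.inv(G)  # an exact inverse, so the cycle closes perfectly here

def g(z):
    return z @ G.T

def h(z_cf):
    return z_cf @ H.T

def cycle_consistency_loss(z):
    """Mean squared distance between a latent and its g-then-h round trip."""
    z_cyc = h(g(z))
    return float(np.mean((z - z_cyc) ** 2))

z_query = rng.normal(size=(4, d))  # a batch of query latents
loss = cycle_consistency_loss(z_query)
```

In C3LT the two maps are learned jointly, so the cycle loss is a training signal rather than zero by construction as in this toy.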
Related papers
- SSAM: Self-Supervised Association Modeling for Test-Time Adaption [42.00379819876794]
We propose SSAM (Self-Supervised Association Modeling), a new TTA framework that enables dynamic encoder refinement through dual-phase association learning.
Our method operates via two synergistic components: 1) Soft Prototype Estimation (SPE), which estimates probabilistic category associations to guide feature space reorganization, and 2) Prototype-anchored Image Reconstruction (PIR), enforcing encoder stability through cluster-conditional image feature reconstruction.
arXiv Detail & Related papers (2025-05-31T11:13:07Z)
- EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation [26.888320234592978]
Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models.
We provide a model-agnostic approach, using intersections in diffusion trajectories, working only with latent values.
An in-context trained LLM is used to generate coherent frame-wise prompts; another is used to identify differences between frames.
Our approach results in state-of-the-art performance while being more flexible when working with diverse image-generation models.
arXiv Detail & Related papers (2025-04-09T13:11:09Z)
- Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations? [55.99654128127689]
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations.
Existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process.
We propose a new framework, namely CMCR, to address these shortcomings.
arXiv Detail & Related papers (2024-12-12T06:09:49Z)
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations [6.741417019751439]
Visual counterfactual explanation (CF) methods modify image concepts to change a prediction to a predefined outcome.
We introduce the GdVAE, a self-explainable model based on a conditional variational autoencoder (CVAE) and integrated CF explanations.
The consistency of CFs is improved by regularizing the latent space with the explainer function.
arXiv Detail & Related papers (2024-09-19T17:58:15Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M), designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- DELAD: Deep Landweber-guided deconvolution with Hessian and sparse prior [0.22940141855172028]
We present a model for non-blind image deconvolution that incorporates the classic iterative method into a deep learning application.
We build our network based on the iterative Landweber deconvolution algorithm, which is integrated with trainable convolutional layers to enhance the recovered image structures and details.
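The classic Landweber iteration that DELAD builds on can be sketched in plain NumPy for a generic linear operator `A` (the paper interleaves trainable convolutional layers and Hessian/sparsity priors, all omitted here; the matrix and step-size choice below are purely illustrative):

```python
import numpy as np

def landweber(A, b, n_iter=200, step=None):
    """Landweber iteration for A x ~= b: x <- x + step * A^T (b - A x)."""
    if step is None:
        # Convergence requires 0 < step < 2 / sigma_max(A)^2.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * A.T @ (b - A @ x)
    return x

# Toy deconvolution-style example: recover x from linear observations.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 20))
x_true = rng.normal(size=20)
b = A @ x_true
x_hat = landweber(A, b, n_iter=5000)
```

For a consistent, full-column-rank system like this toy one, the iterates converge to the least-squares solution; in deconvolution, `A` would be the (ill-conditioned) blur operator and early stopping or priors are needed.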
arXiv Detail & Related papers (2022-09-30T11:15:03Z)
- Robust Cross-Modal Representation Learning with Progressive Self-Distillation [7.676408770854477]
The learning objective of CLIP's vision-language approach does not effectively account for the noisy many-to-many correspondences found in web-harvested image-captioning datasets.
We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.
arXiv Detail & Related papers (2022-04-10T03:28:18Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- Contour-guided Image Completion with Perceptual Grouping [7.588025965572449]
This paper implements a modernized model of the Stochastic Completion Fields (SCF) algorithm.
We show how the SCF algorithm mimics results in human perception.
We use the SCF completed contours as guides for inpainting, and show that our guides improve the performance of state-of-the-art models.
arXiv Detail & Related papers (2021-11-22T16:26:25Z)
- Generating Images with Sparse Representations [21.27273495926409]
High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
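The JPEG-style preprocessing described above can be sketched with a plain NumPy DCT on 8x8 blocks. The single scalar quantization step below is an illustrative stand-in, not the paper's actual quantization scheme:

```python
import numpy as np

N = 8  # JPEG-style block size

# Orthonormal DCT-II basis matrix: C @ C.T == identity.
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] /= np.sqrt(2.0)

def dct2(block):
    """2-D DCT of one N x N block."""
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2-D DCT (C is orthogonal, so its transpose inverts it)."""
    return C.T @ coeffs @ C

def quantize(coeffs, q=16.0):
    """Snap coefficients to a coarse grid, discarding fine detail."""
    return np.round(coeffs / q) * q

rng = np.random.default_rng(2)
block = rng.uniform(0, 255, size=(N, N))
codes = quantize(dct2(block))  # the discrete values a model would predict
recon = idct2(codes)           # lossy reconstruction from quantized DCT
```

The quantized coefficients form the sparse discrete sequence that the paper's autoregressive Transformer is trained to predict element by element.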
arXiv Detail & Related papers (2021-03-05T17:56:03Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.