Cycle-Consistent Counterfactuals by Latent Transformations
- URL: http://arxiv.org/abs/2203.15064v1
- Date: Mon, 28 Mar 2022 20:10:09 GMT
- Title: Cycle-Consistent Counterfactuals by Latent Transformations
- Authors: Saeed Khorram, Li Fuxin
- Abstract summary: Cycle-Consistent Counterfactuals by Latent Transformations (C3LT) learns a latent transformation that automatically generates visual counterfactual (CF) examples by steering in the latent space of generative models.
C3LT can be easily plugged into any state-of-the-art pretrained generative network.
In addition to several established metrics for evaluating CF explanations, we introduce a novel metric tailored to assess the quality of the generated CF examples.
- Score: 5.254093731341154
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: CounterFactual (CF) visual explanations try to find images similar to the
query image that change the decision of a vision system to a specified outcome.
Existing methods either require inference-time optimization or joint training
with a generative adversarial model which makes them time-consuming and
difficult to use in practice. We propose a novel approach, Cycle-Consistent
Counterfactuals by Latent Transformations (C3LT), which learns a latent
transformation that automatically generates visual CFs by steering in the
latent space of generative models. Our method uses cycle consistency between
the query and CF latent representations which helps our training to find better
solutions. C3LT can be easily plugged into any state-of-the-art pretrained
generative network. This enables our method to generate high-quality and
interpretable CF images at high resolution such as those in ImageNet. In
addition to several established metrics for evaluating CF explanations, we
introduce a novel metric tailored to assess the quality of the generated CF
examples and validate the effectiveness of our method on an extensive set of
experiments.
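The cycle-consistency idea in the abstract can be illustrated with a small sketch: a forward map sends a query latent toward the counterfactual class, a backward map sends it back, and the training objective penalizes the round-trip error. The linear maps `g` and `h` below are hypothetical stand-ins for C3LT's learned transformations (a NumPy toy, not the paper's actual networks or losses):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # latent dimensionality (illustrative)

# Hypothetical latent transformations: G steers query latents toward the
# counterfactual class, H steers counterfactual latents back.
G = np.eye(d) + 0.1 * rng.normal(size=(d, d))
H = np.linalg.inv(G)  # an exact inverse, so the cycle closes perfectly here

def g(z):
    return z @ G.T

def h(z_cf):
    return z_cf @ H.T

def cycle_consistency_loss(z):
    """Mean squared distance between a latent and its g-then-h round trip."""
    z_cyc = h(g(z))
    return float(np.mean((z - z_cyc) ** 2))

z_query = rng.normal(size=(4, d))  # a batch of query latents
loss = cycle_consistency_loss(z_query)
```

In C3LT the two maps are learned jointly, so the cycle loss is a training signal rather than zero by construction as in this toy.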
Related papers
- SSAM: Self-Supervised Association Modeling for Test-Time Adaption [42.00379819876794]
We propose SSAM (Self-Supervised Association Modeling), a new TTA framework that enables dynamic encoder refinement through dual-phase association learning.
Our method operates via two synergistic components: 1) Soft Prototype Estimation (SPE), which estimates probabilistic category associations to guide feature space reorganization, and 2) Prototype-anchored Image Reconstruction (PIR), enforcing encoder stability through cluster-conditional image feature reconstruction.
arXiv Detail & Related papers (2025-05-31T11:13:07Z)
- EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation [26.888320234592978]
Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models.
We provide a model-agnostic approach, using intersections in diffusion trajectories, working only with latent values.
An in-context trained LLM is used to generate coherent frame-wise prompts; another is used to identify differences between frames.
Our approach results in state-of-the-art performance while being more flexible when working with diverse image-generation models.
arXiv Detail & Related papers (2025-04-09T13:11:09Z)
- Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations? [55.99654128127689]
Cross-modal contrastive distillation has recently been explored for learning effective 3D representations.
Existing methods focus primarily on modality-shared features, neglecting the modality-specific features during the pre-training process.
We propose a new framework, namely CMCR, to address these shortcomings.
arXiv Detail & Related papers (2024-12-12T06:09:49Z)
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations [6.741417019751439]
Visual counterfactual explanation (CF) methods modify image concepts to change a prediction to a predefined outcome.
We introduce the GdVAE, a self-explainable model based on a conditional variational autoencoder (CVAE) and integrated CF explanations.
The consistency of CFs is improved by regularizing the latent space with the explainer function.
arXiv Detail & Related papers (2024-09-19T17:58:15Z)
- Corner-to-Center Long-range Context Model for Efficient Learned Image Compression [70.0411436929495]
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations.
We propose the Corner-to-Center transformer-based Context Model (C$^3$M), designed to enhance context and latent predictions.
In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder.
arXiv Detail & Related papers (2023-11-29T21:40:28Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- DELAD: Deep Landweber-guided deconvolution with Hessian and sparse prior [0.22940141855172028]
We present a model for non-blind image deconvolution that incorporates the classic iterative method into a deep learning application.
We build our network based on the iterative Landweber deconvolution algorithm, which is integrated with trainable convolutional layers to enhance the recovered image structures and details.
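The classic Landweber iteration that DELAD builds on can be sketched in plain NumPy for a generic linear operator `A` (the paper interleaves trainable convolutional layers and Hessian/sparsity priors, all omitted here; the matrix and step-size choice below are purely illustrative):

```python
import numpy as np

def landweber(A, b, n_iter=200, step=None):
    """Landweber iteration for A x ~= b: x <- x + step * A^T (b - A x)."""
    if step is None:
        # Convergence requires 0 < step < 2 / sigma_max(A)^2.
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * A.T @ (b - A @ x)
    return x

# Toy deconvolution-style example: recover x from linear observations.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 20))
x_true = rng.normal(size=20)
b = A @ x_true
x_hat = landweber(A, b, n_iter=5000)
```

For a consistent, full-column-rank system like this toy one, the iterates converge to the least-squares solution; in deconvolution, `A` would be the (ill-conditioned) blur operator and early stopping or priors are needed.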
arXiv Detail & Related papers (2022-09-30T11:15:03Z)
- Robust Cross-Modal Representation Learning with Progressive Self-Distillation [7.676408770854477]
The learning objective of CLIP's vision-language approach does not effectively account for the noisy many-to-many correspondences found in web-harvested image-captioning datasets.
We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.
arXiv Detail & Related papers (2022-04-10T03:28:18Z)
- Deblurring via Stochastic Refinement [85.42730934561101]
We present an alternative framework for blind deblurring based on conditional diffusion models.
Our method is competitive in terms of distortion metrics such as PSNR.
arXiv Detail & Related papers (2021-12-05T04:36:09Z)
- Contour-guided Image Completion with Perceptual Grouping [7.588025965572449]
This paper implements a modernized model of the Stochastic Completion Fields (SCF) algorithm.
We show how the SCF algorithm mimics results in human perception.
We use the SCF completed contours as guides for inpainting, and show that our guides improve the performance of state-of-the-art models.
arXiv Detail & Related papers (2021-11-22T16:26:25Z)
- Generating Images with Sparse Representations [21.27273495926409]
High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
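The JPEG-style preprocessing described above can be sketched with a plain NumPy DCT on 8x8 blocks. The single scalar quantization step below is an illustrative stand-in, not the paper's actual quantization scheme:

```python
import numpy as np

N = 8  # JPEG-style block size

# Orthonormal DCT-II basis matrix: C @ C.T == identity.
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0] /= np.sqrt(2.0)

def dct2(block):
    """2-D DCT of one N x N block."""
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2-D DCT (C is orthogonal, so its transpose inverts it)."""
    return C.T @ coeffs @ C

def quantize(coeffs, q=16.0):
    """Snap coefficients to a coarse grid, discarding fine detail."""
    return np.round(coeffs / q) * q

rng = np.random.default_rng(2)
block = rng.uniform(0, 255, size=(N, N))
codes = quantize(dct2(block))  # the discrete values a model would predict
recon = idct2(codes)           # lossy reconstruction from quantized DCT
```

The quantized coefficients form the sparse discrete sequence that the paper's autoregressive Transformer is trained to predict element by element.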
arXiv Detail & Related papers (2021-03-05T17:56:03Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.