Related papers: Benchmarking Counterfactual Image Generation

URL: http://arxiv.org/abs/2403.20287v2
Date: Mon, 10 Jun 2024 14:47:46 GMT
Title: Benchmarking Counterfactual Image Generation
Authors: Thomas Melistas, Nikos Spyrou, Nefeli Gkouti, Pedro Sanchez, Athanasios Vlontzos, Yannis Panagakis, Giorgos Papanastasiou, Sotirios A. Tsaftaris
Abstract summary: Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. To perform realistic edits in domains such as natural images or medical imaging, modifications must respect causal relationships. We present a comparison framework to thoroughly benchmark counterfactual image generation methods.
Score: 22.573830532174956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural images or medical imaging, modifications must respect causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: not only does it lack observable ground truths, but it also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel datasets and causal graphs, demonstrating the superiority of Hierarchical VAEs across most datasets and metrics. Our framework is implemented in a user-friendly Python package that can be extended to incorporate additional SCMs, causal methods, generative models, and datasets for the community to build on.

Related papers

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models [22.042487298092883]
RealGeneral is a novel framework that reformulates image generation as a conditional frame prediction task. It achieves a 14.5% improvement in subject similarity for customized generation and a 10% enhancement in image quality for the canny-to-image task.
arXiv Detail & Related papers (2025-03-13T14:31:52Z)
EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks. The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm. We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
ImagenHub: Standardizing the evaluation of conditional image generation models [48.51117156168]
This paper proposes ImagenHub, a one-stop library to standardize the inference and evaluation of all conditional image generation models. We design two human evaluation scores, i.e. Semantic Consistency and Perceptual Quality, along with comprehensive guidelines to evaluate generated images. Our human evaluation achieves high inter-worker agreement, with a Krippendorff's alpha higher than 0.4 on 76% of models.
arXiv Detail & Related papers (2023-10-02T19:41:42Z)
Benchmarking Robustness to Text-Guided Corruptions [0.0]
We use diffusion models to edit images to different domains. We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains. We observe that convolutional models are more robust than transformer architectures.
arXiv Detail & Related papers (2023-04-06T09:40:02Z)
Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses. We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine-grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images into the latent space of a high-quality generative model. This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
Image Scene Graph Generation (SGG) Benchmark [58.33119409657256]
There is a surge of interest in image scene graph generation (object and relationship detection). Due to the lack of a good benchmark, the reported results of different scene graph generation models are not directly comparable. We have developed a much-needed scene graph generation benchmark based on the maskrcnn-benchmark and several popular models.
arXiv Detail & Related papers (2021-07-27T05:10:09Z)
RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network [19.017377597937617]
We study the compositional learning of images and texts for image retrieval. We introduce a novel method that combines the graph convolutional network (GCN) with existing composition methods.
arXiv Detail & Related papers (2021-04-07T09:41:52Z)
Diverse Single Image Generation with Controllable Global Structure through Self-Attention [1.2522889958051286]
We show how to generate images that require global context using generative adversarial networks. Our results are visually better than the state of the art, particularly in generating images that require global context. The diversity of our image generation, measured using the average standard deviation of pixels, is also better.
arXiv Detail & Related papers (2021-02-09T11:52:48Z)
Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation. The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image. Experiments on multiple datasets show that our model produces better results and is substantially more efficient compared to recent generative models.
arXiv Detail & Related papers (2020-07-01T17:59:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.