Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
- URL: http://arxiv.org/abs/2406.04551v2
- Date: Fri, 2 Aug 2024 16:09:49 GMT
- Title: Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
- Authors: Reyhane Askari Hemmat, Melissa Hall, Alicia Sun, Candace Ross, Michal Drozdzal, Adriana Romero-Soriano
- Abstract summary: State-of-the-art text-to-image generative models struggle to depict everyday objects with the true diversity of the real world.
We introduce an inference time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backwards steps of latent diffusion models to increase the diversity of a sample.
We find that c-VSG substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency.
- Score: 12.33170407159189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing popularity of text-to-image generative models, there has been increasing focus on understanding their risks and biases. Recent work has found that state-of-the-art models struggle to depict everyday objects with the true diversity of the real world and have notable gaps between geographic regions. In this work, we aim to increase the diversity of generated images of common objects such that per-region variations are representative of the real world. We introduce an inference time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backwards steps of latent diffusion models to increase the diversity of a sample as compared to a "memory bank" of previously generated images while constraining the amount of variation within that of an exemplar set of real-world contextualizing images. We evaluate c-VSG with two geographically representative datasets and find that it substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency. Additionally, qualitative analyses reveal that diversity of generated images is significantly improved, including along the lines of reductive region portrayals present in the original model. We hope that this work is a step towards text-to-image generative models that reflect the true geographic diversity of the world.
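At the core of c-VSG is the Vendi Score: the exponential of the Shannon entropy of the eigenvalues of a normalized pairwise similarity kernel, which behaves like an effective count of distinct modes in a sample set. The sketch below illustrates the mechanism under simplifying assumptions; the embedding function `decode_and_embed` and the exact combination of the memory-bank and exemplar terms are illustrative stand-ins, not the authors' implementation.

```python
# Minimal sketch of Vendi Score guidance, assuming image embeddings with a
# cosine-similarity kernel. `decode_and_embed` and the weighting of the two
# terms are illustrative assumptions, not the paper's code.
import torch

def vendi_score(embeddings: torch.Tensor) -> torch.Tensor:
    """VS = exp(Shannon entropy of the eigenvalues of K/n), where K is the
    pairwise similarity kernel over n samples (k(x, x) = 1)."""
    feats = torch.nn.functional.normalize(embeddings, dim=-1)
    n = feats.shape[0]
    kernel = feats @ feats.T / n                        # eigenvalues sum to 1
    eigvals = torch.linalg.eigvalsh(kernel).clamp_min(1e-12)
    return torch.exp(-(eigvals * eigvals.log()).sum())  # effective mode count

def vsg_gradient(latent, decode_and_embed, memory_bank, exemplars,
                 w_mem: float = 1.0, w_ctx: float = 1.0):
    """One simplified guidance term: reward diversity relative to previously
    generated images (the memory bank) while penalizing variation beyond the
    real contextualizing exemplars."""
    latent = latent.detach().requires_grad_(True)
    emb = decode_and_embed(latent)                      # differentiable embedding
    vs_mem = vendi_score(torch.cat([memory_bank, emb], dim=0))
    vs_ctx = vendi_score(torch.cat([exemplars, emb], dim=0))
    objective = w_mem * vs_mem - w_ctx * vs_ctx
    return torch.autograd.grad(objective, latent)[0]
```

During sampling, such a gradient would be added to the model's score at selected backwards steps, with each finished image's embedding appended to the memory bank.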
Related papers
- Saliency-Based diversity and fairness Metric and FaceKeepOriginalAugment: A Novel Approach for Enhancing Fairness and Diversity [46.74201905814679]
We introduce an extension of the KeepOriginalAugment method, termed FaceKeepOriginalAugment, which explores various debiasing aspects (geographical, gender, and stereotypical biases) in computer vision models.
By maintaining a delicate balance between data diversity and information preservation, our approach empowers models to exploit both diverse salient and non-salient regions.
We quantify dataset diversity across a range of datasets, including Flickr Faces HQ (FFHQ), WIKI, IMDB, Labeled Faces in the Wild (LFW), UTK Faces, and the Diverse dataset.
arXiv Detail & Related papers (2024-10-29T13:49:23Z) - Decomposed evaluations of geographic disparities in text-to-image models [22.491466809896867]
We introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to measure geographic disparities in the depiction of objects and backgrounds in generated images.
Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds.
We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, difficulty generating modern vehicles in Africa, and unrealistic placement of some objects in outdoor settings.
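The decomposition itself is straightforward to picture; a minimal sketch, assuming an object segmentation mask and per-region image features are available, is given below. The mask source and the nearest-neighbor realism proxy are assumptions for illustration, not the paper's exact indicators.

```python
# Illustrative sketch of the object/background split behind Decomposed-DIG.
# The segmentation mask and the realism proxy are stand-ins.
import numpy as np

def masked_views(image: np.ndarray, mask: np.ndarray):
    """Split an HxWx3 image into object-only and background-only views by
    zeroing the complementary region before feature extraction."""
    m = mask[..., None].astype(image.dtype)   # boolean HxW object mask
    return image * m, image * (1.0 - m)

def nearest_real_distance(gen_feat: np.ndarray, real_feats: np.ndarray) -> float:
    """Realism proxy for one region: distance from a generated region's
    feature vector to its nearest neighbor among real-image features."""
    return float(np.min(np.linalg.norm(real_feats - gen_feat, axis=1)))
```

Scoring objects and backgrounds separately in this way is what lets the audit attribute a regional gap to the object, the scene, or both.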
arXiv Detail & Related papers (2024-06-17T18:04:23Z) - Consistency-diversity-realism Pareto fronts of conditional image generative models [22.372033071088424]
We use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts.
Our experiments suggest that realism and consistency can both be improved simultaneously.
Our analysis shows that there is no best model and the choice of model should be determined by the downstream application.
arXiv Detail & Related papers (2024-06-14T22:14:11Z) - Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation [0.0]
We introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity.
Our approach contributes to the creation of more inclusive and representative AI-generated art.
arXiv Detail & Related papers (2023-10-19T08:48:23Z) - Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
Zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z) - DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity [24.887571095245313]
We introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems.
We find that models have less realism and diversity of generations when prompting for Africa and West Asia than for Europe.
Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation.
arXiv Detail & Related papers (2023-08-11T15:43:37Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
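As a concrete (hedged) illustration of this kind of augmentation, the sketch below edits a labelled image with the off-the-shelf diffusers img2img pipeline so its appearance varies while the class label is preserved; the model id, prompt template, and strength value are illustrative choices, not the paper's exact recipe.

```python
# Hedged sketch of diffusion-based augmentation in the spirit of this paper:
# edit a labelled image with an off-the-shelf img2img pipeline to produce a
# semantic variation that keeps the class label intact.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def augment(image_path: str, class_name: str, strength: float = 0.5) -> Image.Image:
    """Return one variation of the input image; lower `strength` stays
    closer to the original, higher values edit more aggressively."""
    init = Image.open(image_path).convert("RGB").resize((512, 512))
    prompt = f"a photo of a {class_name}"   # illustrative prompt template
    return pipe(prompt=prompt, image=init, strength=strength).images[0]
```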
arXiv Detail & Related papers (2023-02-07T20:42:28Z) - Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization [113.03189252044773]
We propose a unified framework, Style-HAllucinated Dual consistEncy learning (SHADE), to handle domain shift in various visual tasks.
Our versatile SHADE can significantly enhance the generalization in various visual recognition tasks, including image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-12-18T11:42:51Z) - Few-shot Image Generation via Cross-domain Correspondence [98.2263458153041]
Training generative models, such as GANs, on a target domain containing limited examples can easily result in overfitting.
In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target.
To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space.
arXiv Detail & Related papers (2021-04-13T17:59:35Z) - Random Network Distillation as a Diversity Metric for Both Image and Text Generation [62.13444904851029]
We develop a new diversity metric that can be applied to data, both synthetic and natural, of any type.
We validate and deploy this metric on both images and text.
arXiv Detail & Related papers (2020-10-13T22:03:52Z)
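The summary leaves the mechanism to the title: random network distillation (RND) trains a predictor to imitate a frozen, randomly initialized target network, and the residual prediction error indicates how hard a sample set is to compress. The sketch below is a generic RND-style diversity probe under that reading, not the paper's exact protocol; the architecture and hyperparameters are illustrative.

```python
# Hedged sketch of an RND-style diversity probe: a frozen random "target"
# network, a trained "predictor", and diversity read off the residual error.
import torch
import torch.nn as nn

def rnd_diversity(samples: torch.Tensor, feat_dim: int = 128, steps: int = 200) -> float:
    """Train a predictor to match a fixed random network on `samples`
    (an n x d float tensor); a higher residual error suggests the set is
    harder to compress, i.e. more diverse."""
    d = samples.shape[1]
    target = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, feat_dim))
    for p in target.parameters():
        p.requires_grad_(False)               # frozen random target
    predictor = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, feat_dim))
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((predictor(samples) - target(samples)) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return ((predictor(samples) - target(samples)) ** 2).mean().item()
```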