Decomposed evaluations of geographic disparities in text-to-image models
- URL: http://arxiv.org/abs/2406.11988v1
- Date: Mon, 17 Jun 2024 18:04:23 GMT
- Title: Decomposed evaluations of geographic disparities in text-to-image models
- Authors: Abhishek Sureddy, Dishant Padalia, Nandhinee Periyakaruppa, Oindrila Saha, Adina Williams, Adriana Romero-Soriano, Megan Richards, Polina Kirichenko, Melissa Hall,
- Abstract summary: We introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to measure geographic disparities in the depiction of objects and backgrounds in generated images.
Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds.
We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, struggling to generate modern vehicles in Africa, and unrealistically placing some objects in outdoor settings.
- Score: 22.491466809896867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, struggling to generate modern vehicles in Africa, and unrealistically placing some objects in outdoor settings. Informed by our metric, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.
Related papers
- Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance [12.33170407159189]
State-of-the-art text-to-image generative models struggle to depict everyday objects with the true diversity of the real world.
We introduce an inference time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backwards steps of latent diffusion models to increase the diversity of a sample.
We find that c-VSG substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency.
arXiv Detail & Related papers (2024-06-06T23:35:51Z) - Towards Geographic Inclusion in the Evaluation of Text-to-Image Models [25.780536950323683]
We study how much annotators in Africa, Europe, and Southeast Asia vary in their perception of geographic representation, visual appeal, and consistency in real and generated images.
For example, annotators in different locations often disagree on whether exaggerated, stereotypical depictions of a region are considered geographically representative.
We recommend steps for improved automatic and human evaluations.
arXiv Detail & Related papers (2024-05-07T16:23:06Z) - ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z) - Classification for everyone : Building geography agnostic models for fairer recognition [0.9558392439655016]
We quantitatively present this bias in two datasets - The Dollar Street dataset and ImageNet.
We then present different methods which can be employed to reduce this bias.
arXiv Detail & Related papers (2023-12-05T18:41:03Z) - Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations [61.132408427908175]
zero-shot GAN adaptation aims to reuse well-trained generators to synthesize images of an unseen target domain.
With only a single representative text feature instead of real images, the synthesized images gradually lose diversity.
We propose a novel method to find semantic variations of the target text in the CLIP space.
arXiv Detail & Related papers (2023-08-21T08:12:28Z) - Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric
Visual Data [3.4022338837261525]
We analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa.
We report the quantity and content of available data with comparisons to population-matched nations in Europe.
We present findings for an othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers.
arXiv Detail & Related papers (2023-08-16T20:12:01Z) - Inspecting the Geographical Representativeness of Images from
Text-to-Image Models [52.80961012689933]
We measure the geographical representativeness of generated images using a crowdsourced study comprising 540 participants across 27 countries.
For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States followed by India.
The overall scores for many countries still remain low, highlighting the need for future models to be more geographically inclusive.
arXiv Detail & Related papers (2023-05-18T16:08:11Z) - Mitigating Urban-Rural Disparities in Contrastive Representation Learning with Satellite Imagery [19.93324644519412]
We consider the risk of urban-rural disparities in identification of land-cover features.
We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of convolution neural network models.
The obtained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images.
arXiv Detail & Related papers (2022-11-16T04:59:46Z) - Few-shot Image Generation via Cross-domain Correspondence [98.2263458153041]
Training generative models, such as GANs, on a target domain containing limited examples can easily result in overfitting.
In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target.
To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space.
arXiv Detail & Related papers (2021-04-13T17:59:35Z) - PhraseCut: Language-based Image Segmentation in the Wild [62.643450401286]
We consider the problem of segmenting image regions given a natural language phrase.
Our dataset is collected on top of the Visual Genome dataset.
Our experiments show that the scale and diversity of concepts in our dataset poses significant challenges to the existing state-of-the-art.
arXiv Detail & Related papers (2020-08-03T20:58:53Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.