Inspecting the Geographical Representativeness of Images from
Text-to-Image Models
- URL: http://arxiv.org/abs/2305.11080v1
- Date: Thu, 18 May 2023 16:08:11 GMT
- Title: Inspecting the Geographical Representativeness of Images from
Text-to-Image Models
- Authors: Abhipsa Basu, R. Venkatesh Babu and Danish Pruthi
- Abstract summary: We measure the geographical representativeness of generated images using a crowdsourced study comprising 540 participants across 27 countries.
For deliberately underspecified inputs without country names, the generated images most reflect the surroundings of the United States followed by India.
The overall scores for many countries still remain low, highlighting the need for future models to be more geographically inclusive.
- Score: 52.80961012689933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in generative models has resulted in models that
produce images that are both realistic and relevant for most textual inputs.
These models are being used to generate millions of images every day, and hold
the potential to
drastically impact areas such as generative art, digital marketing and data
augmentation. Given their outsized impact, it is important to ensure that the
generated content reflects the artifacts and surroundings across the globe,
rather than over-representing certain parts of the world. In this paper, we
measure the geographical representativeness of common nouns (e.g., a house)
generated through DALL.E 2 and Stable Diffusion models using a crowdsourced
study comprising 540 participants across 27 countries. For deliberately
underspecified inputs without country names, the generated images most reflect
the surroundings of the United States followed by India, and the top
generations rarely reflect surroundings from all other countries (average score
less than 3 out of 5). Specifying the country names in the input increases the
representativeness by 1.44 points on average for DALL.E 2 and 0.75 for Stable
Diffusion; however, the overall scores for many countries still remain low,
highlighting the need for future models to be more geographically inclusive.
Lastly, we examine the feasibility of quantifying the geographical
representativeness of generated images without conducting user studies.
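One plausible automated proxy for such a study (a minimal sketch, not the paper's actual method) is to embed a generated image and compare it against per-country reference embeddings: a similarity profile concentrated on one or two countries would suggest low geographic representativeness. The function names and the random placeholder embeddings below are hypothetical; a real implementation would use an image encoder and curated reference photos per country.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def representativeness_profile(gen_emb, country_refs):
    """Similarity of one generated image embedding to per-country
    reference embeddings. A heavily skewed profile suggests the image
    reflects only a few countries' surroundings."""
    return {c: cosine(gen_emb, ref) for c, ref in country_refs.items()}

# Placeholder embeddings stand in for a real image encoder's output.
rng = np.random.default_rng(0)
refs = {c: rng.normal(size=512) for c in ["US", "IN", "NG", "BR"]}
profile = representativeness_profile(rng.normal(size=512), refs)
print(max(profile, key=profile.get))  # country the image most resembles
```

The spread of the profile (e.g., its entropy across countries) could then serve as an automated score to correlate against crowdsourced ratings.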
Related papers
- Decomposed evaluations of geographic disparities in text-to-image models [22.491466809896867]
We introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to measure geographic disparities in the depiction of objects and backgrounds in generated images.
Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds.
We use Decomposed-DIG to pinpoint specific disparities, such as stereotypical background generation for Africa, a failure to generate modern vehicles in African settings, and unrealistic placement of some objects in outdoor settings.
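The idea of decomposing disparities by image part can be sketched as follows. This is an illustrative toy, not the Decomposed-DIG implementation: the scores and region names are hypothetical placeholders, whereas the real indicators are computed from image features.

```python
# Hypothetical per-region realism scores (0-1), split into the two
# image parts the metric decomposes: objects vs. backgrounds.
scores = {
    "Africa":   {"object": 0.71, "background": 0.52},
    "Europe":   {"object": 0.88, "background": 0.81},
    "Americas": {"object": 0.85, "background": 0.77},
}

def disparity(scores, part):
    """Gap between the best- and worst-served region for one image part."""
    vals = [s[part] for s in scores.values()]
    return max(vals) - min(vals)

print(round(disparity(scores, "object"), 2))      # 0.17
print(round(disparity(scores, "background"), 2))  # 0.29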
arXiv Detail & Related papers (2024-06-17T18:04:23Z) - You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes [3.1402605498916514]
We present World Wide Dishes, a mixed text and image dataset consisting of 765 dishes, with dish names collected in 131 local languages.
We demonstrate a novel means of operationalising capability and representational biases in foundation models such as language models and text-to-image generative models.
We find that these models generally do not produce quality text and image outputs of dishes specific to different regions.
arXiv Detail & Related papers (2024-06-13T18:00:00Z) - Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance [12.33170407159189]
State-of-the-art text-to-image generative models struggle to depict everyday objects with the true diversity of the real world.
We introduce an inference-time intervention, contextualized Vendi Score Guidance (c-VSG), that guides the backward (denoising) steps of latent diffusion models to increase the diversity of a sample.
We find that c-VSG substantially increases the diversity of generated images, both for the worst performing regions and on average, while simultaneously maintaining or improving image quality and consistency.
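The Vendi Score underlying c-VSG has a compact definition: given an n x n similarity matrix K with unit diagonal, it is the exponential of the Shannon entropy of the eigenvalues of K/n, so it ranges from 1 (all samples identical) to n (all samples fully distinct). A minimal sketch of the score itself (the guidance mechanism around it is more involved):

```python
import numpy as np

def vendi_score(K):
    """Vendi Score: exp of the Shannon entropy of the eigenvalues of
    the normalized similarity matrix K/n. K must be an n x n PSD
    similarity matrix with unit diagonal."""
    n = K.shape[0]
    eigvals = np.linalg.eigvalsh(K / n)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

# Sanity checks: identical samples score 1; fully distinct samples score n.
print(vendi_score(np.ones((4, 4))))  # 1.0 (no diversity)
print(vendi_score(np.eye(4)))        # 4.0 (maximal diversity)
```

Because the score is differentiable in the sample similarities, a guidance term can push the diffusion trajectory toward higher-diversity batches, which matches the inference-time intervention described above.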
arXiv Detail & Related papers (2024-06-06T23:35:51Z) - Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric
Visual Data [3.4022338837261525]
We analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa.
We report the quantity and content of available data with comparisons to population-matched nations in Europe.
We present findings on an "othering" phenomenon, as evidenced by a substantial number of images from Africa being taken by non-local photographers.
arXiv Detail & Related papers (2023-08-16T20:12:01Z) - Social Biases through the Text-to-Image Generation Lens [9.137275391251517]
Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software.
We take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images.
We present findings for two popular T2I models: DALLE-v2 and Stable Diffusion.
arXiv Detail & Related papers (2023-03-30T05:29:13Z) - Activation Regression for Continuous Domain Generalization with
Applications to Crop Classification [48.795866501365694]
Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions.
We model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem.
We develop a dataset spatially distributed across the entire continental United States.
arXiv Detail & Related papers (2022-04-14T15:41:39Z) - There is a Time and Place for Reasoning Beyond the Image [63.96498435923328]
Images are often more significant to human eyes than their pixels alone, as we can infer, associate, and reason with contextual information from other sources to establish a more complete picture.
We introduce TARA: a dataset with 16k images with their associated news, time and location automatically extracted from New York Times (NYT), and an additional 61k examples as distant supervision from WIT.
We show that there exists a 70% gap between a state-of-the-art joint model and human performance; the gap is slightly narrowed by our proposed model, which uses segment-wise reasoning, motivating higher-level vision-language joint models.
arXiv Detail & Related papers (2022-03-01T21:52:08Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z) - Generating Physically-Consistent Satellite Imagery for Climate Visualizations [53.61991820941501]
We train a generative adversarial network to create synthetic satellite imagery of future flooding and reforestation events.
A pure deep learning-based model can generate flood visualizations but hallucinates floods at locations that were not susceptible to flooding.
We publish our code and dataset for segmentation guided image-to-image translation in Earth observation.
arXiv Detail & Related papers (2021-04-10T15:00:15Z) - Predicting Livelihood Indicators from Community-Generated Street-Level
Imagery [70.5081240396352]
We propose an inexpensive, scalable, and interpretable approach to predict key livelihood indicators from public crowd-sourced street-level imagery.
By comparing our results against ground data collected in nationally-representative household surveys, we demonstrate the performance of our approach in accurately predicting indicators of poverty, population, and health.
arXiv Detail & Related papers (2020-06-15T18:12:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed papers) and is not responsible for any consequences arising from its use.