Understanding and Evaluating Racial Biases in Image Captioning
- URL: http://arxiv.org/abs/2106.08503v1
- Date: Wed, 16 Jun 2021 01:07:24 GMT
- Title: Understanding and Evaluating Racial Biases in Image Captioning
- Authors: Dora Zhao and Angelina Wang and Olga Russakovsky
- Abstract summary: We study bias propagation pathways within image captioning, focusing specifically on the COCO dataset.
We demonstrate differences in caption performance, sentiment, and word choice between images of lighter versus darker-skinned people.
- Score: 18.184279793253634
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image captioning is an important task for benchmarking visual reasoning and
for enabling accessibility for people with vision impairments. However, as in
many machine learning settings, social biases can influence image captioning in
undesirable ways. In this work, we study bias propagation pathways within image
captioning, focusing specifically on the COCO dataset. Prior work has analyzed
gender bias in captions using automatically-derived gender labels; here we
examine racial and intersectional biases using manual annotations. Our first
contribution is in annotating the perceived gender and skin color of 28,315 of
the depicted people after obtaining IRB approval. Using these annotations, we
compare racial biases present in both manual and automatically-generated image
captions. We demonstrate differences in caption performance, sentiment, and
word choice between images of lighter versus darker-skinned people. Further, we
find the magnitude of these differences to be greater in modern captioning
systems compared to older ones, thus leading to concerns that without proper
consideration and mitigation these differences will only become increasingly
prevalent. Code and data are available at
https://princetonvisualai.github.io/imagecaptioning-bias .
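The group comparison described in the abstract (caption sentiment aggregated by annotated skin tone) can be illustrated with a minimal sketch. The snippet below is not the paper's released code: it assumes a hypothetical record format pairing each generated caption with a skin-tone annotation, and uses NLTK's VADER analyzer as a stand-in for whatever sentiment tooling the authors used.

```python
# Illustrative sketch (not the paper's pipeline): compare mean caption sentiment
# between images annotated as lighter- vs. darker-skinned.
# Assumes hypothetical records of the form {"caption": str, "skin_tone": "lighter" | "darker"}.
from collections import defaultdict

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time download of the VADER lexicon


def sentiment_gap(records):
    """Return mean VADER compound sentiment per skin-tone group and the lighter-darker gap."""
    sia = SentimentIntensityAnalyzer()
    scores = defaultdict(list)
    for r in records:
        scores[r["skin_tone"]].append(sia.polarity_scores(r["caption"])["compound"])
    means = {group: sum(vals) / len(vals) for group, vals in scores.items() if vals}
    return means, means.get("lighter", 0.0) - means.get("darker", 0.0)


if __name__ == "__main__":
    toy = [
        {"caption": "a smiling person enjoying a sunny day at the beach", "skin_tone": "lighter"},
        {"caption": "a person standing on a street", "skin_tone": "darker"},
    ]
    per_group, gap = sentiment_gap(toy)
    print(per_group, "gap:", gap)
```

A positive gap under this toy setup would mean captions for lighter-skinned people score as more positive on average; the paper's actual analysis also covers caption performance and word choice, which this sketch does not reproduce.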
Related papers
- From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment [26.211648382676856]
Large language models (LLMs) have enhanced the capacity of vision-language models to caption visual text.
We show that enriched captions suffer from increased gender bias and hallucination.
This study serves as a caution against the trend of making image captions increasingly descriptive.
arXiv Detail & Related papers (2024-06-20T01:03:13Z)
- Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness [0.932065750652415]
TIDA (Targeted Image-editing Data Augmentation) is a targeted data augmentation method focused on improving models' human-like abilities.
We show that a TIDA-enhanced dataset related to gender, color, and counting abilities induces better performance in several image captioning metrics.
arXiv Detail & Related papers (2023-09-27T20:12:41Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z)
- ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment [30.71835197717301]
We introduce a new bias assessment metric, dubbed ImageCaptioner$^2$, for image captioning.
Instead of measuring the absolute bias in the model or the data, ImageCaptioner$^2$ pays more attention to the bias introduced by the model with respect to the data bias.
In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning.
arXiv Detail & Related papers (2023-04-10T21:40:46Z)
- Cross-Domain Image Captioning with Discriminative Finetuning [20.585138136033905]
Fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language.
We show that discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
arXiv Detail & Related papers (2023-04-04T09:33:16Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Discovering and Mitigating Visual Biases through Keyword Explanation [66.71792624377069]
We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords.
B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C.
B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
arXiv Detail & Related papers (2023-01-26T13:58:46Z)
- On Distinctive Image Captioning via Comparing and Reweighting [52.3731631461383]
In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images.
Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent based on distinctiveness.
In contrast, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions.
arXiv Detail & Related papers (2022-04-08T08:59:23Z)
- Transparent Human Evaluation for Image Captioning [70.03979566548823]
We develop a rubric-based human evaluation protocol for image captioning models.
We show that human-generated captions are of substantially higher quality than machine-generated ones.
We hope that this work will promote a more transparent evaluation protocol for image captioning.
arXiv Detail & Related papers (2021-11-17T07:09:59Z)