ImageCaptioner$^2$: Image Captioner for Image Captioning Bias
Amplification Assessment
- URL: http://arxiv.org/abs/2304.04874v2
- Date: Mon, 5 Jun 2023 22:06:07 GMT
- Title: ImageCaptioner$^2$: Image Captioner for Image Captioning Bias
Amplification Assessment
- Authors: Eslam Mohamed Bakr, Pengzhan Sun, Li Erran Li, Mohamed Elhoseiny
- Abstract summary: We introduce a new bias assessment metric, dubbed $ImageCaptioner^2$, for image captioning.
Instead of measuring the absolute bias in the model or the data, $ImageCaptioner^2$ pays more attention to the bias introduced by the model w.r.t. the data bias.
In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning.
- Score: 30.71835197717301
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most pre-trained learning systems are known to suffer from bias, which
typically emerges from the data, the model, or both. Measuring and quantifying
bias and its sources is a challenging task and has been extensively studied in
image captioning. Despite the significant effort in this direction, we observed
that existing metrics lack consistency in the inclusion of the visual signal.
In this paper, we introduce a new bias assessment metric, dubbed
$ImageCaptioner^2$, for image captioning. Instead of measuring the absolute
bias in the model or the data, $ImageCaptioner^2$ pays more attention to the
bias introduced by the model w.r.t. the data bias, termed bias amplification.
Unlike existing methods, which evaluate image captioning algorithms based
only on the generated captions, $ImageCaptioner^2$
incorporates the image while measuring the bias. In addition, we design a
formulation for measuring the bias of generated captions as prompt-based image
captioning instead of using language classifiers. Finally, we apply our
$ImageCaptioner^2$ metric across 11 different image captioning architectures on
three different datasets, i.e., MS-COCO caption dataset, Artemis V1, and
Artemis V2, and on three different protected attributes, i.e., gender, race,
and emotions. Consequently, we verify the effectiveness of our
$ImageCaptioner^2$ metric by proposing AnonymousBench, which is a novel human
evaluation paradigm for bias metrics. Our metric shows significant superiority
over the recent bias metric, LIC, in terms of human alignment, where the
correlation scores are 80% and 54% for our metric and LIC, respectively. The
code is available at https://eslambakr.github.io/imagecaptioner2.github.io/.
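To make the bias-amplification idea concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes a prompt-based captioner exposed as a `fill_attribute` callable that, given the image and a caption with the protected attribute masked, predicts that attribute, and it compares how often the attribute is recovered from human captions versus model-generated captions.

```python
# Conceptual sketch (not the authors' code) of measuring bias amplification
# with a prompt-based image captioner, in the spirit of ImageCaptioner^2.
# `fill_attribute` is a hypothetical callable: given the image and a caption
# with the protected attribute masked (e.g. "a [MASK] riding a horse"),
# it predicts the attribute.
from typing import Callable, Iterable, Tuple

Sample = Tuple[object, str, str]  # (image, masked_caption, true_attribute)


def attribute_leakage(samples: Iterable[Sample],
                      fill_attribute: Callable[[object, str], str]) -> float:
    """Fraction of samples where the captioner recovers the masked
    protected attribute from the image plus the caption context."""
    samples = list(samples)
    hits = sum(fill_attribute(image, masked) == attribute
               for image, masked, attribute in samples)
    return hits / max(len(samples), 1)


def bias_amplification(gt_samples: Iterable[Sample],
                       gen_samples: Iterable[Sample],
                       fill_attribute: Callable[[object, str], str]) -> float:
    """Positive values mean the model's captions leak the protected
    attribute more than the human (dataset) captions do."""
    return (attribute_leakage(gen_samples, fill_attribute)
            - attribute_leakage(gt_samples, fill_attribute))
```

The design point this sketch reflects is the one stressed in the abstract: unlike classifier-based metrics such as LIC, which score captions in isolation, the image is part of the input when the masked protected attribute is predicted.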
Related papers
- Measuring directional bias amplification in image captions using predictability [13.041091740013808]
We propose Directional Predictability Amplification in Captioning (DPAC) to measure bias amplification in ML datasets.
DPAC measures directional bias amplification in captions, provides a better estimate of dataset bias, and is less sensitive to attacker models.
Our experiments on the COCO captioning dataset show how DPAC is the most reliable metric to measure bias amplification in captions.
arXiv Detail & Related papers (2025-03-10T21:50:58Z)
- Guiding Image Captioning Models Toward More Specific Captions [32.36062034676917]
We show that it is possible to generate more specific captions with minimal changes to the training process.
We implement classifier-free guidance for an autoregressive captioning model by fine-tuning it to estimate both conditional and unconditional distributions over captions.
arXiv Detail & Related papers (2023-07-31T14:00:12Z)
- Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z)
- Mitigating Test-Time Bias for Fair Image Retrieval [18.349154934096784]
We address the challenge of generating fair and unbiased image retrieval results given neutral textual queries.
We introduce a straightforward technique, Post-hoc Bias Mitigation, that post-processes the outputs from the pre-trained vision-language model.
Our approach achieves the lowest bias, compared with various existing bias-mitigation methods, in text-based image retrieval results.
arXiv Detail & Related papers (2023-05-23T21:31:16Z)
- InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation [69.1642316502563]
We propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC).
Given an image and a caption, InfoMetIC is able to report incorrect words and unmentioned image regions at a fine-grained level.
We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation.
arXiv Detail & Related papers (2023-05-10T09:22:44Z)
- Are metrics measuring what they should? An evaluation of image captioning task metrics [0.21301560294088315]
Image Captioning is a current research task to describe the image content using the objects and their relationships in the scene.
To tackle this task, two important research areas are combined: artificial vision and natural language processing.
We present an evaluation of several kinds of Image Captioning metrics and a comparison between them using the well-known MS COCO dataset.
arXiv Detail & Related papers (2022-07-04T21:51:47Z)
- On Distinctive Image Captioning via Comparing and Reweighting [52.3731631461383]
In this paper, we aim to improve the distinctiveness of image captions via comparing and reweighting with a set of similar images.
Our metric reveals that the human annotations of each image in the MSCOCO dataset are not equivalent based on distinctiveness.
In contrast, previous works normally treat the human annotations equally during training, which could be a reason for generating less distinctive captions.
arXiv Detail & Related papers (2022-04-08T08:59:23Z)
- Transparent Human Evaluation for Image Captioning [70.03979566548823]
We develop a rubric-based human evaluation protocol for image captioning models.
We show that human-generated captions are of substantially higher quality than machine-generated ones.
We hope that this work will promote a more transparent evaluation protocol for image captioning.
arXiv Detail & Related papers (2021-11-17T07:09:59Z)
- Can Audio Captions Be Evaluated with Image Caption Metrics? [11.45508807551818]
We propose a metric named FENSE, where we combine the strength of Sentence-BERT in capturing similarity, and a novel Error Detector to penalize erroneous sentences for robustness.
On the newly established benchmarks, FENSE outperforms current metrics by 14-25% in accuracy.
arXiv Detail & Related papers (2021-10-10T02:34:40Z)
- Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn the sentence level representations.
Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experiment results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when encountering semantically similar expressions or less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)