COSMic: A Coherence-Aware Generation Metric for Image Descriptions
- URL: http://arxiv.org/abs/2109.05281v1
- Date: Sat, 11 Sep 2021 13:43:36 GMT
- Title: COSMic: A Coherence-Aware Generation Metric for Image Descriptions
- Authors: Mert İnan, Piyush Sharma, Baber Khalid, Radu Soricut, Matthew Stone,
Malihe Alikhani
- Abstract summary: Image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text.
We present the first discourse-aware learned generation metric for evaluating image descriptions.
We demonstrate a higher Kendall Correlation Coefficient for our proposed metric with human judgments of the outputs of several state-of-the-art caption generation models, compared to other metrics such as BLEURT and BERTScore.
- Score: 27.41088864449921
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers of text generation models rely on automated evaluation metrics as
a stand-in for slow and expensive manual evaluations. However, image captioning
metrics have struggled to give accurate learned estimates of the semantic and
pragmatic success of output text. We address this weakness by introducing the
first discourse-aware learned generation metric for evaluating image
descriptions. Our approach is inspired by computational theories of discourse
for capturing information goals using coherence. We present a dataset of
image–description pairs annotated with coherence relations. We
then train a coherence-aware metric on a subset of the Conceptual Captions
dataset and measure its effectiveness (its ability to predict
human ratings of output captions) on a test set composed of
out-of-domain images. We demonstrate a higher Kendall Correlation Coefficient
for our proposed metric with the human judgments for the results of a number of
state-of-the-art coherence-aware caption generation models when compared to
several other metrics including recently proposed learned metrics such as
BLEURT and BERTScore.
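The evaluation protocol described in the abstract reduces to a rank correlation between metric scores and human ratings. Below is a minimal sketch of that step using SciPy's Kendall's tau; it is not COSMic's released code, and the score and rating lists are hypothetical placeholders.

```python
# Sketch of rank-correlating an automatic metric's scores with human ratings
# via Kendall's tau (not COSMic's released code).
from scipy.stats import kendalltau

# Hypothetical per-caption metric scores and matching human ratings,
# e.g. collected on a held-out set of out-of-domain images.
metric_scores = [0.71, 0.42, 0.88, 0.35, 0.60]
human_ratings = [4, 2, 5, 1, 3]

tau, p_value = kendalltau(metric_scores, human_ratings)
print(f"Kendall tau = {tau:.3f}, p = {p_value:.3f}")
```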
Related papers
- BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues [47.213906345208315]
We propose BRIDGE, a new learnable and reference-free image captioning metric.
Our proposal achieves state-of-the-art results compared to existing reference-free evaluation scores.
arXiv Detail & Related papers (2024-07-29T18:00:17Z)
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation [69.1642316502563]
We propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC)
Given an image and a caption, InfoMetIC is able to report incorrect words and unmentioned image regions at fine-grained level.
We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation.
arXiv Detail & Related papers (2023-05-10T09:22:44Z)
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation [47.40949434032489]
We propose a new contrastive-based evaluation metric for image captioning, namely Positive-Augmented Contrastive learning Score (PAC-S)
PAC-S unifies the learning of a contrastive visual-semantic space with the addition of generated images and text on curated data.
Experiments spanning several datasets demonstrate that our new metric achieves the highest correlation with human judgments on both images and videos.
arXiv Detail & Related papers (2023-03-21T18:03:14Z)
- Transparent Human Evaluation for Image Captioning [70.03979566548823]
We develop a rubric-based human evaluation protocol for image captioning models.
We show that human-generated captions are of substantially higher quality than machine-generated ones.
We hope that this work will promote a more transparent evaluation protocol for image captioning.
arXiv Detail & Related papers (2021-11-17T07:09:59Z)
- Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn sentence-level representations.
Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when faced with semantically similar expressions or less-aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
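Several of the related papers above (BRIDGE, PAC-S, and the reference-free metrics examined in the Cobra Effect study) score captions without references by comparing vision-language embeddings. The sketch below illustrates that general idea with an off-the-shelf CLIP checkpoint from Hugging Face; it is an illustrative assumption, not the implementation of any of these metrics, and the checkpoint name is simply the common public one.

```python
# Rough sketch of a reference-free, embedding-similarity caption score in the
# spirit of CLIP-based metrics; NOT the implementation of BRIDGE, PAC-S, or COSMic.
# Assumes the public "openai/clip-vit-base-patch32" checkpoint is available.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embedding_similarity_score(image_path: str, caption: str) -> float:
    """Cosine similarity between CLIP image and text embeddings (higher = better match)."""
    image = Image.open(image_path)
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

# Example usage (hypothetical file path):
# print(embedding_similarity_score("example.jpg", "a dog catching a frisbee in a park"))
```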