Are metrics measuring what they should? An evaluation of image
captioning task metrics
- URL: http://arxiv.org/abs/2207.01733v1
- Date: Mon, 4 Jul 2022 21:51:47 GMT
- Title: Are metrics measuring what they should? An evaluation of image
captioning task metrics
- Authors: Oth\'on Gonz\'alez-Ch\'avez, Guillermo Ruiz, Daniela Moctezuma, Tania
A. Ramirez-delReal
- Abstract summary: Image Captioning is an active research task that aims to describe the content of an image in terms of the objects in the scene and their relationships.
To tackle this task, two research areas are combined: computer vision and natural language processing.
We present an evaluation of several kinds of Image Captioning metrics and a comparison between them using the well-known MS COCO dataset.
- Score: 0.21301560294088315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image Captioning is an active research task that aims to describe the content
of an image in terms of the objects in the scene and their relationships. To tackle this
task, two research areas are combined: computer vision and natural language processing.
In Image Captioning, as in any computational intelligence task, performance metrics are
crucial for knowing how well (or poorly) a method performs. In recent years, it has been
observed that classical metrics based on n-grams are insufficient to capture the semantics
and the critical meaning needed to describe the content of an image. To measure how well
the current and more recent metrics are doing, in this manuscript we present an evaluation
of several kinds of Image Captioning metrics and a comparison between them using the
well-known MS COCO dataset. For this, we designed two scenarios: 1) a set of artificially
built captions of varying quality, and 2) a comparison of several state-of-the-art Image
Captioning methods. We aim to answer the following questions: Are the current metrics
helping to produce high-quality captions? How do the current metrics compare to each
other? What are the metrics really measuring?
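To make the n-gram criticism concrete, here is a minimal, hypothetical sketch (not code from the paper) using NLTK's sentence-level BLEU. The captions are invented examples: a semantically faithful paraphrase tends to score low because it shares few n-grams with the reference, while a near-copy containing a factual error can score high.

```python
# Minimal illustrative sketch (not from the paper): BLEU rewards n-gram overlap, not meaning.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["a", "man", "rides", "a", "horse", "on", "the", "beach"]]
paraphrase = ["a", "person", "riding", "a", "horse", "along", "the", "shore"]  # same meaning, few shared n-grams
near_copy = ["a", "man", "rides", "a", "horse", "on", "the", "grass"]          # wrong scene, many shared n-grams

smooth = SmoothingFunction().method1
print(sentence_bleu(reference, paraphrase, smoothing_function=smooth))  # low score despite correct semantics
print(sentence_bleu(reference, near_copy, smoothing_function=smooth))   # high score despite the factual error
```

This is the kind of failure mode the artificially degraded captions in scenario 1 are designed to expose.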
Related papers
- InfoMetIC: An Informative Metric for Reference-free Image Caption
Evaluation [69.1642316502563]
We propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC).
Given an image and a caption, InfoMetIC is able to report incorrect words and unmentioned image regions at a fine-grained level.
We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation.
arXiv Detail & Related papers (2023-05-10T09:22:44Z)
- NewsStories: Illustrating articles with visual summaries [49.924916589209374]
We introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos.
We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images.
We introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.
arXiv Detail & Related papers (2022-07-26T17:34:11Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text images.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state of the art by 3.72% and 5.39% mAP on its two benchmark datasets, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- Transparent Human Evaluation for Image Captioning [70.03979566548823]
We develop a rubric-based human evaluation protocol for image captioning models.
We show that human-generated captions are of substantially higher quality than machine-generated ones.
We hope that this work will promote a more transparent evaluation protocol for image captioning.
arXiv Detail & Related papers (2021-11-17T07:09:59Z)
- Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching [10.992151305603267]
We propose two metrics that evaluate the degree of semantic relevance of retrieved items, independently of their annotated binary relevance.
We incorporate a novel strategy that uses an image captioning metric, CIDEr, to define a Semantic Adaptive Margin (SAM) to be optimized in a standard triplet loss; a toy sketch of this idea is given after this entry.
arXiv Detail & Related papers (2021-10-06T09:54:28Z)
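As a reading aid only, here is a hedged sketch of the adaptive-margin idea mentioned above. It is an assumption, not the SAM paper's actual implementation: `sim_score` stands in for a CIDEr-style caption similarity in [0, 1], and all function and variable names are hypothetical.

```python
# Hypothetical sketch of an "adaptive margin" triplet loss; not the SAM paper's code.
import torch
import torch.nn.functional as F

def adaptive_margin_triplet_loss(anchor, positive, negative, sim_score, base_margin=0.2):
    # A negative whose caption is semantically close to the reference (high sim_score)
    # gets a smaller margin and is therefore pushed away less aggressively.
    margin = base_margin * (1.0 - sim_score)
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)   # distance to the matching item
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)   # distance to the non-matching item
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random embeddings standing in for image/text features.
a, p, n = (torch.randn(4, 256) for _ in range(3))
loss = adaptive_margin_triplet_loss(a, p, n, sim_score=0.7)
```

The design intuition, consistent with the summary above, is that "hard" negatives with semantically similar captions should not be penalized as strongly as clearly unrelated ones.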
- Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder [52.42057181754076]
Motivated by the auto-encoder mechanism and contrastive representation learning advances, we propose a learning-based metric for image captioning.
We develop three progressive model structures to learn the sentence level representations.
Experiment results show that our proposed method can align well with the scores generated from other contemporary metrics.
arXiv Detail & Related papers (2021-06-29T12:27:05Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experiment results show that our proposed method maintains robust performance and gives more flexible scores to candidate captions when faced with semantically similar expressions or less aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
- Evaluating Automatically Generated Phoneme Captions for Images [44.20957732654963]
Image2Speech is the relatively new task of generating a spoken description of an image.
This paper presents an investigation into the evaluation of this task.
BLEU4 is the best currently existing metric for the Image2Speech task.
arXiv Detail & Related papers (2020-07-31T09:21:13Z)