Belief Revision based Caption Re-ranker with Visual Semantic Information
- URL: http://arxiv.org/abs/2209.08163v1
- Date: Fri, 16 Sep 2022 20:36:41 GMT
- Title: Belief Revision based Caption Re-ranker with Visual Semantic Information
- Authors: Ahmed Sabir, Francesc Moreno-Noguer, Pranava Madhyastha, Lluís Padró
- Abstract summary: We propose a novel re-ranking approach that leverages visual-semantic measures to identify the ideal caption.
Our experiments demonstrate the utility of our approach, where we observe that our re-ranker can enhance the performance of a typical image-captioning system.
- Score: 31.20692237930281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we focus on improving the captions generated by image-caption
generation systems. We propose a novel re-ranking approach that leverages
visual-semantic measures to identify the ideal caption that maximally captures
the visual information in the image. Our re-ranker utilizes the Belief Revision
framework (Blok et al., 2003) to calibrate the original likelihood of the top-n
captions by explicitly exploiting the semantic relatedness between the depicted
caption and the visual context. Our experiments demonstrate the utility of our
approach, where we observe that our re-ranker can enhance the performance of a
typical image-captioning system without the necessity of any additional
training or fine-tuning.
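To make the calibration concrete, here is a minimal, self-contained sketch (not the authors' released code) assuming the Blok et al. (2003) similarity-based revision form, in which a caption's original likelihood P is revised to P^alpha with alpha = (1 - sim) / (1 + sim), where sim in [0, 1] is the semantic relatedness between the caption and the visual context. The beam outputs and the word-overlap relatedness below are toy placeholders; any sentence-level similarity measure can be plugged in.

```python
import math

def revise_score(caption_logprob: float, relatedness: float) -> float:
    """Belief-revision style calibration (after Blok et al., 2003):
    the caption's original probability P is raised to the power
    alpha = (1 - sim) / (1 + sim), so high relatedness to the visual
    context pushes the revised probability toward 1."""
    p = math.exp(caption_logprob)                    # original caption likelihood
    alpha = (1.0 - relatedness) / (1.0 + relatedness)
    return p ** alpha                                # revised (re-ranked) score

def rerank(candidates, relatedness_fn, visual_context):
    """Re-rank top-n beam-search captions without any extra training.

    candidates     : list of (caption, log_prob) from the base captioner
    relatedness_fn : any semantic similarity returning a value in [0, 1]
    visual_context : text describing detected objects / scene labels
    """
    scored = [
        (caption, revise_score(logp, relatedness_fn(caption, visual_context)))
        for caption, logp in candidates
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical top-3 beam outputs with their log-probabilities.
    beams = [
        ("a man riding a horse on a beach", -4.1),
        ("a person standing next to an animal", -3.8),
        ("a man riding a surfboard on a wave", -3.5),
    ]

    # Toy relatedness: fraction of visual-context words mentioned in the caption.
    def word_overlap(caption, context):
        ctx = set(context.split())
        return len(ctx & set(caption.split())) / max(len(ctx), 1)

    for caption, score in rerank(beams, word_overlap, "horse beach man"):
        print(f"{score:.4f}  {caption}")
```

In this toy run, the caption that mentions the detected context words overtakes the candidate with the highest original log-probability, which is the effect the re-ranker aims for: the base model's likelihood is kept, but revised toward captions that agree with the visual evidence.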
Related papers
- Towards Retrieval-Augmented Architectures for Image Captioning [81.11529834508424]
This work presents a novel approach towards developing image captioning models that utilize an external kNN memory to improve the generation process.
Specifically, we propose two model variants that incorporate a knowledge retriever component that is based on visual similarities.
We experimentally validate our approach on COCO and nocaps datasets and demonstrate that incorporating an explicit external memory can significantly enhance the quality of captions.
arXiv Detail & Related papers (2024-05-21T18:02:07Z)
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key aspects: being informationally sufficient, minimally redundant, and readily comprehensible by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned [2.1828601975620257]
We propose an approach for improving caption generation systems by choosing the most closely related output to the image.
We employ a visual semantic measure at both the word and sentence level to match the proper caption to the related information in the image (a minimal relatedness sketch is given after this list).
arXiv Detail & Related papers (2022-09-26T16:24:13Z)
- Injecting Semantic Concepts into End-to-End Image Captioning [61.41154537334627]
We propose a pure vision-transformer-based image captioning model, dubbed ViTCAP, which uses grid representations without extracting regional features.
For improved performance, we introduce a novel Concept Token Network (CTN) to predict the semantic concepts and then incorporate them into the end-to-end captioning.
In particular, the CTN is built on the basis of a vision transformer and is designed to predict the concept tokens through a classification task.
arXiv Detail & Related papers (2021-12-09T22:05:05Z)
- Exploring Semantic Relationships for Unpaired Image Captioning [40.401322131624866]
We achieve unpaired image captioning by bridging the vision and the language domains with high-level semantic information.
We propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.
The proposed approach boosts five strong baselines under the paired setting, where the most significant improvement in CIDEr score reaches 8%.
arXiv Detail & Related papers (2021-06-20T09:10:11Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that our proposed method maintains robust performance and assigns more flexible scores to candidate captions that use semantically similar expressions or less-aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)
- Towards Unique and Informative Captioning of Images [40.036350846970706]
We analyze both modern captioning systems and evaluation metrics.
We design a new metric, SPICE-U, by introducing a notion of uniqueness over the concepts generated in a caption.
We show that SPICE-U is better correlated with human judgements compared to SPICE, and effectively captures notions of diversity and descriptiveness.
arXiv Detail & Related papers (2020-09-08T19:01:33Z)
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
- Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models [63.11766263832545]
We present a new image captioning dataset, Egoshots, consisting of 978 real life images with no captions.
In order to evaluate the quality of the generated captions, we propose a new image captioning metric, object-based Semantic Fidelity (SF).
arXiv Detail & Related papers (2020-03-26T04:43:30Z)
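The re-ranker sketched above, like the word-to-sentence visual semantic similarity work in this list, needs a sentence-level relatedness score between a caption and a textual rendering of the visual context (for example, detected object labels). The sketch below is one illustrative way to obtain such a score with off-the-shelf sentence embeddings; the encoder name and the clipping to [0, 1] are assumptions made for illustration, not the setup used in these papers.

```python
# Illustrative sentence-level relatedness via embedding cosine similarity.
# The embedding model below ("all-MiniLM-L6-v2") is a placeholder, not the
# model used in the papers above; any sentence encoder works the same way.
from sentence_transformers import SentenceTransformer, util

_model = SentenceTransformer("all-MiniLM-L6-v2")

def sentence_relatedness(caption: str, visual_context: str) -> float:
    """Cosine similarity between caption and visual-context embeddings,
    clipped to [0, 1] so it can serve as `relatedness_fn` in the
    re-ranking sketch above."""
    emb = _model.encode([caption, visual_context], convert_to_tensor=True)
    sim = util.cos_sim(emb[0], emb[1]).item()
    return max(0.0, min(1.0, sim))

if __name__ == "__main__":
    print(sentence_relatedness("a man riding a horse on a beach",
                               "horse, beach, person"))
```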
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.