ICECAP: Information Concentrated Entity-aware Image Captioning
- URL: http://arxiv.org/abs/2108.02050v1
- Date: Wed, 4 Aug 2021 13:27:51 GMT
- Authors: Anwen Hu, Shizhe Chen, Qin Jin
- Abstract summary: We propose an entity-aware news image captioning task to generate informative captions.
Our model first creates coarse concentration on relevant sentences using a cross-modality retrieval model.
Experiments on both BreakingNews and GoodNews datasets demonstrate the effectiveness of our proposed method.
- Score: 41.53906032024941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most current image captioning systems focus on describing general image
content, and lack background knowledge to deeply understand the image, such as
exact named entities or concrete events. In this work, we focus on the
entity-aware news image captioning task which aims to generate informative
captions by leveraging the associated news articles to provide background
knowledge about the target image. However, due to the length of news articles,
previous works only employ news articles at the coarse article or sentence
level, which are not fine-grained enough to refine relevant events and choose
named entities accurately. To overcome these limitations, we propose an
Information Concentrated Entity-aware news image CAPtioning (ICECAP) model,
which progressively concentrates on relevant textual information within the
corresponding news article from the sentence level to the word level. Our model
first creates coarse concentration on relevant sentences using a cross-modality
retrieval model and then generates captions by further concentrating on
relevant words within the sentences. Extensive experiments on both BreakingNews
and GoodNews datasets demonstrate the effectiveness of our proposed method,
which outperforms other state-of-the-art methods. The code of ICECAP is publicly
available at https://github.com/HAWLYQ/ICECAP.
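As a rough illustration of the two-stage concentration described in the abstract (this is not the released ICECAP code), the sketch below scores toy sentence and word embeddings against an image embedding: it first retrieves the most relevant sentences (coarse, sentence-level concentration), then ranks the words inside them (fine, word-level concentration). All vectors, names, and the plain cosine-similarity scoring are invented for the example; the paper uses a learned cross-modality retrieval model.

```python
# Toy sketch of sentence-level then word-level "information concentration".
# Embeddings here are hand-made vectors; ICECAP learns them jointly.
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def concentrate(image_vec, sentences, top_k=2):
    """Stage 1: keep the top_k article sentences most similar to the image.
    Stage 2: rank the words inside the kept sentences by their own similarity.
    Each sentence is {"text": str, "vec": [...], "words": [(word, vec), ...]}."""
    ranked = sorted(sentences, key=lambda s: cosine(image_vec, s["vec"]),
                    reverse=True)
    kept = ranked[:top_k]
    words = []
    for sent in kept:
        for word, wvec in sent["words"]:
            words.append((word, cosine(image_vec, wvec)))
    words.sort(key=lambda t: t[1], reverse=True)
    return [s["text"] for s in kept], words
```

In this toy form, the caption decoder would attend only over the returned sentences and word scores instead of the full article, which is the intuition behind progressing from sentence level to word level.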
Related papers
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key aspects: being informationally sufficient, minimally redundant, and readily comprehensible by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Image Captioning in news report scenario [12.42658463552019]
We explore the realm of image captioning specifically tailored for celebrity photographs.
This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information.
arXiv Detail & Related papers (2024-03-24T16:08:10Z)
- Rule-driven News Captioning [33.145889362997316]
News captioning task aims to generate sentences by describing named entities or concrete events for an image with its news article.
Existing methods have achieved remarkable results by relying on the large-scale pre-trained models.
We propose the rule-driven news captioning method, which can generate image descriptions following a designated rule signal.
arXiv Detail & Related papers (2024-03-08T07:06:43Z)
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
- InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation [69.1642316502563]
We propose an Informative Metric for Reference-free Image Caption evaluation (InfoMetIC).
Given an image and a caption, InfoMetIC reports incorrect words and unmentioned image regions at a fine-grained level.
We also construct a token-level evaluation dataset and demonstrate the effectiveness of InfoMetIC in fine-grained evaluation.
arXiv Detail & Related papers (2023-05-10T09:22:44Z)
- Focus! Relevant and Sufficient Context Selection for News Image Captioning [69.36678144800936]
News Image Captioning requires describing an image by leveraging additional context from a news article.
We propose to use the pre-trained vision and language retrieval model CLIP to localize the visually grounded entities in the news article.
Our experiments demonstrate that by simply selecting a better context from the article, we can significantly improve the performance of existing models.
arXiv Detail & Related papers (2022-12-01T20:00:27Z)
- Journalistic Guidelines Aware News Image Captioning [8.295819830685536]
News article image captioning aims to generate descriptive and informative captions for news article images.
Unlike conventional image captions that simply describe the content of the image in general terms, news image captions rely heavily on named entities to describe the image content.
We propose a new approach to this task, motivated by caption guidelines that journalists follow.
arXiv Detail & Related papers (2021-09-07T04:49:50Z)
- Transform and Tell: Entity-Aware News Image Captioning [77.4898875082832]
We propose an end-to-end model which generates captions for images embedded in news articles.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts.
arXiv Detail & Related papers (2020-04-17T05:44:37Z)
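The byte-pair-encoding idea in the Transform and Tell entry above can be illustrated with a toy sketch. This is not the paper's implementation: it uses a simplified greedy longest-match segmentation over a made-up subword vocabulary (true BPE applies learned merge operations), but it shows how a rare named entity absent from a word vocabulary can still be emitted as a sequence of known word parts.

```python
# Illustrative subword segmentation, loosely in the spirit of BPE:
# a rare name like "Merkel" is split into in-vocabulary pieces so a
# language model can generate it part by part. Vocabulary is made up.
def bpe_segment(word, vocab):
    """Greedy longest-match segmentation of `word` into pieces from `vocab`.
    An unknown single character falls back to itself."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces
```

With a real merge-based tokenizer the pieces come from frequency statistics over a corpus, which is what lets the caption decoder cover named entities it has never seen as whole words.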
This list is automatically generated from the titles and abstracts of the papers in this site.