Image Captioning in news report scenario
- URL: http://arxiv.org/abs/2403.16209v3
- Date: Tue, 2 Apr 2024 01:57:00 GMT
- Title: Image Captioning in news report scenario
- Authors: Tianrui Liu, Qi Cai, Changxin Xu, Bo Hong, Jize Xiong, Yuxin Qiao, Tsungwei Yang,
- Abstract summary: We explore the realm of image captioning specifically tailored for celebrity photographs.
This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information.
- Score: 12.42658463552019
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass detailed information, such as the identities of celebrities captured in the images. However, much of the existing body of work primarily centers around understanding scenes and actions. In this paper, we explore the realm of image captioning specifically tailored for celebrity photographs, illustrating its broad potential for enhancing news industry practices. This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information. Our endeavor shows a broader horizon, enriching the narrative in news reporting through a more intuitive image captioning framework.
Related papers
- Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning [77.2852342808769]
In this paper, we introduce a detailed caption benchmark, termed as CompreCap, to evaluate the visual context from a directed scene graph view.
We first manually segment the image into semantically meaningful regions according to common-object vocabulary, while also distinguishing attributes of objects within all those regions.
Then directional relation labels of these objects are annotated to compose a directed scene graph that can well encode rich compositional information of the image.
arXiv Detail & Related papers (2024-12-11T18:37:42Z) - What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key aspects: informationally sufficient, minimally redundant, and readily comprehensible by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z) - Video Summarization: Towards Entity-Aware Captions [73.28063602552741]
We propose the task of summarizing news video directly to entity-aware captions.
We show that our approach generalizes to existing news image captions dataset.
arXiv Detail & Related papers (2023-12-01T23:56:00Z) - Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z) - CapOnImage: Context-driven Dense-Captioning on Image [13.604173177437536]
We introduce a new task called captioning on image (CapOnImage), which aims to generate dense captions at different locations of the image based on contextual information.
We propose a multi-modal pre-training model with multi-level pre-training tasks that progressively learn the correspondence between texts and image locations.
Compared with other image captioning model variants, our model achieves the best results in both captioning accuracy and diversity aspects.
arXiv Detail & Related papers (2022-04-27T14:40:31Z) - Journalistic Guidelines Aware News Image Captioning [8.295819830685536]
News article image captioning aims to generate descriptive and informative captions for news article images.
Unlike conventional image captions that simply describe the content of the image in general terms, news image captions rely heavily on named entities to describe the image content.
We propose a new approach to this task, motivated by caption guidelines that journalists follow.
arXiv Detail & Related papers (2021-09-07T04:49:50Z) - ICECAP: Information Concentrated Entity-aware Image Captioning [41.53906032024941]
We propose an entity-aware news image captioning task to generate informative captions.
Our model first creates coarse concentration on relevant sentences using a cross-modality retrieval model.
Experiments on both BreakingNews and GoodNews datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-08-04T13:27:51Z) - Visual News: Benchmark and Challenges in News Image Captioning [18.865262609683676]
We propose Visual News Captioner, an entity-aware model for the task of news image captioning.
We also introduce Visual News, a large-scale benchmark consisting of more than one million news images.
arXiv Detail & Related papers (2020-10-08T03:07:00Z) - Transform and Tell: Entity-Aware News Image Captioning [77.4898875082832]
We propose an end-to-end model which generates captions for images embedded in news articles.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts.
arXiv Detail & Related papers (2020-04-17T05:44:37Z) - Egoshots, an ego-vision life-logging dataset and semantic fidelity
metric to evaluate diversity in image captioning models [63.11766263832545]
We present a new image captioning dataset, Egoshots, consisting of 978 real life images with no captions.
In order to evaluate the quality of the generated captions, we propose a new image captioning metric, object based Semantic Fidelity (SF)
arXiv Detail & Related papers (2020-03-26T04:43:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.