Journalistic Guidelines Aware News Image Captioning
- URL: http://arxiv.org/abs/2109.02865v1
- Date: Tue, 7 Sep 2021 04:49:50 GMT
- Title: Journalistic Guidelines Aware News Image Captioning
- Authors: Xuewen Yang, Svebor Karaman, Joel Tetreault, Alex Jaimes
- Abstract summary: News article image captioning aims to generate descriptive and informative captions for news article images.
Unlike conventional image captions that simply describe the content of the image in general terms, news image captions rely heavily on named entities to describe the image content.
We propose a new approach to this task, motivated by caption guidelines that journalists follow.
- Score: 8.295819830685536
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The task of news article image captioning aims to generate descriptive and
informative captions for news article images. Unlike conventional image
captions that simply describe the content of the image in general terms, news
image captions follow journalistic guidelines and rely heavily on named
entities to describe the image content, often drawing context from the whole
article they are associated with. In this work, we propose a new approach to
this task, motivated by caption guidelines that journalists follow. Our
approach, Journalistic Guidelines Aware News Image Captioning (JoGANIC),
leverages the structure of captions to improve the generation quality and guide
our representation design. Experimental results, including detailed ablation
studies, on two large-scale publicly available datasets show that JoGANIC
substantially outperforms state-of-the-art methods both on caption generation
and named entity related metrics.
Related papers
- What Makes for Good Image Captions? [50.48589893443939]
Our framework posits that good image captions should balance three key aspects: informational sufficiency, minimal redundancy, and ready comprehensibility by humans.
We introduce the Pyramid of Captions (PoCa) method, which generates enriched captions by integrating local and global visual information.
arXiv Detail & Related papers (2024-05-01T12:49:57Z)
- Image Captioning in news report scenario [12.42658463552019]
We explore the realm of image captioning specifically tailored for celebrity photographs.
This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information.
arXiv Detail & Related papers (2024-03-24T16:08:10Z)
- Video Summarization: Towards Entity-Aware Captions [73.28063602552741]
We propose the task of summarizing news video directly to entity-aware captions.
We show that our approach generalizes to existing news image captioning datasets.
arXiv Detail & Related papers (2023-12-01T23:56:00Z)
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
- CapText: Large Language Model-based Caption Generation From Image Context and Description [0.0]
We propose and evaluate a new approach to generate captions from textual descriptions and context alone.
Our approach outperforms current state-of-the-art image-text alignment models such as OSCAR-VinVL on this task, as measured by the CIDEr metric.
arXiv Detail & Related papers (2023-06-01T02:40:44Z)
- NewsStories: Illustrating articles with visual summaries [49.924916589209374]
We introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos.
We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images.
We introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.
arXiv Detail & Related papers (2022-07-26T17:34:11Z)
- ICECAP: Information Concentrated Entity-aware Image Captioning [41.53906032024941]
We propose an entity-aware news image captioning task to generate informative captions.
Our model first creates coarse concentration on relevant sentences using a cross-modality retrieval model.
Experiments on both BreakingNews and GoodNews datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-08-04T13:27:51Z)
- Iconographic Image Captioning for Artworks [2.3859169601259342]
This work utilizes a novel large-scale dataset of artwork images annotated with concepts from the Iconclass classification system designed for art and iconography.
The annotations are processed into clean textual descriptions to create a dataset suitable for training a deep neural network model on the image captioning task.
A transformer-based vision-language pre-trained model is fine-tuned using the artwork image dataset.
The quality of the generated captions and the model's capacity to generalize to new data are explored by applying the model to a new collection of paintings and analyzing the relation between commonly generated captions and artistic genre.
arXiv Detail & Related papers (2021-02-07T23:11:33Z)
- Visual News: Benchmark and Challenges in News Image Captioning [18.865262609683676]
We propose Visual News Captioner, an entity-aware model for the task of news image captioning.
We also introduce Visual News, a large-scale benchmark consisting of more than one million news images.
arXiv Detail & Related papers (2020-10-08T03:07:00Z)
- Transform and Tell: Entity-Aware News Image Captioning [77.4898875082832]
We propose an end-to-end model which generates captions for images embedded in news articles.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts.
arXiv Detail & Related papers (2020-04-17T05:44:37Z)
- Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models [63.11766263832545]
We present a new image captioning dataset, Egoshots, consisting of 978 real-life images with no captions.
In order to evaluate the quality of the generated captions, we propose a new image captioning metric, object-based Semantic Fidelity (SF).
arXiv Detail & Related papers (2020-03-26T04:43:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.