Rule-driven News Captioning
- URL: http://arxiv.org/abs/2403.05101v3
- Date: Thu, 14 Mar 2024 08:00:51 GMT
- Title: Rule-driven News Captioning
- Authors: Ning Xu, Tingting Zhang, Hongshuo Tian, An-An Liu
- Abstract summary: The news captioning task aims to generate sentences that describe named entities or concrete events for an image paired with its news article.
Existing methods have achieved remarkable results by relying on large-scale pre-trained models.
We propose a rule-driven news captioning method, which can generate image descriptions that follow a designated rule signal.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The news captioning task aims to generate sentences that describe named entities or concrete events for an image paired with its news article. Existing methods have achieved remarkable results by relying on large-scale pre-trained models, which primarily focus on the correlations between the input news content and the output predictions. However, news captioning requires adhering to some fundamental rules of news reporting, such as accurately describing the individuals and actions associated with the event. In this paper, we propose a rule-driven news captioning method, which can generate image descriptions that follow a designated rule signal. Specifically, we first design the news-aware semantic rule for the descriptions. This rule incorporates the primary action depicted in the image (e.g., "performing") and the roles played by the named entities involved in the action (e.g., "Agent" and "Place"). Second, we inject this semantic rule into the large-scale pre-trained model BART with the prefix-tuning strategy, where multiple encoder layers are embedded with the news-aware semantic rule. Finally, we can effectively guide BART to generate news sentences that comply with the designated rule. Extensive experiments on two widely used datasets (i.e., GoodNews and NYTimes800k) demonstrate the effectiveness of our method.
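The prefix-tuning step described in the abstract — injecting a rule signal by prepending rule-conditioned vectors to the inputs of encoder layers — can be sketched roughly as below. This is a minimal NumPy illustration of the general prefix-tuning mechanism, not the authors' implementation; the function name `apply_prefix` and all tensor shapes are assumptions.

```python
import numpy as np

def apply_prefix(token_embeds, attention_mask, prefix_embeds):
    """Prepend trainable prefix vectors to a layer's input embeddings.

    token_embeds:  (batch, seq_len, dim) token embeddings entering a layer
    attention_mask:(batch, seq_len)      1 for real tokens, 0 for padding
    prefix_embeds: (prefix_len, dim)     learned vectors encoding the rule signal
    """
    batch, _, dim = token_embeds.shape
    prefix_len = prefix_embeds.shape[0]

    # Broadcast the shared prefix across the batch and prepend it.
    prefix = np.broadcast_to(prefix_embeds, (batch, prefix_len, dim))
    new_embeds = np.concatenate([prefix, token_embeds], axis=1)

    # Prefix positions are always attended to, so extend the mask with ones.
    ones = np.ones((batch, prefix_len), dtype=attention_mask.dtype)
    new_mask = np.concatenate([ones, attention_mask], axis=1)
    return new_embeds, new_mask
```

In a real system the prefix vectors would be the only trainable parameters, optimized while the pre-trained model stays frozen, so the rule signal steers generation without full fine-tuning.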
Related papers
- Visually-Aware Context Modeling for News Image Captioning [54.31708859631821]
News Image Captioning aims to create captions from news articles and images.
We propose a face-naming module for learning better name embeddings.
We use CLIP to retrieve sentences that are semantically close to the image.
arXiv Detail & Related papers (2023-08-16T12:39:39Z)
- Focus! Relevant and Sufficient Context Selection for News Image Captioning [69.36678144800936]
News Image Captioning requires describing an image by leveraging additional context from a news article.
We propose to use the pre-trained vision and language retrieval model CLIP to localize the visually grounded entities in the news article.
Our experiments demonstrate that by simply selecting a better context from the article, we can significantly improve the performance of existing models.
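The retrieval idea shared by the CLIP-based entries above — scoring article sentences against the image in a joint embedding space and keeping the closest ones as context — can be sketched as follows. This is a generic cosine-similarity selection over precomputed embeddings, not either paper's pipeline; `select_context` and the toy vectors are assumptions.

```python
import numpy as np

def select_context(image_emb, sentence_embs, k=2):
    """Return indices of the k sentences closest to the image embedding.

    image_emb:     (dim,)            image embedding (e.g., from CLIP)
    sentence_embs: (n_sents, dim)    one embedding per article sentence
    """
    # Normalize so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    sents = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    sims = sents @ img
    # Highest similarity first; keep the top k as the caption context.
    return np.argsort(-sims)[:k]
```

Selecting context this way shortens the input the captioning model must read, which is the "better context" effect the entry above reports.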
arXiv Detail & Related papers (2022-12-01T20:00:27Z)
- Journalistic Guidelines Aware News Image Captioning [8.295819830685536]
News article image captioning aims to generate descriptive and informative captions for news article images.
Unlike conventional image captions that simply describe the content of the image in general terms, news image captions rely heavily on named entities to describe the image content.
We propose a new approach to this task, motivated by caption guidelines that journalists follow.
arXiv Detail & Related papers (2021-09-07T04:49:50Z)
- ICECAP: Information Concentrated Entity-aware Image Captioning [41.53906032024941]
We propose an entity-aware news image captioning task to generate informative captions.
Our model first creates coarse concentration on relevant sentences using a cross-modality retrieval model.
Experiments on both BreakingNews and GoodNews datasets demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-08-04T13:27:51Z)
- MOC-GAN: Mixing Objects and Captions to Generate Realistic Images [21.240099965546637]
We introduce a more rational setting, generating a realistic image from the objects and captions.
Under this setting, objects explicitly define the critical roles in the targeted images and captions implicitly describe their rich attributes and connections.
A MOC-GAN is proposed to mix the inputs of two modalities to generate realistic images.
arXiv Detail & Related papers (2021-06-06T14:04:07Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- Leveraging Declarative Knowledge in Text and First-Order Logic for Fine-Grained Propaganda Detection [139.3415751957195]
We study the detection of propagandistic text fragments in news articles.
We introduce an approach to inject declarative knowledge of fine-grained propaganda techniques.
arXiv Detail & Related papers (2020-04-29T13:46:15Z)
- Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.