Upgrading the Newsroom: An Automated Image Selection System for News
Articles
- URL: http://arxiv.org/abs/2004.11449v1
- Date: Thu, 23 Apr 2020 20:29:26 GMT
- Title: Upgrading the Newsroom: An Automated Image Selection System for News
Articles
- Authors: Fangyu Liu, Rémi Lebret, Didier Orel, Philippe Sordet, Karl Aberer
- Abstract summary: We propose an automated image selection system to assist photo editors in selecting suitable images for news articles.
The system fuses multiple textual sources extracted from news articles and accepts multilingual inputs.
We extensively experiment with our system on a large-scale text-image database containing multimodal multilingual news articles.
- Score: 6.901494425127736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an automated image selection system to assist photo editors in
selecting suitable images for news articles. The system fuses multiple textual
sources extracted from news articles and accepts multilingual inputs. It is
equipped with char-level word embeddings to help both modeling morphologically
rich languages, e.g. German, and transferring knowledge across nearby
languages. The text encoder adopts a hierarchical self-attention mechanism to
attend more to both keywords within a piece of text and informative components
of a news article. We extensively experiment with our system on a large-scale
text-image database containing multimodal multilingual news articles collected
from Swiss local news media websites. The system is compared with multiple
baselines through ablation studies and is shown to outperform existing
text-image retrieval methods in a weakly-supervised learning setting. We also
offer insights into the advantages of using multiple textual sources and
multilingual data.
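The hierarchy the abstract describes (characters pooled into words, words into text fields, fields into an article representation) can be illustrated with attention-weighted pooling at each level. This is a toy sketch, not the authors' architecture: the per-character vectors and the three scoring vectors stand in for parameters that would be learned jointly with an image encoder under a retrieval loss.

```python
import numpy as np

D = 8  # embedding dimension (illustrative)

def char_vec(c):
    # Deterministic stand-in for a learned char-level embedding table.
    return np.random.default_rng(ord(c)).normal(size=D)

def attention_pool(rows, w):
    """Score each row with w, softmax the scores, return the weighted sum."""
    scores = rows @ w
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ rows

rng = np.random.default_rng(0)
w_char, w_word, w_field = rng.normal(size=(3, D))  # stand-in attention params

def embed_word(word):       # characters -> word vector
    return attention_pool(np.stack([char_vec(c) for c in word]), w_char)

def embed_text(text):       # words -> field vector (title, lead, body, ...)
    return attention_pool(np.stack([embed_word(t) for t in text.split()]), w_word)

def embed_article(fields):  # fields -> article vector
    return attention_pool(np.stack([embed_text(f) for f in fields]), w_field)

v = embed_article(["Ein kurzer Titel", "Der Artikeltext folgt hier"])
print(v.shape)  # (8,)
```

Char-level composition is what lets morphologically rich languages such as German share subword information and transfer it across nearby languages; attention at each level is what lets the encoder weight keywords and informative article components more heavily.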
Related papers
- MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages [0.4062349563818079]
We introduce the Multimodal Multilingual dataset for Indic Fake News Detection (MMIFND).
This meticulously curated dataset consists of 28,085 instances distributed across Hindi, Bengali, Marathi, Malayalam, Tamil, Gujarati and Punjabi.
We propose the Multimodal Caption-aware framework for Fake News Detection (MMCFND).
arXiv Detail & Related papers (2024-10-14T11:59:33Z)
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text [112.60163342249682]
We introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.
Our dataset has 15 times larger scales while maintaining good data quality.
We hope this could provide a solid data foundation for future multimodal model research.
arXiv Detail & Related papers (2024-06-12T17:01:04Z)
- XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags [19.09498276014971]
Headlines and entity (topic) tags are essential for guiding readers to decide if the content is worth their time.
We propose to leverage auxiliary information such as images and captions embedded in the articles to retrieve relevant sentences.
We have compiled a dataset named XL-HeadTags, which includes 20 languages across 6 diverse language families.
arXiv Detail & Related papers (2024-06-06T06:40:19Z)
- Universal Multimodal Representation for Language Understanding [110.98786673598015]
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table built over existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- NewsStories: Illustrating articles with visual summaries [49.924916589209374]
We introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos.
We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images.
We introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.
arXiv Detail & Related papers (2022-07-26T17:34:11Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
- MultiSubs: A Large-scale Multimodal and Multilingual Dataset [32.48454703822847]
This paper introduces a large-scale multimodal and multilingual dataset that aims to facilitate research on grounding words to images in their contextual usage in language.
The dataset consists of images selected to unambiguously illustrate concepts expressed in sentences from movie subtitles.
We show the utility of the dataset on two automatic tasks: (i) fill-in-the-blank; (ii) lexical translation.
arXiv Detail & Related papers (2021-03-02T18:09:07Z)
- Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [8.317191999275536]
This paper focuses on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval.
We employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image.
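A single graph-convolution step of the kind used for such multi-modal reasoning can be sketched as normalized neighborhood averaging followed by a learned linear map. The tiny graph below (two salient objects linked to one scene-text token) and all weights are illustrative assumptions, not the paper's model.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric degree normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# 3 nodes: two salient objects, one OCR'd text token, text linked to both objects
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)
H = np.random.default_rng(1).normal(size=(3, 4))  # node features (stand-in)
W = np.random.default_rng(2).normal(size=(4, 4))  # learned weights (stand-in)
H2 = gcn_layer(A, H, W)
print(H2.shape)  # (3, 4)
```

Stacking such layers lets visual and textual node features mix through the graph, which is the "relationship-enhanced features" idea described above.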
arXiv Detail & Related papers (2020-09-21T12:31:42Z)
- Batch Clustering for Multilingual News Streaming [0.0]
The large volume of diverse and unorganized information makes reading difficult or almost impossible.
We process articles per batch, looking for monolingual local topics which are then linked across time and languages.
Our system gives monolingual state-of-the-art results on a dataset of Spanish and German news and crosslingual state-of-the-art results on English, Spanish and German news.
arXiv Detail & Related papers (2020-04-17T08:59:13Z)
- Transform and Tell: Entity-Aware News Image Captioning [77.4898875082832]
We propose an end-to-end model which generates captions for images embedded in news articles.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
We tackle the second challenge with a state-of-the-art transformer language model that uses byte-pair-encoding to generate captions as a sequence of word parts.
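Byte-pair encoding, as used by such transformer language models, starts from characters and repeatedly merges the most frequent adjacent symbol pair, so rare entity names decompose into reusable word parts. A minimal sketch of the merge step (not the paper's actual tokenizer):

```python
from collections import Counter

def bpe_merge_step(tokens):
    """Find the most frequent adjacent pair and merge every occurrence."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens, None
    best = max(pairs, key=pairs.get)
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
            merged.append(tokens[i] + tokens[i + 1])  # fuse the pair
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged, best

tokens = list("lower lowest")  # character-level start
for _ in range(3):
    tokens, pair = bpe_merge_step(tokens)
print(tokens)
```

After three merges the shared stem has been fused into a single symbol "lowe", illustrating how frequent subwords emerge from character sequences.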
arXiv Detail & Related papers (2020-04-17T05:44:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.