N15News: A New Dataset for Multimodal News Classification
- URL: http://arxiv.org/abs/2108.13327v1
- Date: Mon, 30 Aug 2021 15:46:09 GMT
- Title: N15News: A New Dataset for Multimodal News Classification
- Authors: Zhen Wang, Xu Shan, Jie Yang
- Abstract summary: We propose a new dataset, N15News, generated from the New York Times, with 15 categories and both text and image information in each news item.
We design a novel multitask multimodal network with different fusion methods, and experiments show that multimodal news classification outperforms text-only news classification.
- Score: 7.846107230241092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current news datasets focus almost exclusively on the textual features of
news and rarely leverage image features, leaving out information that is
essential for news classification. In this paper, we propose a new dataset,
N15News, which is generated from the New York Times, covers 15 categories, and
contains both text and image information for each news item. We design a novel
multitask multimodal network with different fusion methods, and experiments
show that multimodal news classification outperforms text-only news
classification. Depending on the length of the text, classification accuracy
can be increased by up to 5.8%. Our research reveals the relationship between
the performance of a multimodal classifier and its sub-classifiers, as well as
the possible improvements from applying multimodal methods to news
classification. N15News is shown to have great potential to promote multimodal
news studies.
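The abstract mentions a multitask multimodal network with different fusion methods but does not spell out what fusion looks like. Below is a minimal, illustrative sketch of one common baseline, early fusion by concatenation, in plain Python. It is an assumption for exposition only: the function names, dimensions, and linear classifier are hypothetical and do not reproduce the paper's actual architecture.

```python
# Hypothetical sketch of "early fusion" for multimodal classification:
# concatenate a text embedding and an image embedding, then score the
# joint vector with a single linear layer over the 15 categories.
# All names and dimensions here are illustrative, not the paper's.

def concat_fusion(text_emb, image_emb):
    """Early fusion: join the two modality vectors into one feature vector."""
    return text_emb + image_emb  # list concatenation, not element-wise sum

def linear_classify(features, weights, bias):
    """Score each category with a linear layer; return the argmax index."""
    scores = []
    for w_row, b in zip(weights, bias):
        scores.append(sum(f * w for f, w in zip(features, w_row)) + b)
    return scores.index(max(scores))

# Toy example: a 4-dim text embedding fused with a 3-dim image embedding.
text_emb = [0.2, -0.1, 0.4, 0.0]
image_emb = [0.5, 0.3, -0.2]
fused = concat_fusion(text_emb, image_emb)
assert len(fused) == 7  # 4 text dims + 3 image dims
```

Other fusion strategies the paper compares against this kind of baseline (e.g. attention-based or late fusion) combine the modalities at different points in the network rather than at the input features.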
Related papers
- A Multilingual Similarity Dataset for News Article Frame [14.977682986280998]
We introduce an extended version of a large labeled news article dataset with 16,687 new labeled pairs.
Our method removes the need for manually identifying frame classes, as required in traditional news frame analysis studies.
Overall we introduce the most extensive cross-lingual news article similarity dataset available to date with 26,555 labeled news article pairs across 10 languages.
arXiv Detail & Related papers (2024-05-22T01:01:04Z) - FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection [54.37159298632628]
FineFake is a multi-domain knowledge-enhanced benchmark for fake news detection.
FineFake encompasses 16,909 data samples spanning six semantic topics and eight platforms.
The entire FineFake project is publicly accessible as an open-source repository.
arXiv Detail & Related papers (2024-03-30T14:39:09Z) - SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels [13.117993238869659]
We propose a novel dataset of similar news, SPICED, which includes seven topics.
We present four different levels of complexity, specifically designed for news similarity detection task.
arXiv Detail & Related papers (2023-09-21T10:55:26Z) - Designing and Evaluating Interfaces that Highlight News Coverage Diversity Using Discord Questions [84.55145223950427]
This paper shows that navigating large source collections for a news story can be challenging without further guidance.
We design three interfaces -- the Annotated Article, the Recomposed Article, and the Question Grid -- aimed at accompanying news readers in discovering coverage diversity while they read.
arXiv Detail & Related papers (2023-02-17T16:59:31Z) - Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z) - Multimodal Fake News Detection with Adaptive Unimodal Representation Aggregation [28.564442206829625]
AURA is a multimodal fake news detection network with adaptive unimodal representation aggregation.
We perform coarse-level fake news detection and cross-modal consistency learning based on the unimodal and multimodal representations.
Experiments on Weibo and GossipCop show that AURA outperforms several state-of-the-art fake news detection schemes.
arXiv Detail & Related papers (2022-06-12T14:06:55Z) - Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic [16.43888233012092]
We propose a BERT-based multimodal unreliable news detection framework.
It captures both textual and visual information from unreliable articles.
We show that our model outperforms a number of competitive baselines in unreliable news detection.
arXiv Detail & Related papers (2021-09-04T11:53:37Z) - Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings [63.79979145520512]
We explore the joint effects of texts and images in predicting the keyphrases for a multimedia post.
We propose a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions.
Our model significantly outperforms the previous state of the art based on traditional attention networks.
arXiv Detail & Related papers (2020-11-03T08:44:18Z) - LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for
Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z) - Cross-media Structured Common Space for Multimedia Event Extraction [82.36301617438268]
We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents.
We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information into a common embedding space.
By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
arXiv Detail & Related papers (2020-05-05T20:21:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.