SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels
- URL: http://arxiv.org/abs/2309.13080v3
- Date: Fri, 23 Aug 2024 08:58:22 GMT
- Title: SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels
- Authors: Elena Shushkevich, Long Mai, Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya,
- Abstract summary: We propose a novel dataset of similar news, SPICED, which includes seven topics.
We present four different levels of complexity, specifically designed for the news similarity detection task.
- Score: 13.117993238869659
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The proliferation of news media outlets has increased the demand for intelligent systems capable of detecting redundant information in news articles in order to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: simple heuristics, such as whether a pair of news articles are both about politics, can provide strong but deceptive downstream performance. Segmenting news similarity datasets into topics improves the training of these models by forcing them to learn how to distinguish salient characteristics within narrower domains. However, this requires the existence of topic-specific datasets, which are currently lacking. In this article, we propose a novel dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. Furthermore, we present four different levels of complexity, specifically designed for the news similarity detection task. We benchmarked the created datasets using MinHash, BERT, SBERT, and SimCSE models.
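As a rough illustration of the simplest baseline named in the abstract, the sketch below estimates the Jaccard similarity of two news texts with MinHash using only the Python standard library. It is not the authors' benchmark configuration: the function names, the 3-word shingle size, and the 128 hash functions are assumptions chosen for the example.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text (k is an assumed default)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=128):
    """Build a MinHash signature: the minimum hash value under each seeded hash function."""
    if not items:
        raise ValueError("cannot build a signature from an empty set")
    signature = []
    for seed in range(num_perm):
        # A distinct salt per seed stands in for an independent hash function.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def news_similarity(text_a, text_b, k=3, num_perm=128):
    """Convenience wrapper: shingle both texts, sign them, compare the signatures."""
    sig_a = minhash_signature(shingles(text_a, k), num_perm)
    sig_b = minhash_signature(shingles(text_b, k), num_perm)
    return estimate_similarity(sig_a, sig_b)


if __name__ == "__main__":
    a = "The central bank raised interest rates by a quarter point on Tuesday"
    b = "The central bank raised interest rates by a quarter point on Wednesday"
    c = "The home team won the championship final after extra time"
    print("near-duplicate pair:", news_similarity(a, b))
    print("unrelated pair:", news_similarity(a, c))
```

A pair would then be flagged as similar when the estimate exceeds some threshold; the threshold itself would have to be tuned on labeled pairs and is not specified here.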
Related papers
- A Multilingual Similarity Dataset for News Article Frame [14.977682986280998]
We introduce an extended version of a large labeled news article dataset with 16,687 new labeled pairs.
Our method removes the need for manual identification of frame classes in traditional news frame analysis studies.
Overall we introduce the most extensive cross-lingual news article similarity dataset available to date with 26,555 labeled news article pairs across 10 languages.
arXiv Detail & Related papers (2024-05-22T01:01:04Z)
- From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context [25.870137795858522]
We present NEWSSENSE, a novel sensemaking tool and reading interface designed to collect and integrate information from multiple news articles on a central topic.
NEWSSENSE augments a central, grounding article of the user's choice by linking it to related articles from different sources.
Our pilot study shows that NEWSSENSE has the potential to help users identify key information, verify the credibility of news articles, and explore different perspectives.
arXiv Detail & Related papers (2023-10-06T21:15:11Z)
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets new states-of-the-art for few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
- Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora.
We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
- No Place to Hide: Dual Deep Interaction Channel Network for Fake News Detection based on Data Augmentation [16.40196904371682]
We propose a novel framework for fake news detection from the perspectives of semantics, emotion, and data enhancement.
A dual deep interaction channel network for semantics and emotion is designed to obtain a more comprehensive and fine-grained news representation.
Experiments show that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-03-31T13:33:53Z)
- Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism.
Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
- Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection.
The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
- Fake News Quick Detection on Dynamic Heterogeneous Information Networks [3.599616699656401]
We propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick detection.
We first implement BERT and fine-tuned BERT to get a semantic representation of the news article contents and author profiles.
Then, we construct the heterogeneous news-author graph to reflect contextual information and relationships.
arXiv Detail & Related papers (2022-05-14T11:23:25Z)
- Adversarial Active Learning based Heterogeneous Graph Neural Network for Fake News Detection [18.847254074201953]
We propose a novel fake news detection framework, namely Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN).
AA-HGNN utilizes an active learning framework to enhance learning performance, especially when facing the paucity of labeled data.
Experiments with two real-world fake news datasets show that our model can outperform text-based models and other graph-based models.
arXiv Detail & Related papers (2021-01-27T05:05:25Z)
- Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news.
We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms.
For a deeper understanding of Explainable AI systems, we discuss interactions between user engagement, mental model, trust, and performance measures in the process of explaining.
arXiv Detail & Related papers (2020-07-24T05:42:29Z)
- A Deep Learning Approach for Automatic Detection of Fake News [47.00462375817434]
We propose two deep learning models for solving the fake news detection problem in online news content across multiple domains.
We evaluate our techniques on two recently released datasets for fake news detection, namely FakeNews AMT and Celebrity.
arXiv Detail & Related papers (2020-05-11T09:07:46Z)