Related papers: SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels

URL: http://arxiv.org/abs/2309.13080v1
Date: Thu, 21 Sep 2023 10:55:26 GMT
Title: SPICED: News Similarity Detection Dataset with Multiple Topics and Complexity Levels
Authors: Elena Shushkevich, Long Mai, Manuel V. Loureiro, Steven Derby, Tri Kurniawan Wijaya
Abstract summary: We propose a new dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. We present four distinct approaches for generating news pairs, which are used to create datasets specifically designed for the news similarity detection task.
Score: 14.073585972409756
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the proliferation of news media outlets, intelligent systems that detect redundant information in news articles have become especially prevalent as a way to enhance user experience. However, the heterogeneous nature of news can lead to spurious findings in these systems: simple heuristics, such as whether both articles in a pair are about politics, can provide strong but deceptive downstream performance. Segmenting news similarity datasets into topics improves the training of these models by forcing them to learn how to distinguish salient characteristics within narrower domains. However, this requires the existence of topic-specific datasets, which are currently lacking. In this article, we propose a new dataset of similar news, SPICED, which includes seven topics: Crime & Law, Culture & Entertainment, Disasters & Accidents, Economy & Business, Politics & Conflicts, Science & Technology, and Sports. Furthermore, we present four distinct approaches for generating news pairs, which are used to create datasets specifically designed for the news similarity detection task. We benchmarked the created datasets using MinHash, BERT, SBERT, and SimCSE models.
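
The abstract names MinHash as one of the benchmarked baselines. As a rough illustration of how MinHash can estimate the similarity of two news texts, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the word-shingle size `k`, and the number of hash functions `num_perm` are all assumptions chosen for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (k is an assumed default)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_perm=128):
    """Build a MinHash signature: one minimum hash value per salted hash function."""
    signature = []
    for seed in range(num_perm):
        # A different salt per seed simulates a family of independent hash functions.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical example texts, not taken from the SPICED dataset.
    a = "The central bank raised interest rates by a quarter point on Tuesday"
    b = "The central bank raised interest rates by half a point on Tuesday"
    sim = estimate_jaccard(minhash_signature(shingles(a)), minhash_signature(shingles(b)))
    print(f"Estimated Jaccard similarity: {sim:.2f}")
```

A pair would then be flagged as similar when the estimate exceeds some threshold; the threshold itself would have to be tuned on labeled pairs and is not specified here.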

Related papers

A Python Tool for Reconstructing Full News Text from GDELT [0.0]
This paper presents a novel approach to obtaining full-text newspaper articles at near-zero cost. We focus on the GDELT Web News NGrams 3.0 dataset, which provides high-frequency updates of n-grams extracted from global online news sources. We provide Python code to reconstruct full-text articles from these n-grams by identifying overlapping textual fragments and intelligently merging them.
arXiv Detail & Related papers (2025-04-22T17:40:42Z)
Dynamic Analysis and Adaptive Discriminator for Fake News Detection [59.41431561403343]
We propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search algorithm to leverage the self-reflective capabilities of large language models. For semantic-based methods, we define four typical deceit patterns to reveal the mechanisms behind fake news creation.
arXiv Detail & Related papers (2024-08-20T14:13:54Z)
A Multilingual Similarity Dataset for News Article Frame [14.977682986280998]
We introduce an extended version of a large labeled news article dataset with 16,687 new labeled pairs. Our method removes the need for manual identification of frame classes in traditional news frame analysis studies. Overall, we introduce the most extensive cross-lingual news article similarity dataset available to date, with 26,555 labeled news article pairs across 10 languages.
arXiv Detail & Related papers (2024-05-22T01:01:04Z)
From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context [25.870137795858522]
We present NEWSSENSE, a novel sensemaking tool and reading interface designed to collect and integrate information from multiple news articles on a central topic. NEWSSENSE augments a central, grounding article of the user's choice by linking it to related articles from different sources. Our pilot study shows that NEWSSENSE has the potential to help users identify key information, verify the credibility of news articles, and explore different perspectives.
arXiv Detail & Related papers (2023-10-06T21:15:11Z)
Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection. We show that P&A sets new states-of-the-art for few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
Towards Corpus-Scale Discovery of Selection Biases in News Coverage: Comparing What Sources Say About Entities as a Start [65.28355014154549]
This paper investigates the challenges of building scalable NLP systems for discovering patterns of media selection biases directly from news content in massive-scale news corpora. We show the capabilities of the framework through a case study on NELA-2020, a corpus of 1.8M news articles in English from 519 news sources worldwide.
arXiv Detail & Related papers (2023-04-06T23:36:45Z)
No Place to Hide: Dual Deep Interaction Channel Network for Fake News Detection based on Data Augmentation [16.40196904371682]
We propose a novel framework for fake news detection from the perspectives of semantics, emotion, and data enhancement. A dual deep interaction channel network of semantics and emotion is designed to obtain a more comprehensive and fine-grained news representation. Experiments show that the proposed approach outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-03-31T13:33:53Z)
Nothing Stands Alone: Relational Fake News Detection with Hypergraph Neural Networks [49.29141811578359]
We propose to leverage a hypergraph to represent group-wise interaction among news, while focusing on important news relations with its dual-level attention mechanism. Our approach yields remarkable performance and maintains the high performance even with a small subset of labeled news data.
arXiv Detail & Related papers (2022-12-24T00:19:32Z)
Multiverse: Multilingual Evidence for Fake News Detection [71.51905606492376]
Multiverse is a new feature based on multilingual evidence that can be used for fake news detection. The hypothesis that cross-lingual evidence can serve as a feature for fake news detection is confirmed.
arXiv Detail & Related papers (2022-11-25T18:24:17Z)
Fake News Quick Detection on Dynamic Heterogeneous Information Networks [3.599616699656401]
We propose a novel Dynamic Heterogeneous Graph Neural Network (DHGNN) for fake news quick detection. We first implement BERT and fine-tuned BERT to get a semantic representation of the news article contents and author profiles. Then, we construct the heterogeneous news-author graph to reflect contextual information and relationships.
arXiv Detail & Related papers (2022-05-14T11:23:25Z)
Adversarial Active Learning based Heterogeneous Graph Neural Network for Fake News Detection [18.847254074201953]
We propose a novel fake news detection framework, namely Adversarial Active Learning-based Heterogeneous Graph Neural Network (AA-HGNN). AA-HGNN utilizes an active learning framework to enhance learning performance, especially when facing the paucity of labeled data. Experiments with two real-world fake news datasets show that our model can outperform text-based models and other graph-based models.
arXiv Detail & Related papers (2021-01-27T05:05:25Z)
Machine Learning Explanations to Prevent Overtrust in Fake News Detection [64.46876057393703]
This research investigates the effects of an Explainable AI assistant embedded in news review platforms for combating the propagation of fake news. We design a news reviewing and sharing interface, create a dataset of news stories, and train four interpretable fake news detection algorithms. For a deeper understanding of Explainable AI systems, we discuss interactions between user engagement, mental model, trust, and performance measures in the process of explaining.
arXiv Detail & Related papers (2020-07-24T05:42:29Z)
A Deep Learning Approach for Automatic Detection of Fake News [47.00462375817434]
We propose two deep learning models for solving the fake news detection problem in online news content from multiple domains. We evaluate our techniques on two recently released fake news detection datasets, FakeNews AMT and Celebrity.
arXiv Detail & Related papers (2020-05-11T09:07:46Z)
BaitWatcher: A lightweight web interface for the detection of incongruent news headlines [27.29585619643952]
BaitWatcher is a lightweight web interface that guides readers in estimating the likelihood of incongruence in news articles before clicking on the headlines. BaitWatcher utilizes a hierarchical recurrent encoder that efficiently learns complex textual representations of a news headline and its associated body text.
arXiv Detail & Related papers (2020-03-23T23:43:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.