Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks
- URL: http://arxiv.org/abs/2209.08799v1
- Date: Mon, 19 Sep 2022 07:11:36 GMT
- Title: Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks
- Authors: Jiaying Wu, Bryan Hooi
- Abstract summary: Open-source benchmark datasets suffer from spurious correlations, which are ignored by existing studies.
We propose event-separated rumor detection as a solution to eliminate spurious cues.
Our method outperforms existing baselines in terms of effectiveness, efficiency and generalizability.
- Score: 28.550143417847373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As social media becomes a hotbed for the spread of misinformation, the
crucial task of rumor detection has witnessed promising advances fostered by
open-source benchmark datasets. Despite being widely used, we find that these
datasets suffer from spurious correlations, which are ignored by existing
studies and lead to severe overestimation of existing rumor detection
performance. The spurious correlations stem from three causes: (1) event-based
data collection and labeling schemes assign the same veracity label to multiple
highly similar posts from the same underlying event; (2) merging multiple data
sources spuriously relates source identities to veracity labels; and (3)
labeling bias. In this paper, we closely investigate three of the most popular
rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and
propose event-separated rumor detection as a solution to eliminate spurious
cues. Under the event-separated setting, we observe that the accuracy of
existing state-of-the-art models drops significantly by over 40%, becoming only
comparable to a simple neural classifier. To better address this task, we
propose Publisher Style Aggregation (PSA), a generalizable approach that
aggregates publisher posting records to learn writing style and veracity
stance. Extensive experiments demonstrate that our method outperforms existing
baselines in terms of effectiveness, efficiency and generalizability.
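The event-separated setting described in the abstract can be illustrated with a group-wise split in which every post from a given event lands in either the training set or the test set, never both. The following is a minimal sketch under stated assumptions, not the authors' protocol: the function name `event_separated_split`, the dictionary keys `event`, `text` and `label`, and the 20% test fraction are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def event_separated_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no event contributes to both train and test.

    `samples` is a list of dicts with the (hypothetical) keys
    "event", "text" and "label". At least two distinct events are
    needed for both splits to be non-empty.
    """
    # Group posts by the event they belong to.
    by_event = defaultdict(list)
    for sample in samples:
        by_event[sample["event"]].append(sample)

    # Shuffle events (not posts) so the split is made at event level.
    events = sorted(by_event)
    random.Random(seed).shuffle(events)

    # Hold out a fraction of the events, at least one.
    n_test = max(1, round(len(events) * test_fraction))
    test_events, train_events = events[:n_test], events[n_test:]

    train = [s for e in train_events for s in by_event[e]]
    test = [s for e in test_events for s in by_event[e]]
    return train, test


# Toy data: 20 posts spread evenly over 5 events (assumed for illustration).
samples = [
    {"event": f"event_{i % 5}", "text": f"post {i}", "label": (i % 5) % 2}
    for i in range(20)
]
train, test = event_separated_split(samples, test_fraction=0.2)
train_events = {s["event"] for s in train}
test_events = {s["event"] for s in test}
print(len(train), len(test), train_events & test_events)
```

A random post-level split of the same toy data would typically place posts from every event on both sides, which is the kind of event-level leakage the abstract attributes to the existing benchmarks.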
Related papers
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- Rumor Detection with a novel graph neural network approach [12.42658463552019]
We propose a new detection model that jointly learns the representations of user correlation and information propagation to detect rumors on social media.
Specifically, we leverage graph neural networks to learn the representations of user correlation from a bipartite graph.
We show that subverting the user correlation pattern is costly for attackers, demonstrating the importance of considering user correlation for rumor detection.
arXiv Detail & Related papers (2024-03-24T15:59:47Z)
- ReSup: Reliable Label Noise Suppression for Facial Expression Recognition [20.74719263734951]
We propose a more reliable noise-label suppression method called ReSup.
To achieve optimal distribution modeling, ReSup models the similarity distribution of all samples.
To further enhance the reliability of our noise decision results, ReSup uses two networks to jointly achieve noise suppression.
arXiv Detail & Related papers (2023-05-29T06:02:06Z)
- A Unified Contrastive Transfer Framework with Propagation Structure for Boosting Low-Resource Rumor Detection [11.201348902221257]
Existing rumor detection algorithms show promising performance on yesterday's news.
Due to a lack of substantial training data and prior expert knowledge, they are poor at spotting rumors concerning unforeseen events.
We propose a unified contrastive transfer framework to detect rumors by adapting the features learned from well-resourced rumor data to low-resource data with only few-shot annotations.
arXiv Detail & Related papers (2023-04-04T03:13:03Z)
- Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better.
We term this framework Self-supervised Rumor Detection (SRD).
Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering [59.286567680389766]
Noisy labels are commonly found in real-world data and degrade the performance of deep neural networks.
We propose the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for deep metric learning (DML).
PRISM calculates the probability of a label being clean, and filters out potentially noisy samples.
arXiv Detail & Related papers (2021-08-03T12:15:25Z)
- SRLF: A Stance-aware Reinforcement Learning Framework for Content-based Rumor Detection on Social Media [15.985224010346593]
Early content-based methods focused on finding clues from text and user profiles for rumor detection.
Recent studies combine the stances of users' comments with news content to capture the difference between true and false rumors.
We propose a novel Stance-aware Reinforcement Learning Framework (SRLF) to select high-quality labeled stance data for model training and rumor detection.
arXiv Detail & Related papers (2021-05-10T03:58:34Z)
- Domain Adaptative Causality Encoder [52.779274858332656]
We leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation.
We present a new causality dataset, namely MedCaus, which integrates all types of causality in the text.
arXiv Detail & Related papers (2020-11-27T04:14:55Z)
- Improving Face Recognition by Clustering Unlabeled Faces in the Wild [77.48677160252198]
We propose a novel identity separation method based on extreme value theory.
It greatly reduces the problems caused by overlapping-identity label noise.
Experiments on both controlled and real settings demonstrate our method's consistent improvements.
arXiv Detail & Related papers (2020-07-14T12:26:50Z)