It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits
- URL: http://arxiv.org/abs/2302.03147v1
- Date: Mon, 6 Feb 2023 22:53:13 GMT
- Title: It's about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits
- Authors: Yida Mu and Kalina Bontcheva and Nikolaos Aletras
- Abstract summary: We provide a re-evaluation of classification models on four popular rumor detection benchmarks considering chronological instead of random splits.
Our experimental results show that the use of random splits can significantly overestimate predictive performance across all datasets and models.
We suggest that rumor detection models should always be evaluated using chronological splits for minimizing topical overlaps.
- Score: 27.061515030101972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: New events emerge over time, influencing the topics of rumors in social media.
Current rumor detection benchmarks use random splits as training, development
and test sets which typically results in topical overlaps. Consequently, models
trained on random splits may not perform well at classifying rumors about
previously unseen topics, due to temporal concept drift. In this paper, we
provide a re-evaluation of classification models on four popular rumor
detection benchmarks considering chronological instead of random splits. Our
experimental results show that the use of random splits can significantly
overestimate predictive performance across all datasets and models. Therefore,
we suggest that rumor detection models should always be evaluated using
chronological splits for minimizing topical overlaps.
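To make the two protocols concrete, here is a minimal sketch of a random versus a chronological split. This is an illustration, not the paper's code: the `timestamp` key and the 70/10/20 proportions are assumptions made for the example.

```python
import random

def random_split(posts, train_frac=0.7, dev_frac=0.1, seed=42):
    """Shuffle then slice: posts about the same event can land in both
    train and test, so topical overlap inflates measured performance."""
    posts = list(posts)
    random.Random(seed).shuffle(posts)
    n_train = int(len(posts) * train_frac)
    n_dev = int(len(posts) * dev_frac)
    return (posts[:n_train],
            posts[n_train:n_train + n_dev],
            posts[n_train + n_dev:])

def chronological_split(posts, train_frac=0.7, dev_frac=0.1):
    """Sort by time then slice: every test post was published after
    everything the model saw in training, mimicking deployment."""
    posts = sorted(posts, key=lambda p: p["timestamp"])
    n_train = int(len(posts) * train_frac)
    n_dev = int(len(posts) * dev_frac)
    return (posts[:n_train],
            posts[n_train:n_train + n_dev],
            posts[n_train + n_dev:])
```

Under the chronological split, rumors about events that only emerge after the training cutoff can never leak into the training set, which is exactly the topical overlap the paper identifies as the source of overestimated scores.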
Related papers
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam, a simple but effective self-supervised pre-training framework for time series based on Siamese networks (a generic sketch of the idea follows this entry).
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
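As a rough illustration of Siamese-style self-supervised pre-training on time series, here is a sketch under my own assumptions; it is not TimeSiam's actual architecture or objective, and `TinyEncoder` is a hypothetical stand-in.

```python
import torch
import torch.nn.functional as F

class TinyEncoder(torch.nn.Module):
    """Hypothetical stand-in: any module mapping (batch, length, channels)
    time series to fixed-size embeddings would work here."""
    def __init__(self, channels=1, dim=64):
        super().__init__()
        self.conv = torch.nn.Conv1d(channels, dim, kernel_size=5, padding=2)

    def forward(self, x):                  # x: (batch, length, channels)
        h = self.conv(x.transpose(1, 2))   # -> (batch, dim, length)
        return h.mean(dim=-1)              # global average pool -> (batch, dim)

def siamese_pretrain_loss(encoder, series, window=64):
    """Sample two subseries of each series and pull their embeddings
    together via negative cosine similarity (no labels needed)."""
    _, length, _ = series.shape
    i = torch.randint(0, length - window + 1, (1,)).item()
    j = torch.randint(0, length - window + 1, (1,)).item()
    z1 = encoder(series[:, i:i + window])
    z2 = encoder(series[:, j:j + window])
    return -F.cosine_similarity(z1, z2, dim=-1).mean()
```

Real Siamese pre-training frameworks add pieces this sketch omits (for example a predictor head and stop-gradient to prevent representation collapse), and TimeSiam's sampling and objective differ in their details.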
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study [33.1099258648462]
We present two split variants that reveal how models catastrophically fail on blind spots in the latent space.
Our analysis suggests that there is no clear surface-level property of the data split that correlates with the decreased performance.
arXiv Detail & Related papers (2023-11-16T23:49:55Z)
- Lexical Repetitions Lead to Rote Learning: Unveiling the Impact of Lexical Overlap in Train and Test Reference Summaries [131.80860903537172]
Ideal summarization models should generalize to novel summary-worthy content without remembering reference training summaries by rote.
We propose a fine-grained evaluation protocol that partitions the test set based on the lexical similarity of reference test summaries to training summaries (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-11-15T23:47:53Z)
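A minimal sketch of the kind of overlap-based test-set partition described above. Token-set overlap and the 0.5 threshold are illustrative assumptions; the paper presumably uses a more careful lexical similarity (e.g., n-gram based).

```python
def token_overlap(a, b):
    """Fraction of summary a's token types that also appear in summary b."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta), 1)

def partition_by_overlap(test_refs, train_refs, threshold=0.5):
    """Bucket each test reference by its maximum lexical similarity to
    any training reference, so the two buckets can be scored separately."""
    low, high = [], []
    for ref in test_refs:
        score = max(token_overlap(ref, tr) for tr in train_refs)
        (high if score >= threshold else low).append(ref)
    return low, high
```

Scoring a summarizer separately on the low- and high-overlap buckets then reveals how much of its measured performance comes from memorized training summaries.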
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets [30.315424983805087]
This paper provides an in-depth evaluation of the performance gap between content-based and context-based models.
Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors' source post.
Based on our experimental results, the paper also offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets.
arXiv Detail & Related papers (2023-09-20T18:27:19Z)
- MomentDiff: Generative Video Moment Retrieval from Random to Real [71.40038773943638]
We provide a generative diffusion-based framework called MomentDiff.
MomentDiff simulates a typical human retrieval process from random browsing to gradual localization.
We show that MomentDiff consistently outperforms state-of-the-art methods on three public benchmarks.
arXiv Detail & Related papers (2023-07-06T09:12:13Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts than other DRO approaches (a minimal sketch of the minimax step follows this entry).
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
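A minimal sketch of DRO training with a parametric, batch-normalized likelihood-ratio adversary. The names (`adversary`, a `loss_fn` with `reduction="none"`, e.g. `torch.nn.CrossEntropyLoss(reduction="none")`) are assumptions for the example, and the paper's additional stabilizing ideas (such as regularizing the learned ratios) are omitted.

```python
import torch

def dro_step(model, adversary, x, y, loss_fn, opt_model, opt_adv):
    """One minimax step: the adversary reweights the batch (the softmax
    acts as self-normalization, keeping the weights a valid distribution);
    the adversary ascends on the weighted loss, the model descends."""
    losses = loss_fn(model(x), y)                 # per-example losses, shape (batch,)
    weights = torch.softmax(adversary(x).squeeze(-1), dim=0)

    adv_loss = -(weights * losses.detach()).sum()   # maximize weighted loss
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    model_loss = (weights.detach() * losses).sum()  # minimize, weights held fixed
    opt_model.zero_grad()
    model_loss.backward()
    opt_model.step()
    return model_loss.item()
```

Upweighting the hardest examples in each batch is what pushes the model to stay accurate on subpopulations a plain average loss would neglect.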
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric, "dR@n,IoU@m", that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
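One illustrative reading of such a discounted recall at n = 1; the form of the discount here (boundary-distance factors) is my assumption, not necessarily the paper's exact definition. A prediction must still clear the IoU threshold, and the credit it earns is scaled down by its normalized boundary error.

```python
def iou(pred, gt):
    """Temporal IoU between two (start, end) moments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def discounted_recall(preds, gts, durations, m=0.5):
    """dR@1,IoU@m-style score: a hit still requires IoU >= m, but is
    down-weighted by how far the predicted boundaries drift from the
    ground truth (distances normalized by video duration)."""
    total = 0.0
    for pred, gt, dur in zip(preds, gts, durations):
        if iou(pred, gt) >= m:
            alpha_s = 1.0 - abs(pred[0] - gt[0]) / dur
            alpha_e = 1.0 - abs(pred[1] - gt[1]) / dur
            total += alpha_s * alpha_e
    return total / len(preds)
```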
- Learning Sample Importance for Cross-Scenario Video Temporal Grounding [30.82619216537177]
The paper investigates some superficial biases specific to the temporal grounding task.
We propose a novel method called Debiased Temporal Language Localizer (DebiasTLL) to prevent the model from naively memorizing the biases.
We evaluate the proposed model in cross-scenario temporal grounding, where the train / test data are heterogeneously sourced.
arXiv Detail & Related papers (2022-01-08T15:41:38Z)
- Evaluation of Local Explanation Methods for Multivariate Time Series Forecasting [0.21094707683348418]
Local interpretability is important in determining why a model makes particular predictions.
Despite the recent focus on AI interpretability, there has been a lack of research in local interpretability methods for time series forecasting.
arXiv Detail & Related papers (2020-09-18T21:15:28Z)
- We Need to Talk About Random Splits [3.236124102160291]
Gorman and Bedrick argued for using random splits rather than standard splits in NLP experiments.
We argue that random splits, like standard splits, lead to overly optimistic performance estimates.
arXiv Detail & Related papers (2020-05-01T22:14:16Z)