FACTors: A New Dataset for Studying the Fact-checking Ecosystem
- URL: http://arxiv.org/abs/2505.09414v1
- Date: Wed, 14 May 2025 14:10:22 GMT
- Title: FACTors: A New Dataset for Studying the Fact-checking Ecosystem
- Authors: Enes Altuncu, Can Başkent, Sanjay Bhattacherjee, Shujun Li, Dwaipayan Roy,
- Abstract summary: There is no fact-checking dataset at the ecosystem level covering claims from a sufficiently long period of time. We present FACTors, a new dataset and the first to fill this gap by providing ecosystem-level data on fact-checking. Our methods for constructing FACTors are generic and can be used to maintain a live dataset that is updated dynamically.
- Score: 2.994926985221349
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our fight against false information is spearheaded by fact-checkers. They investigate the veracity of claims and document their findings in fact-checking reports. With the rapid increase in the amount of false information circulating online, automation in fact-checking processes aims to strengthen this ecosystem by enhancing scalability. Datasets of fact-checked claims play a key role in developing such automated solutions. However, to the best of our knowledge, there is no fact-checking dataset at the ecosystem level: one covering claims from a sufficiently long period of time and sourced from a wide range of actors, reflecting the entire ecosystem of organisations that follow widely accepted codes and principles of fact-checking. We present FACTors, a new dataset and the first to fill this gap by providing ecosystem-level data on fact-checking. It contains 118,112 claims from 117,993 fact-checking reports in English, (co-)authored by 1,953 individuals and published between 1995 and 2025 by 39 fact-checking organisations that are active signatories of the IFCN (International Fact-Checking Network) and/or the EFCSN (European Fact-Checking Standards Network). It includes 7,327 overlapping claims investigated by multiple fact-checking organisations, corresponding to 2,977 unique claims. It enables new ecosystem-level studies of the fact-checkers (both organisations and individuals). To demonstrate the usefulness of FACTors, we present three example applications: a first-of-its-kind statistical analysis of the fact-checking ecosystem, an examination of the political inclinations of the fact-checking organisations, and an attempt to assign each organisation a credibility score based on the findings of the statistical analysis and the political leanings. Our methods for constructing FACTors are generic and can be used to maintain a live dataset that is updated dynamically.
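The overlap figures above (7,327 overlapping claims corresponding to 2,977 unique claims) imply matching the same claim across reports from different organisations. A minimal sketch of such a grouping is shown below, assuming hypothetical record fields `claim` and `organisation`; the actual FACTors schema and matching procedure are not specified here.

```python
from collections import defaultdict

# Hypothetical records mimicking FACTors rows; the real dataset's
# field names and claim-matching method may differ.
reports = [
    {"claim": "5G towers spread viruses.", "organisation": "OrgA"},
    {"claim": "5G towers spread viruses.", "organisation": "OrgB"},
    {"claim": "Vaccines contain microchips.", "organisation": "OrgA"},
]

def overlapping_claims(rows):
    """Group reports by claim text and keep only claims investigated
    by more than one distinct fact-checking organisation."""
    by_claim = defaultdict(set)
    for row in rows:
        by_claim[row["claim"]].add(row["organisation"])
    return {claim: orgs for claim, orgs in by_claim.items() if len(orgs) > 1}

print(overlapping_claims(reports))
```

In practice, near-duplicate claims would need fuzzier matching (e.g. normalisation or semantic similarity) rather than exact string equality.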
Related papers
- Task-Oriented Automatic Fact-Checking with Frame-Semantics [0.8135825089247968]
We introduce a pilot dataset of real-world claims extracted from PolitiFact, annotated for large-scale structured data. This dataset underpins two case studies: the first investigates voting-related claims using the Vote semantic frame, while the second explores various semantic frames based on data sources from the Organisation for Economic Co-operation and Development. Our findings demonstrate the effectiveness of frame semantics in improving evidence retrieval and explainability for fact-checking.
arXiv Detail & Related papers (2025-01-23T00:26:09Z)
- TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation [10.449165630417522]
We introduce TrendFact, the first hotspot perception fact-checking benchmark. TrendFact consists of 7,643 carefully curated samples sourced from trending platforms and professional fact-checking datasets. We propose FactISR, which integrates dynamic evidence augmentation, evidence triangulation, and an iterative self-reflection mechanism.
arXiv Detail & Related papers (2024-10-19T15:25:19Z)
- Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation [67.69725605939315]
Misinformation emerges in times of uncertainty when credible information is limited.
This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available.
arXiv Detail & Related papers (2022-10-25T09:40:48Z)
- CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z)
- Synthetic Disinformation Attacks on Automated Fact Verification Systems [53.011635547834025]
We explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings.
We show that these systems suffer significant performance drops against these attacks.
We discuss the growing threat of modern NLG systems as generators of disinformation.
arXiv Detail & Related papers (2022-02-18T19:01:01Z)
- FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference [61.068947982746224]
FacTeR-Check enables retrieving fact-checked information, verifying unchecked claims, and tracking dangerous information over social media.
The architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media.
Our results show state-of-the-art performance on the individual benchmarks, as well as producing useful analysis of the evolution over time of 61 different hoaxes.
arXiv Detail & Related papers (2021-10-27T15:44:54Z)
- Automated Fact-Checking: A Survey [5.729426778193398]
Researchers in the field of Natural Language Processing (NLP) have contributed to the task by building fact-checking datasets.
This paper reviews relevant research on automated fact-checking covering both the claim detection and claim validation components.
arXiv Detail & Related papers (2021-09-23T15:13:48Z)
- FaVIQ: FAct Verification from Information-seeking Questions [77.7067957445298]
We construct a large-scale fact verification dataset called FaVIQ using information-seeking questions posed by real users.
Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
arXiv Detail & Related papers (2021-07-05T17:31:44Z)
- AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models that handle this ambiguity when predicting veracity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.