AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from
the Web
- URL: http://arxiv.org/abs/2305.13117v3
- Date: Wed, 8 Nov 2023 11:53:55 GMT
- Title: AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from
the Web
- Authors: Michael Schlichtkrull, Zhijiang Guo, Andreas Vlachos
- Abstract summary: We introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations.
Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict.
- Score: 20.576644330553744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing datasets for automated fact-checking have substantial limitations,
such as relying on artificial claims, lacking annotations for evidence and
intermediate reasoning, or including evidence published after the claim. In
this paper we introduce AVeriTeC, a new dataset of 4,568 real-world claims
covering fact-checks by 50 different organizations. Each claim is annotated
with question-answer pairs supported by evidence available online, as well as
textual justifications explaining how the evidence combines to produce a
verdict. Through a multi-round annotation process, we avoid common pitfalls
including context dependence, evidence insufficiency, and temporal leakage, and
reach a substantial inter-annotator agreement of $\kappa=0.619$ on verdicts. We
develop a baseline as well as an evaluation scheme for verifying claims through
several question-answering steps against the open web.
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z) - Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z) - Complex Claim Verification with Evidence Retrieved in the Wild [73.19998942259073]
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
arXiv Detail & Related papers (2023-05-19T17:49:19Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim.
We present ClaimDecomp, a dataset of decompositions for over 1000 claims.
We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z) - COVID-Fact: Fact Extraction and Verification of Real-World Claims on
COVID-19 Pandemic [12.078052727772718]
We introduce a FEVER-like dataset COVID-Fact of $4,086$ claims concerning the COVID-19 pandemic.
The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence.
arXiv Detail & Related papers (2021-06-07T16:59:46Z) - Topic-Aware Evidence Reasoning and Stance-Aware Aggregation for Fact
Verification [19.130541561303293]
We propose a novel topic-aware evidence reasoning and stance-aware aggregation model for fact verification.
Tests conducted on two benchmark datasets demonstrate the superiority of the proposed model over several state-of-the-art approaches for fact verification.
arXiv Detail & Related papers (2021-06-02T14:33:12Z) - AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models for predicting veracity handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z) - Hierarchical Evidence Set Modeling for Automated Fact Extraction and
Verification [5.836068916903788]
Hierarchical Evidence Set Modeling (HESM) is a framework to extract evidence sets and verify a claim to be supported, refuted or not enough info.
Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification.
arXiv Detail & Related papers (2020-10-10T22:27:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.