FaVIQ: FAct Verification from Information-seeking Questions
- URL: http://arxiv.org/abs/2107.02153v1
- Date: Mon, 5 Jul 2021 17:31:44 GMT
- Title: FaVIQ: FAct Verification from Information-seeking Questions
- Authors: Jungsoo Park, Sewon Min, Jaewoo Kang, Luke Zettlemoyer, Hannaneh Hajishirzi
- Abstract summary: We construct a large-scale fact verification dataset called FaVIQ using information-seeking questions posed by real users.
Our claims are verified to be natural, contain little lexical bias, and require a complete understanding of the evidence for verification.
- Score: 77.7067957445298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite significant interest in developing general purpose fact checking
models, it is challenging to construct a large-scale fact verification dataset
with realistic claims that would occur in the real world. Existing claims are
either authored by crowdworkers, thereby introducing subtle biases that are
difficult to control for, or manually verified by professional fact checkers,
causing them to be expensive and limited in scale. In this paper, we construct
a challenging, realistic, and large-scale fact verification dataset called
FaVIQ, using information-seeking questions posed by real users who do not know
how to answer. The ambiguity in information-seeking questions enables the
automatic construction of true and false claims that reflect confusions arising
among users (e.g., the year a movie was filmed vs. the year it was released). Our
claims are verified to be natural, contain little lexical bias, and require a
complete understanding of the evidence for verification. Our experiments show
that the state-of-the-art models are far from solving our new task. Moreover,
training on our data helps in professional fact-checking, outperforming models
trained on the most widely used dataset, FEVER, or on in-domain data by up to 17%
absolute. Altogether, our data will serve as a challenging benchmark for
natural language understanding and support future progress in professional fact
checking.
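The claim-construction recipe described above lends itself to a short illustration. The following is a minimal sketch assuming AmbigQA-style disambiguated (question, answer) pairs; the Interpretation schema, the build_claims helper, and the naive template in to_claim are hypothetical stand-ins for the paper's actual generation pipeline.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass
class Interpretation:
    """One disambiguation of an ambiguous question (hypothetical schema)."""
    question: str  # fully disambiguated question
    answer: str    # its answer

def to_claim(question: str, answer: str) -> str:
    # Naive stand-in for the paper's claim-rewriting step.
    return f'The answer to "{question}" is {answer}.'

def build_claims(interps: list[Interpretation]):
    """Yield (claim, label) pairs from one ambiguous question.

    Pairing a disambiguated question with its own answer yields a true
    (SUPPORTS) claim; pairing it with the answer to a *different*
    interpretation yields a false (REFUTES) claim that mirrors a
    plausible user confusion.
    """
    for interp in interps:
        yield to_claim(interp.question, interp.answer), "SUPPORTS"
    for a, b in permutations(interps, 2):
        yield to_claim(a.question, b.answer), "REFUTES"

interps = [
    Interpretation("When was the movie Titanic filmed?", "1996"),
    Interpretation("When was the movie Titanic released?", "1997"),
]
for claim, label in build_claims(interps):
    print(f"{label}: {claim}")
```

Pairing each disambiguated question with the answer to a different interpretation is what makes the false claims hard: refuting them requires resolving exactly the confusion the original asker had.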
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
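As a sketch of the contrastive-learning recipe the CFR abstract names, here is a numerically stable InfoNCE-style loss in plain Python. This is the generic formulation only; CFR's actual objective, scoring model, and negative selection are assumptions not specified here.

```python
import math

def info_nce_loss(pos_score: float, neg_scores: list[float],
                  temperature: float = 0.05) -> float:
    """Contrastive loss for one claim: useful evidence vs. distractors.

    pos_score  - reranker score for a passage annotated as useful
                 (e.g., one answering an AVeriTeC subquestion)
    neg_scores - scores for retrieved-but-unhelpful passages
    Minimizing this pushes useful evidence above the distractors.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# The loss shrinks as the positive pulls ahead of the negatives.
print(info_nce_loss(0.9, [0.2, 0.1, -0.3]))   # large gap -> near-zero loss
print(info_nce_loss(0.3, [0.2, 0.25, 0.28]))  # small gap -> larger loss
```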
- Fact or Fiction? Improving Fact Verification with Knowledge Graphs through Simplified Subgraph Retrievals [0.0]
We present efficient methods for verifying claims on a dataset where the evidence is in the form of structured knowledge graphs.
By simplifying the evidence retrieval process, we are able to construct models that both require less computational resources and achieve better test-set accuracy.
arXiv Detail & Related papers (2024-08-14T10:46:15Z)
- How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation [4.376598435975689]
This paper explores the novel task of flaw-oriented fact-checking, including aspect generation and flaw identification.
We also introduce RefuteClaim, a new framework designed specifically for this task.
Given the absence of an existing dataset, we present FlawCheck, a dataset created by extracting and transforming insights from expert reviews into relevant aspects and identified flaws.
arXiv Detail & Related papers (2024-01-27T06:06:16Z)
- EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification [22.785622371421876]
We present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification.
It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents.
We demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification.
arXiv Detail & Related papers (2023-10-15T06:46:15Z)
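The three baseline stages named above (document retrieval, explanation generation, claim verification) compose naturally into a pipeline. The skeleton below is hypothetical: each stage is a stub, and the real EX-FEVER baseline components are not reproduced here.

```python
# Hypothetical skeleton of the three-stage baseline the abstract names
# (document retrieval -> explanation generation -> claim verification).

def retrieve(claim: str, k: int = 5) -> list[str]:
    """Stage 1: fetch the top-k candidate Wikipedia documents (stub)."""
    return [f"[doc {i}] text relevant to: {claim}" for i in range(k)]

def generate_explanation(claim: str, docs: list[str]) -> str:
    """Stage 2: summarize multi-hop evidence into a readable rationale (stub)."""
    return f"Across {len(docs)} documents, the linked evidence shows ..."

def verify(claim: str, explanation: str) -> str:
    """Stage 3: map (claim, explanation) to a verdict (stub)."""
    return "SUPPORTS"  # or "REFUTES" / "NOT ENOUGH INFO"

def fact_check(claim: str) -> tuple[str, str]:
    docs = retrieve(claim)
    explanation = generate_explanation(claim, docs)
    return verify(claim, explanation), explanation

verdict, why = fact_check("The author of novel A also directed film B.")
print(verdict, "-", why)
```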
- Mitigating Temporal Misalignment by Discarding Outdated Facts [58.620269228776294]
Large language models are often used under temporal misalignment, tasked with answering questions about the present.
We propose fact duration prediction: the task of predicting how long a given fact will remain true.
Our data and code are released publicly at https://github.com/mikejqzhang/mitigating_misalignment.
arXiv Detail & Related papers (2023-05-24T07:30:08Z)
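The fact-duration task can be pictured as mapping each stored fact to a coarse lifespan and discarding facts older than that estimate. The buckets, thresholds, and examples below are illustrative assumptions, not the released dataset's labels (see the repository linked above).

```python
# Hypothetical framing of fact duration prediction: map a fact to a coarse
# estimate of how long it remains true, then discard facts that have
# outlived their predicted lifespan.

BUCKET_YEARS = {
    "days": 0.01,
    "months": 0.5,
    "years": 4.0,
    "decades": 30.0,
    "permanent": float("inf"),
}

def is_outdated(predicted_bucket: str, fact_age_years: float) -> bool:
    """Discard a stored fact once it outlives its predicted duration."""
    return fact_age_years > BUCKET_YEARS[predicted_bucket]

# A (fact, predicted bucket, age in years) triple per stored fact.
facts = [
    ("The current UK prime minister is X.", "years", 6.0),
    ("Water boils at 100 C at sea level.", "permanent", 50.0),
]
for text, bucket, age in facts:
    status = "discard" if is_outdated(bucket, age) else "keep"
    print(f"{status}: {text}")
```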
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
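A record with both claim-level and sub-sentence-level judgments, in the spirit of WiCE's fine-grained entailment annotations, might look like the sketch below; the field and label names are assumptions rather than WiCE's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative record layout: the claim is split into sub-sentence units,
# each judged against retrieved evidence sentences.

@dataclass
class SubClaim:
    text: str
    label: str  # e.g., "supported" / "partially_supported" / "not_supported"
    supporting_sentences: list[int] = field(default_factory=list)

@dataclass
class EntailmentExample:
    claim: str
    evidence: list[str]
    subclaims: list[SubClaim]
    claim_label: str  # aggregate claim-level judgment

ex = EntailmentExample(
    claim="The film premiered in 1997 and won 11 Academy Awards.",
    evidence=[
        "Titanic was released on December 19, 1997.",
        "The film won 11 Academy Awards.",
    ],
    subclaims=[
        SubClaim("The film premiered in 1997", "supported", [0]),
        SubClaim("won 11 Academy Awards", "supported", [1]),
    ],
    claim_label="supported",
)
print(len(ex.subclaims), "sub-sentence judgments for one claim")
```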
- Missing Counter-Evidence Renders NLP Fact-Checking Unrealistic for Misinformation [67.69725605939315]
Misinformation emerges in times of uncertainty when credible information is limited.
This is challenging for NLP-based fact-checking as it relies on counter-evidence, which may not yet be available.
arXiv Detail & Related papers (2022-10-25T09:40:48Z)
- Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim.
We present ClaimDecomp, a dataset of decompositions for over 1000 claims.
We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z)
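One way to picture how subquestion answers drive a verdict, in the spirit of ClaimDecomp: answer each yes-no subquestion against retrieved evidence, then aggregate. The majority-vote rule and the example below are illustrative assumptions; the paper studies decomposition quality and evidence identification, not this specific rule.

```python
# Each item: (subquestion, answer_is_yes, a_yes_answer_supports_the_claim).
def aggregate(subquestions: list[tuple[str, bool, bool]]) -> str:
    """Derive a verdict from yes-no subquestion answers (toy rule)."""
    votes = [ans == supports for _, ans, supports in subquestions]
    share = sum(votes) / len(votes)
    if share >= 0.75:
        return "SUPPORTED"
    if share <= 0.25:
        return "REFUTED"
    return "MIXED"

claim = "The policy cut taxes for most families."
subqs = [
    ("Did the policy cut tax rates?", True, True),
    ("Did most families see lower tax bills?", False, True),
    ("Did some households see increases?", True, False),
]
print(aggregate(subqs))  # -> "MIXED" (1/3 of answers align with the claim)
```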
This list is automatically generated from the titles and abstracts of the papers on this site.