Related papers: WiCE: Real-World Entailment for Claims in Wikipedia

WiCE: Real-World Entailment for Claims in Wikipedia

URL: http://arxiv.org/abs/2303.01432v2
Date: Sun, 22 Oct 2023 18:11:08 GMT
Title: WiCE: Real-World Entailment for Claims in Wikipedia
Authors: Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett
Abstract summary: We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim. We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
Score: 63.234352061821625
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation. However, these represent a significant domain shift from existing entailment datasets, and models underperform as a result. We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia. In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim, and a minimal subset of evidence sentences that support each subclaim. To support this, we propose an automatic claim decomposition strategy using GPT-3.5 which we show is also effective at improving entailment models' performance on multiple datasets at test time. Finally, we show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.

Related papers

When Claims Evolve: Evaluating and Enhancing the Robustness of Embedding Models Against Misinformation Edits [5.443263983810103]
As users interact with claims online, they often introduce edits, and it remains unclear whether current embedding models are robust to such edits.<n>We introduce a perturbation framework that generates valid and natural claim variations, enabling us to assess the robustness of a wide-range of sentence embedding models.<n>Our evaluation reveals that standard embedding models exhibit notable performance drops on edited claims, while LLM-distilled embedding models offer improved robustness at a higher computational cost.
arXiv Detail & Related papers (2025-03-05T11:47:32Z)
FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking [3.1537425078180625]
The field of automated fact-checking increasingly depends on retrieving web-based evidence to determine the veracity of claims in real-world scenarios. Traditional retrieval methods may return documents that directly address claims or lean toward supporting them, but often struggle with more complex claims requiring indirect reasoning. We present a real-world benchmark FactIR, derived from Factiverse production logs, enhanced with human annotations.
arXiv Detail & Related papers (2025-02-09T19:51:00Z)
FactLens: Benchmarking Fine-Grained Fact Verification [6.814173254027381]
We advocate for a shift toward fine-grained verification, where complex claims are broken down into smaller sub-claims for individual verification. We introduce FactLens, a benchmark for evaluating fine-grained fact verification, with metrics and automated evaluators of sub-claim quality. Our results show alignment between automated FactLens evaluators and human judgments, and we discuss the impact of sub-claim characteristics on the overall verification performance.
arXiv Detail & Related papers (2024-11-08T21:26:57Z)
Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims. We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents. We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
From Chaos to Clarity: Claim Normalization to Empower Fact-Checking [57.024192702939736]
Claim Normalization (aka ClaimNorm) aims to decompose complex and noisy social media posts into more straightforward and understandable forms. We propose CACN, a pioneering approach that leverages chain-of-thought and claim check-worthiness estimation. Our experiments demonstrate that CACN outperforms several baselines across various evaluation measures.
arXiv Detail & Related papers (2023-10-22T16:07:06Z)
AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web [20.576644330553744]
We introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict.
arXiv Detail & Related papers (2023-05-22T15:17:18Z)
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency [14.974996886744083]
We release SummFC, a filtered summarization dataset with improved factual consistency. We argue that our dataset should become a valid benchmark for developing and evaluating summarization systems.
arXiv Detail & Related papers (2022-10-31T15:04:20Z)
Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim. We present ClaimDecomp, a dataset of decompositions for over 1000 claims. We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z)
GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidences in a generative fashion. The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
Zero-shot Fact Verification by Claim Generation [85.27523983027471]
We develop QACG, a framework for training a robust fact verification model. We use automatically generated claims that can be supported, refuted, or unverifiable from evidence from Wikipedia. In a zero-shot scenario, QACG improves a RoBERTa model's F1 from 50% to 77%, equivalent in performance to 2K+ manually-curated examples.
arXiv Detail & Related papers (2021-05-31T03:13:52Z)
Hierarchical Evidence Set Modeling for Automated Fact Extraction and Verification [5.836068916903788]
Hierarchical Evidence Set Modeling (HESM) is a framework to extract evidence sets and verify a claim to be supported, refuted or not enough info. Our experimental results show that HESM outperforms 7 state-of-the-art methods for fact extraction and claim verification.
arXiv Detail & Related papers (2020-10-10T22:27:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.