Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification
- URL: http://arxiv.org/abs/2112.07618v1
- Date: Fri, 10 Dec 2021 17:11:50 GMT
- Title: Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification
- Authors: Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma
- Abstract summary: This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities that can distract the evidence retrieval model.
A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims.
- Score: 2.624734563929267
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate evidence retrieval is essential for automated fact checking. Little
previous research has focused on the differences between true and false claims
and how they affect evidence retrieval. This paper shows that, compared with
true claims, false claims more frequently contain irrelevant entities that can
distract the evidence retrieval model. A BERT-based retrieval model made more
mistakes in retrieving refuting evidence for false claims than supporting
evidence for true claims. When tested with adversarial false claims
(synthetically generated) containing irrelevant entities, the recall of the
retrieval model is significantly lower than that for original claims. These
results suggest that the vanilla BERT-based retrieval model is not robust to
irrelevant entities in false claims. Augmenting the training data with
synthetic false claims containing irrelevant entities increased the trained
model's evidence recall, including for false claims with irrelevant entities.
In addition, using separate models to retrieve refuting and supporting
evidence and then aggregating their outputs also increased evidence recall,
including for false claims with irrelevant entities. These results
suggest that we can increase the BERT-based retrieval model's robustness to
false claims with irrelevant entities via data augmentation and model ensemble.
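As a concrete illustration of the data-augmentation idea, the sketch below builds adversarial false claims by attaching an irrelevant named entity to an existing claim. The spaCy pipeline, the distractor pool, and the appending template are assumptions made for illustration; the paper does not specify its exact generation procedure.

```python
# Minimal sketch, assuming a spaCy NER pipeline and a simple appending
# template: create synthetic false claims that contain an irrelevant
# (distracting) entity. These choices are hypothetical, not the authors'.
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def add_distracting_entity(claim: str, distractor_pool: list) -> str:
    """Append an unrelated named entity to the claim."""
    existing_entities = {ent.text for ent in nlp(claim).ents}
    candidates = [e for e in distractor_pool if e not in existing_entities]
    if not candidates:
        return claim
    distractor = random.choice(candidates)
    # Hypothetical template; the paper only states that synthetic false
    # claims contain irrelevant entities, not how they are worded.
    return f"{claim.rstrip('.')} associated with {distractor}."

# Usage: augment the training claims with adversarial counterparts.
train_claims = ["Telemundo is an English-language television network."]
pool = ["Albert Einstein", "The Great Barrier Reef", "Mount Kilimanjaro"]
augmented = train_claims + [add_distracting_entity(c, pool) for c in train_claims]
```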
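The ensemble idea can likewise be sketched with two separately trained retrievers, one specialized for supporting and one for refuting evidence, whose ranked candidates are merged. The sentence-transformers checkpoints and the max-score aggregation rule below are assumptions; the paper only states that the two models' retrieved evidence is aggregated.

```python
# Minimal sketch of the ensemble idea: retrieve candidate evidence with two
# separately trained retrievers and merge their rankings by maximum score.
# Checkpoints and aggregation rule are assumptions, not the authors' setup.
from sentence_transformers import SentenceTransformer, util

# Placeholder checkpoints; in practice these would be two BERT-based
# retrievers fine-tuned on supporting and refuting claim-evidence pairs.
supporting_retriever = SentenceTransformer("all-MiniLM-L6-v2")
refuting_retriever = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(model, claim, sentences, k=5):
    """Score every candidate sentence against the claim and keep the top k."""
    claim_emb = model.encode(claim, convert_to_tensor=True)
    sent_embs = model.encode(sentences, convert_to_tensor=True)
    scores = util.cos_sim(claim_emb, sent_embs)[0]
    top = scores.argsort(descending=True)[:k].tolist()
    return {sentences[i]: float(scores[i]) for i in top}

def ensemble_retrieve(claim, sentences, k=5):
    """Union the two retrievers' candidates and rank by the higher score."""
    sup = retrieve(supporting_retriever, claim, sentences, k)
    ref = retrieve(refuting_retriever, claim, sentences, k)
    merged = {s: max(sup.get(s, float("-inf")), ref.get(s, float("-inf")))
              for s in set(sup) | set(ref)}
    return sorted(merged, key=merged.get, reverse=True)[:k]
```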
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human-written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z) - Complex Claim Verification with Evidence Retrieved in the Wild [73.19998942259073]
We present the first fully automated pipeline to check real-world claims by retrieving raw evidence from the web.
Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment.
arXiv Detail & Related papers (2023-05-19T17:49:19Z) - Read it Twice: Towards Faithfully Interpretable Fact Verification by Revisiting Evidence [59.81749318292707]
We propose a fact verification model named ReRead to retrieve evidence and verify claims.
The proposed system is able to achieve significant improvements upon best-reported models under different settings.
arXiv Detail & Related papers (2023-05-02T03:23:14Z) - WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z) - Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim.
We present ClaimDecomp, a dataset of decompositions for over 1000 claims.
We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z) - COVID-Fact: Fact Extraction and Verification of Real-World Claims on COVID-19 Pandemic [12.078052727772718]
We introduce COVID-Fact, a FEVER-like dataset of 4,086 claims concerning the COVID-19 pandemic.
The dataset contains claims, evidence for the claims, and contradictory claims refuted by the evidence.
arXiv Detail & Related papers (2021-06-07T16:59:46Z) - Topic-Aware Evidence Reasoning and Stance-Aware Aggregation for Fact Verification [19.130541561303293]
We propose a novel topic-aware evidence reasoning and stance-aware aggregation model for fact verification.
Tests conducted on two benchmark datasets demonstrate the superiority of the proposed model over several state-of-the-art approaches for fact verification.
arXiv Detail & Related papers (2021-06-02T14:33:12Z) - Automatic Fake News Detection: Are Models Learning to Reason? [9.143551270841858]
We investigate the relationship and importance of both claim and evidence.
Surprisingly, we find on political fact-checking datasets that the highest effectiveness is most often obtained by using only the evidence.
This highlights an important problem in what constitutes evidence in existing approaches for automatic fake news detection.
arXiv Detail & Related papers (2021-05-17T09:34:03Z) - AmbiFC: Fact-Checking Ambiguous Claims with Evidence [57.7091560922174]
We present AmbiFC, a fact-checking dataset with 10k claims derived from real-world information needs.
We analyze disagreements arising from ambiguity when comparing claims against evidence in AmbiFC.
We develop models that predict veracity while handling this ambiguity via soft labels.
arXiv Detail & Related papers (2021-04-01T17:40:08Z)