DialFact: A Benchmark for Fact-Checking in Dialogue
- URL: http://arxiv.org/abs/2110.08222v1
- Date: Fri, 15 Oct 2021 17:34:35 GMT
- Title: DialFact: A Benchmark for Fact-Checking in Dialogue
- Authors: Prakhar Gupta, Chien-Sheng Wu, Wenhao Liu and Caiming Xiong
- Abstract summary: We construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia.
We find that existing fact-checking models trained on non-dialogue data like FEVER fail to perform well on our task.
We propose a simple yet data-efficient solution to effectively improve fact-checking performance in dialogue.
- Score: 56.63709206232572
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fact-checking is an essential tool to mitigate the spread of
misinformation and disinformation; however, it has mostly been explored for
verifying formal single-sentence claims rather than casual conversational
claims. To study the
problem, we introduce the task of fact-checking in dialogue. We construct
DialFact, a testing benchmark dataset of 22,245 annotated conversational
claims, paired with pieces of evidence from Wikipedia. There are three
sub-tasks in DialFact: 1) Verifiable claim detection task distinguishes whether
a response carries verifiable factual information; 2) Evidence retrieval task
retrieves the most relevant Wikipedia snippets as evidence; 3) Claim
verification task predicts whether a dialogue response is supported, refuted, or has not
enough information. We found that existing fact-checking models trained on
non-dialogue data like FEVER fail to perform well on our task, and thus we
propose a simple yet data-efficient solution to effectively improve
fact-checking performance in dialogue. In our error analysis, we point out
unique challenges in DialFact, such as handling colloquialisms, coreferences,
and retrieval ambiguities, to shed light on future research in this direction.
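To make the three sub-tasks concrete, here is a minimal sketch of how they chain into one pipeline. The heuristics below are illustrative placeholders, not the paper's actual models, and all function names are our own.

```python
from dataclasses import dataclass, field

@dataclass
class FactCheckResult:
    verifiable: bool
    evidence: list = field(default_factory=list)
    label: str = "NOT ENOUGH INFO"  # or "SUPPORTS" / "REFUTES"

def detect_verifiable(response: str) -> bool:
    """Sub-task 1 (placeholder): does the response carry factual content?
    Crude proxy: mid-sentence capitalised words or digits."""
    tokens = response.split()
    return any(t[0].isupper() for t in tokens[1:]) or any(c.isdigit() for c in response)

def retrieve_evidence(claim: str, snippets: dict, k: int = 2) -> list:
    """Sub-task 2 (placeholder): rank pre-fetched Wikipedia snippets
    ({title: text}) by lexical overlap; a real system searches a full index."""
    claim_words = set(claim.lower().split())
    ranked = sorted(snippets.values(),
                    key=lambda text: -len(claim_words & set(text.lower().split())))
    return ranked[:k]

def verify_claim(claim: str, evidence: list) -> str:
    """Sub-task 3 (placeholder): a trained NLI-style classifier would go here."""
    overlap = sum(len(set(claim.lower().split()) & set(e.lower().split()))
                  for e in evidence)
    return "SUPPORTS" if overlap > 3 else "NOT ENOUGH INFO"

def fact_check(response: str, snippets: dict) -> FactCheckResult:
    if not detect_verifiable(response):
        return FactCheckResult(verifiable=False)
    evidence = retrieve_evidence(response, snippets)
    return FactCheckResult(True, evidence, verify_claim(response, evidence))
```

The point of the chain is that an unverifiable response (e.g., "I love that show!") short-circuits before retrieval, while factual responses are checked against retrieved snippets.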
Related papers
- How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation [4.376598435975689]
This paper explores the novel task of flaw-oriented fact-checking, including aspect generation and flaw identification.
We also introduce RefuteClaim, a new framework designed specifically for this task.
Given the absence of an existing dataset, we present FlawCheck, a dataset created by extracting and transforming insights from expert reviews into relevant aspects and identified flaws.
arXiv Detail & Related papers (2024-01-27T06:06:16Z)
- EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification [22.785622371421876]
We present a pioneering dataset for multi-hop explainable fact verification.
It contains over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents.
We demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification.
arXiv Detail & Related papers (2023-10-15T06:46:15Z)
- Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use search snippets returned by a search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
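As a rough illustration of the generative-retrieval idea, the sketch below decodes candidate evidence identifiers (e.g., Wikipedia page titles) directly from a claim with beam search, instead of scoring every document in a corpus. This is not GERE's released code, and the off-the-shelf BART checkpoint is a stand-in: it would need fine-tuning on claim-to-title pairs before its outputs mean anything.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Stand-in checkpoint; assume it has been fine-tuned to map claims to titles.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

claim = "Telemundo is an English-language television network."
inputs = tokenizer(claim, return_tensors="pt")
# Beam search yields several candidate identifiers in one decoding pass.
outputs = model.generate(**inputs, num_beams=5, num_return_sequences=5, max_length=32)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
print(candidates)
```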
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Assessing Effectiveness of Using Internal Signals for Check-Worthy Claim Identification in Unlabeled Data for Automated Fact-Checking [6.193231258199234]
This paper explores methodology for identifying check-worthy claim sentences in fake news articles.
We leverage two internal supervisory signals, the headline and the abstractive summary, to rank the sentences.
We show that while the headline has more gisting similarity with how a fact-checking website writes a claim, the summary-based pipeline is the most promising for an end-to-end fact-checking system.
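A minimal sketch of the ranking step, assuming TF-IDF cosine similarity as the "gisting similarity" measure (our simplification; the paper's exact scoring may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_signal(sentences, signal):
    """Rank article sentences by similarity to an internal supervisory
    signal (the headline or an abstractive summary)."""
    matrix = TfidfVectorizer().fit_transform(sentences + [signal])
    scores = cosine_similarity(matrix[:-1], matrix[-1]).ravel()
    return sorted(zip(sentences, scores), key=lambda pair: -pair[1])

sentences = ["The mayor cut school funding by 30%.", "Residents were unhappy."]
print(rank_by_signal(sentences, "Mayor slashes school budget"))
```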
arXiv Detail & Related papers (2021-11-02T16:17:20Z)
- HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification [74.66819506353086]
HoVer is a dataset for many-hop evidence extraction and fact verification.
It challenges models to extract facts from several Wikipedia articles that are relevant to a claim.
Most of the 3- and 4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations.
arXiv Detail & Related papers (2020-11-05T20:33:11Z)
- A Review on Fact Extraction and Verification [19.373340472113703]
We study the fact checking problem, which aims to identify the veracity of a given claim.
We focus on the task of Fact Extraction and VERification (FEVER) and its accompanying dataset.
This task is essential and can be the building block of applications such as fake news detection and medical claim verification.
arXiv Detail & Related papers (2020-10-06T20:05:43Z)
- Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the fact-checking process: generating justifications for verdicts on claims.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that jointly optimising the explanation generation and veracity prediction objectives, rather than training them separately, improves the performance of a fact-checking system.
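A toy sketch of what joint optimisation can look like: a shared encoder feeds both a veracity head and an extractive-explanation head, and one backward pass runs over the sum of the two losses. The architecture below is our own minimal stand-in, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointFactChecker(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.veracity_head = nn.Linear(hidden, num_labels)  # supported/refuted/NEI
        self.explain_head = nn.Linear(hidden, 1)            # score tokens for the explanation

    def forward(self, ids):
        states, _ = self.encoder(self.embed(ids))
        return self.veracity_head(states.mean(dim=1)), self.explain_head(states).squeeze(-1)

model = JointFactChecker()
ids = torch.randint(0, 30522, (4, 50))   # dummy batch of token ids
labels = torch.randint(0, 3, (4,))       # veracity targets
explain_targets = torch.rand(4, 50)      # soft extraction targets
logits, scores = model(ids)
# Joint objective: one backward pass over the sum, not two separate trainings.
loss = (F.cross_entropy(logits, labels)
        + F.binary_cross_entropy_with_logits(scores, explain_targets))
loss.backward()
```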
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
- Claim Check-Worthiness Detection as Positive Unlabelled Learning [53.24606510691877]
Claim check-worthiness detection is a critical component of fact checking systems.
We illuminate a central challenge underlying these related tasks: check-worthiness data is in effect positive unlabelled, since unlabelled claims may still be check-worthy.
Our best performing method is a unified approach which automatically corrects for this using a variant of positive unlabelled learning.
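As a sketch of the positive-unlabelled idea, here is the classic Elkan-Noto correction, our stand-in for the paper's variant: train a classifier to separate labelled from unlabelled examples, estimate the labelling frequency on held-out positives, and rescale the scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_pu_classifier(X, s):
    """s[i] = 1 if example i is labelled check-worthy, 0 if unlabelled
    (which may hide positives). Returns a corrected scoring function."""
    X_tr, X_hold, s_tr, s_hold = train_test_split(X, s, test_size=0.2,
                                                  stratify=s, random_state=0)
    g = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)
    # c estimates P(labelled | positive) from held-out labelled examples.
    c = g.predict_proba(X_hold[s_hold == 1])[:, 1].mean()
    return lambda X_new: np.clip(g.predict_proba(X_new)[:, 1] / c, 0.0, 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
s = (X[:, 0] > 1).astype(int)  # toy data: only strong positives got labelled
score = fit_pu_classifier(X, s)
print(score(X[:3]))
```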
arXiv Detail & Related papers (2020-03-05T16:06:07Z)