A Review on Fact Extraction and Verification
- URL: http://arxiv.org/abs/2010.03001v5
- Date: Fri, 19 Nov 2021 14:42:58 GMT
- Title: A Review on Fact Extraction and Verification
- Authors: Giannis Bekoulis, Christina Papagiannopoulou, Nikos Deligiannis
- Abstract summary: We study the fact checking problem, which aims to identify the veracity of a given claim.
We focus on the task of Fact Extraction and VERification (FEVER) and its accompanying dataset.
This task is essential and can be the building block of applications such as fake news detection and medical claim verification.
- Score: 19.373340472113703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the fact checking problem, which aims to identify the veracity of a
given claim. Specifically, we focus on the task of Fact Extraction and
VERification (FEVER) and its accompanying dataset. The task consists of the
subtasks of retrieving the relevant documents (and sentences) from Wikipedia
and validating whether the information in the documents supports or refutes a
given claim. This task is essential and can be the building block of
applications such as fake news detection and medical claim verification. In
this paper, we aim at a better understanding of the challenges of the task by
presenting the literature in a structured and comprehensive way. We describe
the proposed methods by analyzing the technical perspectives of the different
approaches and discussing the performance results on the FEVER dataset, which
is the most well-studied and formally structured dataset on the fact extraction
and verification task. We also conduct the largest experimental study to date
on identifying beneficial loss functions for the sentence retrieval component.
Our analysis indicates that sampling negative sentences is important for
improving performance and reducing computational complexity. Finally,
we describe open issues and future challenges, and we motivate future
research on the task.
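To make the sentence-retrieval finding concrete, here is a minimal, hypothetical sketch of training a sentence scorer with a pairwise ranking loss over a few sampled negative sentences. This is not the survey's implementation: the encoder dimensions, margin, and number of negatives are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sentence scorer: concatenates claim and sentence embeddings
# and outputs a relevance score (higher = more likely to be evidence).
class SentenceScorer(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, claim_emb, sent_emb):
        return self.mlp(torch.cat([claim_emb, sent_emb], dim=-1)).squeeze(-1)

def ranking_loss(scorer, claim, pos, negs, margin=1.0):
    # Score the gold evidence sentence and a few *sampled* negatives,
    # instead of every candidate sentence retrieved from Wikipedia.
    pos_score = scorer(claim, pos)                                   # (batch,)
    flat_claims = claim.unsqueeze(1).expand_as(negs).reshape(-1, claim.size(-1))
    neg_scores = scorer(flat_claims, negs.reshape(-1, negs.size(-1)))
    neg_scores = neg_scores.view(negs.size(0), negs.size(1))         # (batch, k)
    # Hinge loss: the positive should outscore each negative by a margin.
    return torch.clamp(margin - pos_score.unsqueeze(1) + neg_scores, min=0).mean()

# Toy usage with random embeddings standing in for encoded sentences.
scorer = SentenceScorer()
claim = torch.randn(4, 128)      # 4 claims
pos = torch.randn(4, 128)        # one gold evidence sentence per claim
negs = torch.randn(4, 5, 128)    # 5 sampled negative sentences per claim
loss = ranking_loss(scorer, claim, pos, negs)
loss.backward()
```

Scoring only a handful of sampled negatives per claim, rather than every retrieved candidate, is what keeps training cost low while still teaching the scorer to rank gold evidence first.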
Related papers
- Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs [10.449165630417522]
We construct two complex fact-checking datasets in Chinese scenarios: CHEF-EG and TrendFact.
These datasets involve complex facts in areas such as health, politics, and society.
We propose a unified framework called FactISR to perform mutual feedback between veracity and explanations.
arXiv Detail & Related papers (2024-10-19T15:25:19Z)
- How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation [4.376598435975689]
This paper explores the novel task of flaw-oriented fact-checking, including aspect generation and flaw identification.
We also introduce RefuteClaim, a new framework designed specifically for this task.
Given the absence of an existing dataset, we present FlawCheck, a dataset created by extracting and transforming insights from expert reviews into relevant aspects and identified flaws.
arXiv Detail & Related papers (2024-01-27T06:06:16Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks whose annotations overlap only partially or not at all.
We propose a novel approach where knowledge exchange is enabled between the tasks via distribution matching; a rough sketch of this idea appears below.
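As a rough illustration of distribution matching between tasks, the hypothetical sketch below couples two classification heads on a shared backbone with a KL term. The class-correspondence matrix and loss weight are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

# Two task heads share a backbone; task B has no labels in this batch.
backbone = torch.nn.Linear(64, 32)
head_a = torch.nn.Linear(32, 7)      # e.g. task A with 7 classes
head_b = torch.nn.Linear(32, 3)      # e.g. task B with 3 classes
map_a_to_b = torch.randn(7, 3).softmax(dim=-1)  # assumed class correspondence

x = torch.randn(16, 64)
y_a = torch.randint(0, 7, (16,))     # only task A is annotated here

feats = torch.relu(backbone(x))
logits_a, logits_b = head_a(feats), head_b(feats)

# Supervised loss only where annotations exist (task A).
sup = F.cross_entropy(logits_a, y_a)

# Distribution matching: push task B's predictive distribution toward the
# one implied by task A's predictions, enabling knowledge exchange even
# with non-overlapping annotations.
p_b_from_a = logits_a.softmax(-1) @ map_a_to_b
match = F.kl_div(logits_b.log_softmax(-1), p_b_from_a, reduction="batchmean")

loss = sup + 0.5 * match
loss.backward()
```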
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and a distance-weighted similarity loss to aggregate contextual and argumentative information; a sketch of one plausible form of this loss appears below.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
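One plausible reading of a distance-weighted similarity loss is sketched below, under the assumption that sentences closer together in a document should have more similar representations. The exponential decay and cosine similarity are illustrative choices, not necessarily ECASE's exact formulation.

```python
import torch
import torch.nn.functional as F

def distance_weighted_similarity_loss(sent_embs, decay=0.5):
    # Pairwise cosine similarity between all sentence embeddings: (n, n).
    sims = F.cosine_similarity(sent_embs.unsqueeze(1), sent_embs.unsqueeze(0), dim=-1)
    n = sent_embs.size(0)
    idx = torch.arange(n, dtype=torch.float32)
    dist = (idx.unsqueeze(1) - idx.unsqueeze(0)).abs()
    weights = torch.exp(-decay * dist)            # nearer sentences weigh more
    mask = ~torch.eye(n, dtype=torch.bool)        # ignore self-similarity
    # Maximize distance-weighted similarity -> minimize its negative mean.
    return -(weights[mask] * sims[mask]).mean()

# Toy usage: 6 sentence embeddings from one paragraph.
sent_embs = torch.randn(6, 128, requires_grad=True)
loss = distance_weighted_similarity_loss(sent_embs)
loss.backward()
```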
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
- Benchmarking the Generation of Fact Checking Explanations [19.363672064425504]
We focus on the generation of justifications (textual explanations of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines.
Results show that, in justification production, summarization benefits from the claim information.
Although cross-dataset experiments suffer from performance degradation, a single model trained on a combination of the two datasets is able to retain style information efficiently.
arXiv Detail & Related papers (2023-08-29T10:40:46Z)
- Towards Understanding Omission in Dialogue Summarization [45.932368303107104]
Previous works indicated that omission is a major factor affecting summarization quality.
We propose the OLDS dataset, which provides high-quality Omission Labels for Dialogue Summarization.
arXiv Detail & Related papers (2022-11-14T06:56:59Z)
- Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration [75.47708732473586]
We propose a layout-aware document-level Information Extraction dataset, LIE, to facilitate the study of extracting both structural and semantic knowledge from visually rich documents.
LIE contains 62k annotations of three extraction tasks from 4,061 pages in product and official documents.
Empirical results show that layout is critical for VRD-based extraction, and system demonstration also verifies that the extracted knowledge can help locate the answers that users care about.
arXiv Detail & Related papers (2022-07-14T07:59:45Z)
- DialFact: A Benchmark for Fact-Checking in Dialogue [56.63709206232572]
We construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia.
We find that existing fact-checking models trained on non-dialogue data like FEVER fail to perform well on our task.
We propose a simple yet data-efficient solution to effectively improve fact-checking performance in dialogue.
arXiv Detail & Related papers (2021-10-15T17:34:35Z)
- Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification [18.330265729989843]
We propose an approach, named ARSJoint, that jointly learns the modules for the three tasks within a machine reading comprehension framework.
Experimental results on the benchmark dataset SciFact show that our approach outperforms existing works.
arXiv Detail & Related papers (2021-09-13T10:07:26Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
- Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the process, namely generating explanations for verdicts.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system; a toy sketch of such joint training appears below.
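A toy sketch of such joint optimisation, assuming a shared encoder feeding both a veracity classifier and an explanation generator; the architecture, toy targets, and loss weight are placeholders rather than the paper's actual model.

```python
import torch
import torch.nn as nn

# Shared encoder with two heads: one for the veracity verdict, one for
# explanation generation (reduced to a single-token toy target here).
encoder = nn.Linear(300, 128)                 # stand-in shared encoder
veracity_head = nn.Linear(128, 3)             # supports / refutes / not enough info
explain_head = nn.Linear(128, 5000)           # toy per-step vocabulary logits

x = torch.randn(8, 300)                       # encoded claim + context features
y_label = torch.randint(0, 3, (8,))           # veracity labels
y_tok = torch.randint(0, 5000, (8,))          # one explanation token per example (toy)

h = torch.relu(encoder(x))
loss_veracity = nn.functional.cross_entropy(veracity_head(h), y_label)
loss_explain = nn.functional.cross_entropy(explain_head(h), y_tok)

# Joint objective: both tasks update the shared encoder in one backward pass,
# rather than being trained as two separate models.
loss = loss_veracity + 1.0 * loss_explain
loss.backward()
```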
arXiv Detail & Related papers (2020-04-13T05:23:25Z)