FEVEROUS: Fact Extraction and VERification Over Unstructured and
Structured information
- URL: http://arxiv.org/abs/2106.05707v1
- Date: Thu, 10 Jun 2021 12:47:36 GMT
- Title: FEVEROUS: Fact Extraction and VERification Over Unstructured and
Structured information
- Authors: Rami Aly, Zhijiang Guo, Michael Schlichtkrull, James Thorne, Andreas
Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal
- Abstract summary: We introduce a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims.
Each claim is annotated with evidence in the form of sentences and/or cells from tables in Wikipedia, as well as a label indicating whether this evidence supports, refutes, or does not provide enough information to reach a verdict.
We develop a baseline for verifying claims against text and tables which predicts both the correct evidence and verdict for 18% of the claims.
- Score: 21.644199631998482
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fact verification has attracted a lot of attention in the machine learning
and natural language processing communities, as it is one of the key methods
for detecting misinformation. Existing large-scale benchmarks for this task
have focused mostly on textual sources, i.e. unstructured information, and thus
ignored the wealth of information available in structured formats, such as
tables. In this paper we introduce a novel dataset and benchmark, Fact
Extraction and VERification Over Unstructured and Structured information
(FEVEROUS), which consists of 87,026 verified claims. Each claim is annotated
with evidence in the form of sentences and/or cells from tables in Wikipedia,
as well as a label indicating whether this evidence supports, refutes, or does
not provide enough information to reach a verdict. Furthermore, we detail our
efforts to track and minimize the biases present in the dataset that could be
exploited by models, e.g. being able to predict the label without using
evidence. Finally, we develop a baseline for verifying claims against text and
tables which predicts both the correct evidence and verdict for 18% of the
claims.
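
To make the task concrete, here is a minimal sketch of what a FEVEROUS-style example and the strict scoring rule implied by the abstract might look like in Python. The field names and evidence-ID format are illustrative assumptions, not the dataset's actual schema.

```python
# Illustrative sketch only: field names, evidence-ID format, and the scoring
# helper are assumptions for exposition, not the official FEVEROUS schema.

SUPPORTS, REFUTES, NEI = "SUPPORTS", "REFUTES", "NOT ENOUGH INFO"

# Each claim pairs free text with evidence drawn from sentences and/or table cells.
example = {
    "claim": "X won the award in 2004.",
    "label": SUPPORTS,
    "evidence": {"PageX_sentence_2", "PageX_cell_0_3_1"},  # mixed element types
}

def strict_correct(pred_label, pred_evidence, gold_label, gold_evidence):
    """Count a prediction only if the verdict is right AND the predicted
    evidence covers the annotated evidence: the criterion behind the
    baseline's 18% figure."""
    return pred_label == gold_label and set(gold_evidence) <= set(pred_evidence)
```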
Related papers
- ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs [13.608282497568108]
ClaimVer is a human-centric framework tailored to meet users' informational and verification needs.
It highlights each claim, verifies it against a trusted knowledge graph, and provides succinct, clear explanations for each claim prediction.
arXiv Detail & Related papers (2024-03-12T17:07:53Z)
- Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables [22.18384189336634]
HeterFC is a word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information.
We perform information propagation via a relational graph neural network, capturing interactions between claims and evidence.
We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval; one plausible shape for such a loss is sketched below.
arXiv Detail & Related papers (2024-02-20T14:10:40Z)
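
As a rough illustration, the multitask objective described above might combine a verdict term with an evidence-relevance term. This is a sketch assuming PyTorch, with an assumed weighting factor alpha, not HeterFC's published formulation.

```python
import torch
import torch.nn.functional as F

def multitask_loss(verdict_logits, gold_verdict,
                   evidence_scores, evidence_labels, alpha=0.5):
    """One plausible multitask objective (an assumption, not HeterFC's exact
    loss): cross-entropy over the claim verdict plus a binary term scoring
    each retrieved evidence piece, so noisy retrieval is penalised softly
    rather than trusted blindly."""
    verdict_loss = F.cross_entropy(verdict_logits, gold_verdict)
    evidence_loss = F.binary_cross_entropy_with_logits(evidence_scores,
                                                       evidence_labels)
    return verdict_loss + alpha * evidence_loss
```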
- EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification [22.785622371421876]
We present a pioneering dataset for multi-hop explainable fact verification: over 60,000 claims involving 2-hop and 3-hop reasoning, each created by summarizing and modifying information from hyperlinked Wikipedia documents.
We demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification; the three stages are sketched below.
arXiv Detail & Related papers (2023-10-15T06:46:15Z)
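
The three stages could be wired together as below. The function names and signatures are placeholders for whatever retriever, generator, and verifier one plugs in, not EX-FEVER's actual code.

```python
from typing import Callable, List, Tuple

# Placeholder pipeline: the stage functions are assumptions, not EX-FEVER's code.
def verify_claim(
    claim: str,
    retrieve: Callable[[str], List[str]],      # claim -> supporting documents
    explain: Callable[[str, List[str]], str],  # claim + docs -> explanation
    classify: Callable[[str, str], str],       # claim + explanation -> verdict
) -> Tuple[str, str]:
    docs = retrieve(claim)               # multi-hop document retrieval
    explanation = explain(claim, docs)   # summarise the reasoning chain
    verdict = classify(claim, explanation)
    return verdict, explanation
```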
- Enhancing Argument Structure Extraction with Efficient Leverage of Contextual Information [79.06082391992545]
We propose an Efficient Context-aware model (ECASE) that fully exploits contextual information.
We introduce a sequence-attention module and a distance-weighted similarity loss to aggregate contextual and argumentative information; one plausible form of the loss is sketched below.
Our experiments on five datasets from various domains demonstrate that our model achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-08T08:47:10Z)
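
The distance-weighted similarity loss is not specified here, so the sketch below is only one plausible reading, assuming PyTorch: pairs of sentence representations are pulled together with a weight that decays in their distance.

```python
import torch

def distance_weighted_similarity_loss(reps, decay=0.5):
    """One plausible reading (an assumption, not ECASE's published loss):
    encourage sentence representations to be similar in proportion to how
    close the sentences are, with weights decaying exponentially in distance.
    reps: (n_sentences, dim) tensor of sentence embeddings."""
    n = reps.size(0)
    reps = torch.nn.functional.normalize(reps, dim=-1)
    sim = reps @ reps.T                                 # cosine similarities
    idx = torch.arange(n)
    dist = (idx[:, None] - idx[None, :]).abs().float()  # sentence distances
    weights = torch.exp(-decay * dist)
    weights.fill_diagonal_(0.0)                         # ignore self-pairs
    return ((1.0 - sim) * weights).sum() / weights.sum()
```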
- Give Me More Details: Improving Fact-Checking with Latent Retrieval [58.706972228039604]
Evidence plays a crucial role in automated fact-checking.
Existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine.
We propose to incorporate full text from source documents as evidence and introduce two enriched datasets.
arXiv Detail & Related papers (2023-05-25T15:01:19Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address; a simplified example shape is sketched below.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
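
A minimal sketch of what a WiCE-style example with sub-sentence judgments might look like; the field names and label strings are assumptions, not the dataset's actual schema.

```python
# Illustrative only: field names and label set are assumptions, not WiCE's schema.
example = {
    "claim": "X founded Y in 1998 and later sold it.",
    "evidence": ["Sentence 0 of the article ...", "Sentence 7 ..."],
    # Entailment is judged per sub-sentence unit of the claim,
    # not only for the claim as a whole.
    "subclaims": [
        {"text": "X founded Y in 1998", "label": "supported",
         "supporting_sentences": [0]},
        {"text": "X later sold Y", "label": "not_supported",
         "supporting_sentences": []},
    ],
}
```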
- CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking [55.75590135151682]
CHEF is the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims.
The dataset covers multiple domains, ranging from politics to public health, and provides annotated evidence retrieved from the Internet.
arXiv Detail & Related papers (2022-06-06T09:11:03Z)
- End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models [0.0]
We propose end-to-end multimodal fact-checking and explanation generation.
The goal is to assess the truthfulness of a claim by retrieving relevant evidence and predicting a truthfulness label.
To support this research, we construct Mocheg, a large-scale dataset consisting of 15,601 claims.
arXiv Detail & Related papers (2022-05-25T04:36:46Z)
- Generating Fact Checking Summaries for Web Claims [8.980876474818153]
We present a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents.
We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.
arXiv Detail & Related papers (2020-10-16T18:10:47Z)
- ToTTo: A Controlled Table-To-Text Generation Dataset [61.83159452483026]
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples.
We introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia.
While usually fluent, existing methods often hallucinate phrases that are not supported by the table; a simplified example shape is sketched below.
arXiv Detail & Related papers (2020-04-29T17:53:45Z)
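
A minimal sketch of a controlled table-to-text example: a table, a set of highlighted cells, and a target sentence. Field names are illustrative assumptions rather than ToTTo's exact JSON schema.

```python
# Simplified sketch of a controlled table-to-text example
# (field names are assumptions, not ToTTo's exact JSON schema).
example = {
    "page_title": "1995 World Championships",
    "table": [
        ["Rank", "Athlete", "Time"],
        ["1", "A. Runner", "9.97"],
        ["2", "B. Sprinter", "10.03"],
    ],
    "highlighted_cells": [(1, 1), (1, 2)],  # (row, col) pairs the text must cover
    "target": "A. Runner won in 9.97 seconds.",
}

# The control signal: a faithful model should describe only the highlighted
# cells, which is exactly where existing methods tend to hallucinate.
highlighted = [example["table"][r][c] for r, c in example["highlighted_cells"]]
```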
- Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is how to automate the most elaborate part of the process: generating explanations that justify a verdict.
This paper provides the first study of how these explanations can be generated automatically based on available claim context.
Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system; a schematic joint loss is sketched below.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)
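
Schematically, optimising both objectives at once amounts to a weighted sum of an explanation-generation loss and a veracity-classification loss. The sketch below assumes PyTorch and an assumed weighting factor beta; it is not the paper's exact setup.

```python
import torch

def joint_loss(explanation_lm_loss: torch.Tensor,
               veracity_logits: torch.Tensor,
               gold_label: torch.Tensor,
               beta: float = 1.0) -> torch.Tensor:
    """Schematic joint objective (an assumption, not the paper's exact loss):
    sum the explanation-generation language-modelling loss and the veracity
    classification loss so both are optimised together, not separately."""
    veracity_loss = torch.nn.functional.cross_entropy(veracity_logits, gold_label)
    return explanation_lm_loss + beta * veracity_loss
```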