Neural models for Factual Inconsistency Classification with Explanations
- URL: http://arxiv.org/abs/2306.08872v1
- Date: Thu, 15 Jun 2023 06:06:50 GMT
- Title: Neural models for Factual Inconsistency Classification with Explanations
- Authors: Tathagata Raha, Mukund Choudhary, Abhinav Menon, Harshit Gupta, KV
Aditya Srivatsa, Manish Gupta, Vasudeva Varma
- Abstract summary: We leverage existing work in linguistics to define five types of factual inconsistencies.
We train neural models to predict inconsistency type with explanations, given a (claim, context) sentence pair.
Our proposed methods provide a weighted F1 of 87% for inconsistency type classification across the five classes.
- Score: 17.214921274113284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Factual consistency is one of the most important requirements when editing
high quality documents. It is extremely important for automatic text generation
systems like summarization, question answering, dialog modeling, and language
modeling. Still, automated factual inconsistency detection is rather
under-studied. Existing work has focused on (a) finding fake news keeping a
knowledge base in context, or (b) detecting broad contradiction (as part of
natural language inference literature). However, there has been no work on
detecting and explaining types of factual inconsistencies in text, without any
knowledge base in context. In this paper, we leverage existing work in
linguistics to formally define five types of factual inconsistencies. Based on
this categorization, we contribute a novel dataset, FICLE (Factual
Inconsistency CLassification with Explanation), with ~8K samples where each
sample consists of two sentences (claim and context) annotated with type and
span of inconsistency. When the inconsistency relates to an entity type, it is
labeled as well at two levels (coarse and fine-grained). Further, we leverage
this dataset to train a pipeline of four neural models to predict inconsistency
type with explanations, given a (claim, context) sentence pair. Explanations
include inconsistent claim fact triple, inconsistent context span, inconsistent
claim component, coarse and fine-grained inconsistent entity types. The
proposed system first predicts inconsistent spans from claim and context; and
then uses them to predict inconsistency types and inconsistent entity types
(when inconsistency is due to entities). We experiment with multiple
Transformer-based natural language classification models as well as generative models,
and find that DeBERTa performs the best. Our proposed methods provide a
weighted F1 of ~87% for inconsistency type classification across the five
classes.
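To make the final classification stage concrete, below is a minimal sketch (not the authors' released code) of how a (claim, context) sentence pair could be scored for one of five inconsistency types with a DeBERTa sequence-pair classifier via the HuggingFace transformers API. The checkpoint name and the placeholder label names are assumptions for illustration only; the actual FICLE system is a pipeline of four models that also predicts inconsistent spans and coarse/fine-grained entity types.

```python
# Minimal sketch of the inconsistency-type classification stage, assuming a
# generic DeBERTa checkpoint and placeholder label names (not the paper's
# exact taxonomy or released code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder names for the five inconsistency classes.
INCONSISTENCY_TYPES = ["type_1", "type_2", "type_3", "type_4", "type_5"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=len(INCONSISTENCY_TYPES)
)

def predict_inconsistency_type(claim: str, context: str) -> str:
    """Encode the (claim, context) pair and return the predicted type label."""
    inputs = tokenizer(claim, context, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return INCONSISTENCY_TYPES[int(logits.argmax(dim=-1))]

# Example usage. The classification head here is freshly initialized, so the
# output is arbitrary until the model is fine-tuned on FICLE-style
# (claim, context, type) examples.
print(predict_inconsistency_type(
    "The Eiffel Tower is in Berlin.",
    "The Eiffel Tower is a landmark in Paris, France.",
))
```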
Related papers
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Language model acceptability judgements are not always robust to context [30.868765627701457]
We investigate the stability of language models' performance on targeted syntactic evaluations.
We find that model judgements are generally robust when placed in randomly sampled linguistic contexts.
We show that these changes in model performance are not explainable by simple features matching the context and the test inputs.
arXiv Detail & Related papers (2022-12-18T00:11:06Z)
- Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing [29.820473012776283]
Existing entity typing models are subject to the problem of spurious correlations.
We identify six types of existing model biases, including mention-context bias, lexical overlapping bias, named entity bias, pronoun bias, dependency bias, and overgeneralization bias.
By augmenting the original training set with their bias-free counterparts, models are forced to fully comprehend the sentences.
arXiv Detail & Related papers (2022-05-25T10:34:22Z)
- Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification [21.842139093124512]
We introduce SEED, a novel vector-based method for claim veracity classification.
We build on the hypothesis that we can simulate class representative vectors that capture average semantic differences for claim-evidence pairs in a class.
Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over competitive baselines in few-shot settings.
arXiv Detail & Related papers (2022-05-11T17:23:37Z)
- Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests [87.60900567941428]
A 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT [95.88293021131035]
It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist.
This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
arXiv Detail & Related papers (2020-02-27T22:07:11Z)