Neural models for Factual Inconsistency Classification with Explanations
- URL: http://arxiv.org/abs/2306.08872v1
- Date: Thu, 15 Jun 2023 06:06:50 GMT
- Title: Neural models for Factual Inconsistency Classification with Explanations
- Authors: Tathagata Raha, Mukund Choudhary, Abhinav Menon, Harshit Gupta, KV
Aditya Srivatsa, Manish Gupta, Vasudeva Varma
- Abstract summary: We leverage existing work in linguistics to define five types of factual inconsistencies.
We train neural models to predict inconsistency type with explanations, given a (claim, context) sentence pair.
Our proposed methods provide a weighted F1 of 87% for inconsistency type classification across the five classes.
- Score: 17.214921274113284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Factual consistency is one of the most important requirements when editing
high quality documents. It is extremely important for automatic text generation
systems like summarization, question answering, dialog modeling, and language
modeling. Still, automated factual inconsistency detection is rather
under-studied. Existing work has focused on (a) finding fake news keeping a
knowledge base in context, or (b) detecting broad contradiction (as part of
natural language inference literature). However, there has been no work on
detecting and explaining types of factual inconsistencies in text, without any
knowledge base in context. In this paper, we leverage existing work in
linguistics to formally define five types of factual inconsistencies. Based on
this categorization, we contribute a novel dataset, FICLE (Factual
Inconsistency CLassification with Explanation), with ~8K samples where each
sample consists of two sentences (claim and context) annotated with type and
span of inconsistency. When the inconsistency relates to an entity type, it is
labeled as well at two levels (coarse and fine-grained). Further, we leverage
this dataset to train a pipeline of four neural models to predict inconsistency
type with explanations, given a (claim, context) sentence pair. Explanations
include inconsistent claim fact triple, inconsistent context span, inconsistent
claim component, coarse and fine-grained inconsistent entity types. The
proposed system first predicts inconsistent spans from claim and context; and
then uses them to predict inconsistency types and inconsistent entity types
(when inconsistency is due to entities). We experiment with multiple
Transformer-based natural language classification models as well as generative models,
and find that DeBERTa performs the best. Our proposed methods provide a
weighted F1 of ~87% for inconsistency type classification across the five
classes.
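To make the final classification stage concrete, below is a minimal sketch (not the authors' released code) of how a (claim, context) sentence pair could be scored for one of five inconsistency types with a DeBERTa sequence-pair classifier via the HuggingFace transformers API. The checkpoint name and the placeholder label names are assumptions for illustration only; the actual FICLE system is a pipeline of four models that also predicts inconsistent spans and coarse/fine-grained entity types.

```python
# Minimal sketch of the inconsistency-type classification stage, assuming a
# generic DeBERTa checkpoint and placeholder label names (not the paper's
# exact taxonomy or released code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder names for the five inconsistency classes.
INCONSISTENCY_TYPES = ["type_1", "type_2", "type_3", "type_4", "type_5"]

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=len(INCONSISTENCY_TYPES)
)

def predict_inconsistency_type(claim: str, context: str) -> str:
    """Encode the (claim, context) pair and return the predicted type label."""
    inputs = tokenizer(claim, context, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return INCONSISTENCY_TYPES[int(logits.argmax(dim=-1))]

# Example usage. The classification head here is freshly initialized, so the
# output is arbitrary until the model is fine-tuned on FICLE-style
# (claim, context, type) examples.
print(predict_inconsistency_type(
    "The Eiffel Tower is in Berlin.",
    "The Eiffel Tower is a landmark in Paris, France.",
))
```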
Related papers
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Language model acceptability judgements are not always robust to context [30.868765627701457]
We investigate the stability of language models' performance on targeted syntactic evaluations.
We find that model judgements are generally robust when placed in randomly sampled linguistic contexts.
We show that these changes in model performance are not explainable by simple features matching the context and the test inputs.
arXiv Detail & Related papers (2022-12-18T00:11:06Z)
- Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing [29.820473012776283]
Existing entity typing models are subject to the problem of spurious correlations.
We identify six types of existing model biases, including mention-context bias, lexical overlapping bias, named entity bias, pronoun bias, dependency bias, and overgeneralization bias.
By augmenting the original training set with their bias-free counterparts, models are forced to fully comprehend the sentences.
arXiv Detail & Related papers (2022-05-25T10:34:22Z)
- Aggregating Pairwise Semantic Differences for Few-Shot Claim Veracity Classification [21.842139093124512]
We introduce SEED, a novel vector-based method for claim veracity classification.
We build on the hypothesis that we can simulate class representative vectors that capture average semantic differences for claim-evidence pairs in a class.
Experiments conducted on the FEVER and SCIFACT datasets show consistent improvements over competitive baselines in few-shot settings.
arXiv Detail & Related papers (2022-05-11T17:23:37Z)
- Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests [87.60900567941428]
A 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
- Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT [95.88293021131035]
It is unclear, however, how the models will perform in realistic scenarios where natural rather than malicious adversarial instances often exist.
This work systematically explores the robustness of BERT, the state-of-the-art Transformer-style model in NLP, in dealing with noisy data.
arXiv Detail & Related papers (2020-02-27T22:07:11Z)