WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning
- URL: http://arxiv.org/abs/2212.10057v2
- Date: Sat, 27 May 2023 13:42:23 GMT
- Title: WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning
- Authors: Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv
- Abstract summary: We propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck.
Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement over previous state-of-the-art methods on the TRUE benchmark on average.
- Score: 40.5830891229718
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: A crucial issue of current text generation models is that they often
uncontrollably generate text that is factually inconsistent with their
inputs. Limited by the lack of annotated data, existing works in evaluating
factual consistency directly transfer the reasoning ability of models trained
on other data-rich upstream tasks like question answering (QA) and natural
language inference (NLI) without any further adaptation. As a result, they
perform poorly on real generated text and are heavily biased by their
single-source upstream tasks. To alleviate this problem, we propose a weakly
supervised framework that aggregates multiple resources to train a precise and
efficient factual metric, namely WeCheck. WeCheck first utilizes a generative
model to accurately label a real generated sample by aggregating its weak
labels, which are inferred from multiple resources. Then, we train the target
metric model with this weak supervision while taking label noise into consideration.
Comprehensive experiments on a variety of tasks demonstrate the strong
performance of WeCheck, which achieves a 3.4% absolute improvement over
previous state-of-the-art methods on the TRUE benchmark on average.
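The abstract describes the pipeline only at a high level. The sketch below is a minimal illustration of the general weak-supervision recipe it outlines: several off-the-shelf checkers produce noisy labels, the labels are aggregated into a soft target, and ambiguous samples are filtered before training the target metric. The labelers, reliability weights, and thresholds are assumptions for illustration; WeCheck's actual aggregation uses a learned generative labeling model rather than the weighted mean shown here.

```python
import numpy as np

# Toy stand-ins for off-the-shelf consistency checkers (e.g. NLI- or QA-based
# metrics). Each maps (source, hypothesis) -> a score in [0, 1]. The functions,
# reliability weights, and thresholds below are assumptions for illustration only.
def entailment_style_checker(src, hyp):
    # crude proxy: hypothesis vocabulary fully covered by the source
    return 0.9 if set(hyp.lower().split()) <= set(src.lower().split()) else 0.3

def overlap_style_checker(src, hyp):
    shared = len(set(src.lower().split()) & set(hyp.lower().split()))
    return min(1.0, shared / max(1, len(set(hyp.lower().split()))))

LABELERS = [entailment_style_checker, overlap_style_checker]
RELIABILITY = np.array([0.6, 0.4])  # assumed per-labeler trust weights

def aggregate(src, hyp):
    """Combine weak labels into one soft probability of factual consistency.
    Here: a plain reliability-weighted mean; WeCheck itself fits a generative
    labeling model over the weak labels, which is more involved."""
    scores = np.array([f(src, hyp) for f in LABELERS])
    return float(RELIABILITY @ scores / RELIABILITY.sum())

def build_weak_dataset(pairs, margin=0.2):
    """Turn unlabeled (source, generated_text) pairs into soft-labeled training
    data, dropping ambiguous samples as a crude form of noise handling."""
    out = []
    for src, hyp in pairs:
        p = aggregate(src, hyp)
        if abs(p - 0.5) >= margin:          # keep only confident weak labels
            out.append(((src, hyp), p))     # p supervises the target metric model
    return out

if __name__ == "__main__":
    pairs = [("Paris is the capital of France.", "paris is the capital of france."),
             ("The report was filed on Monday.", "The report praises the new budget.")]
    for (src, hyp), p in build_weak_dataset(pairs):
        print(f"soft label {p:.2f}  <-  {hyp!r}")
```

In this sketch the surviving soft labels would supervise a lightweight classifier that is then used directly as the factual consistency metric.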
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization.
Our framework uses a diverse set of LLM prompts to identify factual inconsistencies.
We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination.
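No code accompanies this summary; the snippet below is only a generic sketch of the ensemble-then-calibrate pattern it describes, averaging per-prompt scores and fitting a Platt-style logistic calibrator from scikit-learn. The prompt scores and labels are made-up placeholders, not the paper's prompts, models, or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: scores in [0, 1] from three "is this summary consistent?" prompts
# for one (document, summary) pair. In the real setting these would come from
# an LLM; here they are placeholders.
held_out_scores = np.array([
    [0.92, 0.88, 0.95],
    [0.40, 0.55, 0.35],
    [0.10, 0.20, 0.15],
    [0.80, 0.70, 0.90],
])
held_out_labels = np.array([1, 0, 0, 1])  # human judgments of factual consistency

# 1) Ensemble: average the per-prompt scores into one raw score per example.
raw = held_out_scores.mean(axis=1, keepdims=True)

# 2) Calibrate: fit a logistic mapping raw score -> probability, so the output
#    behaves like an empirically accurate probability of consistency.
calibrator = LogisticRegression().fit(raw, held_out_labels)

new_scores = np.array([[0.85, 0.90, 0.80], [0.30, 0.25, 0.40]])
probs = calibrator.predict_proba(new_scores.mean(axis=1, keepdims=True))[:, 1]
print(probs)  # calibrated P(consistent) for the two new examples
```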
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this further elaborated robustness metric, a model is judged to be robust only if it remains consistently accurate across the examples of each clique.
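As a generic illustration of clique-level scoring (not the benchmark's official metric), a model can be credited with a clique only when it is correct on every knowledge-invariant variant in it; the helper and the extraction stub below are hypothetical.

```python
from collections import defaultdict

def clique_robustness(examples, predict):
    """examples: (clique_id, input_text, gold) triples, where every example in a
    clique expresses the same underlying knowledge.
    predict: callable mapping input_text -> predicted output.
    Returns the fraction of cliques on which the model is correct for *all*
    members, a stricter score than plain per-example accuracy."""
    per_clique = defaultdict(list)
    for clique_id, text, gold in examples:
        per_clique[clique_id].append(predict(text) == gold)
    robust = sum(all(results) for results in per_clique.values())
    return robust / max(1, len(per_clique))

# Toy usage with a hypothetical open IE model stub.
examples = [
    ("c1", "Marie Curie was born in Warsaw.", ("Marie Curie", "born in", "Warsaw")),
    ("c1", "Warsaw is the birthplace of Marie Curie.", ("Marie Curie", "born in", "Warsaw")),
    ("c2", "Alan Turing worked at Bletchley Park.", ("Alan Turing", "worked at", "Bletchley Park")),
]
stub = lambda text: (("Marie Curie", "born in", "Warsaw") if "Curie" in text
                     else ("Alan Turing", "worked at", "Bletchley Park"))
print(clique_robustness(examples, stub))  # 1.0 for this toy stub
```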
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- FAST: Improving Controllability for Text Generation with Feedback Aware Self-Training [25.75982440355576]
Controllable text generation systems often leverage control codes to direct various properties of the output like style and length.
Inspired by recent work on causal inference for NLP, this paper reveals a previously overlooked flaw in these control code-based conditional text generation algorithms.
We propose two simple techniques to reduce these correlations in training sets.
arXiv Detail & Related papers (2022-10-06T19:00:51Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking [28.66287193703365]
We propose to generate factually inconsistent summaries using source texts and reference summaries with key information masked.
Experiments on seven benchmark datasets demonstrate that factual consistency classifiers trained on summaries generated using our method generally outperform existing models.
arXiv Detail & Related papers (2022-05-04T12:48:49Z)
- NLI Data Sanity Check: Assessing the Effect of Data Corruption on Model Performance [3.7024660695776066]
We propose a new diagnostic test suite that allows one to assess whether a dataset constitutes a good testbed for evaluating a model's meaning understanding capabilities.
We specifically apply controlled corruption transformations to widely used benchmarks (MNLI and ANLI).
A large decrease in model accuracy indicates that the original dataset provides a proper challenge to the models' reasoning capabilities.
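The paper defines its own corruption operators; the sketch below only illustrates the overall sanity-check pattern with two simple assumed corruptions (dropping the premise, shuffling hypothesis word order) and reports the accuracy drop. The toy hypothesis-only model shows the failure mode such a test is meant to expose.

```python
import random

def drop_premise(example):
    # Corruption 1: the task should not be solvable without the premise.
    return {"premise": "", "hypothesis": example["hypothesis"], "label": example["label"]}

def shuffle_hypothesis(example, seed=0):
    # Corruption 2: destroy word order while keeping the bag of words.
    words = example["hypothesis"].split()
    random.Random(seed).shuffle(words)
    return {"premise": example["premise"], "hypothesis": " ".join(words), "label": example["label"]}

def accuracy(model, data):
    return sum(model(x) == x["label"] for x in data) / len(data)

def sanity_check(model, data, corruption):
    """Returns (clean accuracy, corrupted accuracy, drop). A large drop suggests
    the dataset actually requires reasoning over both premise and hypothesis."""
    clean = accuracy(model, data)
    corrupted = accuracy(model, [corruption(x) for x in data])
    return clean, corrupted, clean - corrupted

if __name__ == "__main__":
    data = [
        {"premise": "A man is playing guitar.", "hypothesis": "A person plays music.", "label": "entailment"},
        {"premise": "A dog sleeps on the sofa.", "hypothesis": "The dog is running outside.", "label": "contradiction"},
    ]
    # hypothesis-only baseline: its accuracy does not drop under either corruption,
    # which is exactly the kind of artifact this check is meant to reveal
    naive = lambda x: "entailment" if "play" in x["hypothesis"] else "contradiction"
    print(sanity_check(naive, data, drop_premise))
    print(sanity_check(naive, data, shuffle_hypothesis))
```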
arXiv Detail & Related papers (2021-04-10T12:28:07Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
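The following numpy sketch shows the generic transductive prototype update described above, using plain soft-assignment confidences as the per-query weights rather than the meta-learned confidence the paper proposes.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def refine_prototypes(support, support_labels, queries, n_classes, temperature=1.0):
    """Transductive prototype refinement for metric-based few-shot learning:
    1) prototypes = per-class mean of the labeled support embeddings;
    2) each unlabeled query gets a per-class confidence via a softmax over
       negative squared distances to the prototypes;
    3) prototypes are re-estimated as a confidence-weighted mean over support
       and queries. The paper meta-learns these per-query weights instead of
       using the plain softmax confidences shown here."""
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_classes)])
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(-1)   # (Q, C)
    conf = softmax(-dists / temperature, axis=1)                        # soft assignments
    onehot = np.eye(n_classes)[support_labels]                          # (S, C)
    weights = np.concatenate([onehot, conf], axis=0)                    # (S+Q, C)
    feats = np.concatenate([support, queries], axis=0)                  # (S+Q, D)
    return (weights.T @ feats) / weights.sum(axis=0)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
    labels = np.array([0] * 5 + [1] * 5)
    queries = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(3.0, 0.1, (10, 2))])
    print(refine_prototypes(support, labels, queries, n_classes=2))
```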
arXiv Detail & Related papers (2020-02-27T10:22:17Z)