With a Little Push, NLI Models can Robustly and Efficiently Predict
Faithfulness
- URL: http://arxiv.org/abs/2305.16819v1
- Date: Fri, 26 May 2023 11:00:04 GMT
- Title: With a Little Push, NLI Models can Robustly and Efficiently Predict
Faithfulness
- Authors: Julius Steen, Juri Opitz, Anette Frank, Katja Markert
- Abstract summary: Conditional language models still generate unfaithful output that is not supported by their input.
We show that pure NLI models can outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures.
- Score: 19.79160738554967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional language models still generate unfaithful output that is not
supported by their input. These unfaithful generations jeopardize trust in
real-world applications such as summarization or human-machine interaction,
motivating a need for automatic faithfulness metrics. To implement such
metrics, NLI models seem attractive, since they solve a strongly related task
that comes with a wealth of prior research and data. But recent research
suggests that NLI models require costly additional machinery to perform
reliably across datasets, e.g., by running inference on a Cartesian product of
input and generated sentences, or supporting them with a
question-generation/answering step.
In this work we show that pure NLI models _can_ outperform more complex
metrics when combining task-adaptive data augmentation with robust inference
procedures. We propose: (1) Augmenting NLI training data to adapt NL inferences
to the specificities of faithfulness prediction in dialogue; (2) Making use of
both entailment and contradiction probabilities in NLI, and (3) Using
Monte-Carlo dropout during inference. Applied to the TRUE benchmark, which
combines faithfulness datasets across diverse domains and tasks, our approach
strongly improves a vanilla NLI model and significantly outperforms previous
work, while showing favourable computational cost.
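Proposals (2) and (3) can be illustrated with a small sketch. The abstract does not say how the entailment and contradiction probabilities are combined or how the Monte-Carlo dropout samples are aggregated, so the code below assumes the simplest choices: the score is the entailment probability minus the contradiction probability, averaged over several stochastic forward passes. The classifier is a hypothetical toy linear model (`toy_nli_logits`), not the authors' NLI model, and all names and parameters are illustrative.

```python
import math
import random

# Assumed label order for the toy classifier (an illustration, not the paper's).
LABELS = ("entailment", "neutral", "contradiction")


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_nli_logits(features, weights, rng, dropout_p):
    """Hypothetical stand-in for an NLI classifier with dropout on its inputs.

    Each feature is zeroed with probability dropout_p; survivors are rescaled
    by 1 / (1 - dropout_p) (inverted dropout). Requires 0 <= dropout_p < 1.
    """
    kept = []
    for x in features:
        if dropout_p > 0 and rng.random() < dropout_p:
            kept.append(0.0)
        else:
            kept.append(x / (1.0 - dropout_p))
    # One logit per label: dot product of that label's weight row and the inputs.
    return [sum(w * x for w, x in zip(row, kept)) for row in weights]


def faithfulness_score(features, weights, n_samples=16, dropout_p=0.1, seed=0):
    """Average (p_entailment - p_contradiction) over n_samples dropout passes.

    Dropout stays active at inference time, which is the core idea of
    Monte-Carlo dropout; the mean is an assumed aggregation.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        probs = softmax(toy_nli_logits(features, weights, rng, dropout_p))
        scores.append(probs[0] - probs[2])
    return sum(scores) / len(scores)


if __name__ == "__main__":
    weights = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]  # toy parameters
    print(faithfulness_score([1.0, 0.5], weights, n_samples=32, dropout_p=0.3))
```

The score lies in [-1, 1]: values near 1 suggest the output is entailed by the input, values near -1 suggest a contradiction. With `dropout_p=0.0` the function reduces to a single deterministic entailment-minus-contradiction score.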
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- Self-augmented Data Selection for Few-shot Dialogue Generation [18.794770678708637]
We adopt the self-training framework to deal with the few-shot MR-to-Text generation problem.
We propose a novel data selection strategy to select the data that our generation model is most uncertain about.
arXiv Detail & Related papers (2022-05-19T16:25:50Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset.
arXiv Detail & Related papers (2022-04-15T12:56:39Z)
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x speed-up in inference speed while retaining comparable performance.
arXiv Detail & Related papers (2021-09-09T12:32:28Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential Recommendation describes a set of techniques to model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data-sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec).
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- Generative Adversarial Networks for Annotated Data Augmentation in Data Sparse NLU [0.76146285961466]
Data sparsity is one of the key challenges associated with model development in Natural Language Understanding.
We present our results on boosting NLU model performance through training data augmentation using a sequential generative adversarial network (GAN).
Our experiments reveal that synthetic data generated using the sequential generative adversarial network provides significant performance boosts across multiple metrics.
arXiv Detail & Related papers (2020-12-09T20:38:17Z)
- Looking Beyond Sentence-Level Natural Language Inference for Downstream Tasks [15.624486319943015]
In recent years, the Natural Language Inference (NLI) task has garnered significant attention.
We study this unfulfilled promise through the lens of two downstream tasks: question answering (QA) and text summarization.
We conjecture that a key difference between the NLI datasets and these downstream tasks concerns the length of the premise.
arXiv Detail & Related papers (2020-09-18T21:44:35Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)