Improved Natural Language Generation via Loss Truncation
- URL: http://arxiv.org/abs/2004.14589v2
- Date: Fri, 1 May 2020 02:22:04 GMT
- Title: Improved Natural Language Generation via Loss Truncation
- Authors: Daniel Kang and Tatsunori Hashimoto
- Abstract summary: We show that distinguishability serves as a principled and robust alternative for handling invalid references.
We propose loss truncation, which adaptively removes high loss examples during training.
We show this is as easy to optimize as log loss and tightly bounds distinguishability under noise.
- Score: 29.676561106319173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural language models are usually trained to match the distributional
properties of a large-scale corpus by minimizing the log loss. While
straightforward to optimize, this approach forces the model to reproduce all
variations in the dataset, including noisy and invalid references (e.g.,
misannotation and hallucinated facts). Worse, the commonly used log loss is
overly sensitive to such phenomena and even a small fraction of noisy data can
degrade performance. In this work, we show that the distinguishability of the
model and the reference serves as a principled and robust alternative for handling
invalid references. To optimize distinguishability, we propose loss truncation,
which adaptively removes high loss examples during training. We show this is as
easy to optimize as log loss and tightly bounds distinguishability under noise.
Empirically, we demonstrate that loss truncation outperforms existing baselines
on distinguishability on a summarization task, and show that samples generated
by the loss truncation model have factual accuracy ratings that exceed those of
baselines and match human references.
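To make the method concrete, below is a minimal PyTorch-style sketch of batch-level loss truncation. It is a simplification under stated assumptions, not the authors' implementation: the function name `loss_truncation` and the `drop_frac` parameter are illustrative, and the per-batch quantile stands in for the running quantile estimate (and hot-start phase) described in the paper.

```python
import torch
import torch.nn.functional as F

def loss_truncation(logits, targets, drop_frac=0.1):
    """Batch-level loss truncation (simplified sketch).

    Drops the highest-loss fraction of examples in the batch --
    the ones most likely to be noisy or invalid references --
    and averages the log loss over the rest.
    """
    # Per-example negative log-likelihood, no reduction.
    per_example = F.cross_entropy(logits, targets, reduction="none")

    # Per-batch estimate of the (1 - drop_frac) loss quantile; the
    # paper instead tracks a running quantile across training.
    threshold = torch.quantile(per_example.detach(), 1.0 - drop_frac)

    # Zero out the gradient contribution of examples above the threshold.
    keep = (per_example.detach() <= threshold).float()
    return (per_example * keep).sum() / keep.sum().clamp(min=1.0)
```

In training, this would replace the usual mean cross-entropy; intuitively, truncation keeps the model from fitting the high-loss tail of the data, which is what bounds distinguishability under noise.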
Related papers
- SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch tokens they extract.
We propose a novel smooth regularization for fine-tuning that rectifies these structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z)
- Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose ReScore, a model-agnostic framework that boosts causal discovery performance by dynamically learning adaptive weights for a reweighted score function.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors [105.12462629663757]
In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model.
We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
arXiv Detail & Related papers (2022-05-25T15:26:48Z)
- Robust Training under Label Noise by Over-parameterization [41.03008228953627]
We propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted.
The main idea is simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data.
Remarkably, this simple method achieves state-of-the-art test accuracy against label noise on a variety of real datasets.
arXiv Detail & Related papers (2022-02-28T18:50:10Z)
- The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss [12.132641563193584]
van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense.
In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing it may yield a binary classifier with accuracy no better than random guessing (the loss itself is written out after this list).
arXiv Detail & Related papers (2021-12-08T20:57:20Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms gives insight into whether a model is prone to systematic errors, and also suggests ways to act on and improve the model.
In this paper we propose a method to do so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data, partitioned according to the correctness of each prediction.
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference [59.62779187457773]
We propose GenNLI, a generative classifier for natural language inference (NLI).
We compare it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT.
Experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental settings.
arXiv Detail & Related papers (2020-10-08T04:44:00Z)
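For reference, the unhinged loss discussed in "The perils of being unhinged" above is, in van Rooyen et al.'s formulation, simply the linear loss below; writing it out shows that it is linear in the margin y f(x), which underlies both its robustness to symmetric label noise and the accuracy pathology studied in that paper.

```latex
% Unhinged loss (van Rooyen et al.), for labels y in {-1, +1}
% and a real-valued score f(x):
\[
  \ell_{\mathrm{unhinged}}\bigl(y, f(x)\bigr) \;=\; 1 - y\, f(x),
  \qquad y \in \{-1, +1\}.
\]
```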