On the Benefits of Fine-Grained Loss Truncation: A Case Study on
Factuality in Summarization
- URL: http://arxiv.org/abs/2403.05788v1
- Date: Sat, 9 Mar 2024 04:20:26 GMT
- Title: On the Benefits of Fine-Grained Loss Truncation: A Case Study on
Factuality in Summarization
- Authors: Lorenzo Jaime Yu Flores, Arman Cohan
- Abstract summary: Loss Truncation (LT) is an approach to modify the standard log loss to adaptively remove noisy examples during training.
We show that LT alone yields a considerable number of hallucinated entities on various datasets.
We propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets.
- Score: 25.282499952331094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text summarization and simplification are among the most widely used
applications of AI. However, models developed for such tasks are often prone to
hallucination, which can result from training on unaligned data. One efficient
approach to address this issue is Loss Truncation (LT) (Kang and Hashimoto,
2020), which modifies the standard log loss to adaptively remove noisy
examples during training. However, we find that LT alone yields a considerable
number of hallucinated entities on various datasets. We study the behavior of
the underlying losses between factual and non-factual examples, to understand
and refine the performance of LT. We demonstrate that LT's performance is
limited when the underlying assumption that noisy targets have higher NLL loss
is not satisfied, and find that word-level NLL among entities provides better
signal for distinguishing factuality. We then leverage this to propose a
fine-grained NLL loss and fine-grained data cleaning strategies, and observe
improvements in hallucination reduction across some datasets. Our work is
available at https://github.com/yale-nlp/fine-grained-lt.
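The mechanism described above can be sketched in a few lines: coarse Loss Truncation drops the highest-loss fraction of training examples, and the paper's fine-grained refinement applies the same idea at the word level among entity tokens. This is a minimal illustrative sketch, not the authors' implementation; the function names, the `drop_frac` parameter, and the binary `entity_mask` are assumptions for the example.

```python
def loss_truncation(losses, drop_frac=0.25):
    # Coarse Loss Truncation (Kang and Hashimoto, 2020), sketched:
    # zero out the highest-loss fraction of examples in a batch, on the
    # assumption that noisy/unaligned targets incur high NLL loss.
    k = int(len(losses) * drop_frac)              # number of examples to drop
    if k == 0:
        return list(losses)
    cutoff = sorted(losses, reverse=True)[k - 1]  # k-th highest loss
    # Examples at or above the cutoff contribute nothing to the update.
    # (Ties at the cutoff are all dropped in this simple sketch.)
    return [l if l < cutoff else 0.0 for l in losses]


def fine_grained_nll(token_nlls, entity_mask, drop_frac=0.5):
    # Fine-grained variant (illustrative): truncate at the word level, and
    # only among entity tokens, where per-token NLL better separates
    # factual from hallucinated content according to the paper's analysis.
    entity_nlls = [l for l, m in zip(token_nlls, entity_mask) if m]
    truncated = iter(loss_truncation(entity_nlls, drop_frac))
    return [next(truncated) if m else l
            for l, m in zip(token_nlls, entity_mask)]


# Example: one outlier example is truncated, the rest pass through.
print(loss_truncation([0.5, 0.2, 3.0, 0.4]))  # → [0.5, 0.2, 0.0, 0.4]
```

Note the contrast the paper studies: the coarse version discards whole examples whenever their sequence-level loss is high, whereas the fine-grained version only suppresses the loss of suspect entity tokens, leaving the rest of the target intact.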
Related papers
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning [37.54523122932728]
We propose a pipeline-based data augmentation method via large language models (LLMs).
To tackle the issue of low data diversity, our pipeline utilizes knowledge graphs (KGs) to extract entities and quantities.
To address high data noise, the GCSE model uses a Gaussian-decayed function to limit the impact of false hard negative samples.
arXiv Detail & Related papers (2024-09-19T16:29:58Z)
- Dynamics-Aware Loss for Learning with Label Noise [73.75129479936302]
Label noise poses a serious threat to deep neural networks (DNNs).
We propose a dynamics-aware loss (DAL) to solve this problem.
Both the detailed theoretical analyses and extensive experimental results demonstrate the superiority of our method.
arXiv Detail & Related papers (2023-03-21T03:05:21Z)
- Identifying Hard Noise in Long-Tailed Sample Distribution [76.16113794808001]
We introduce Noisy Long-Tailed Classification (NLT).
Most de-noising methods fail to identify hard noise.
We design an iterative noisy learning framework called Hard-to-Easy (H2E).
arXiv Detail & Related papers (2022-07-27T09:03:03Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- L2B: Learning to Bootstrap Robust Models for Combating Label Noise [52.02335367411447]
This paper introduces a simple and effective method, named Learning to Bootstrap (L2B).
It enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels.
It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning.
arXiv Detail & Related papers (2021-04-01T07:59:03Z)
- Learning from Noisy Labels via Dynamic Loss Thresholding [69.61904305229446]
We propose a novel method named Dynamic Loss Thresholding (DLT).
During the training process, DLT records the loss value of each sample and calculates dynamic loss thresholds.
Experiments on CIFAR-10/100 and Clothing1M demonstrate substantial improvements over recent state-of-the-art methods.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data is costly and error-prone, making it difficult to guarantee annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels and the predictions in the previous epoch together.
arXiv Detail & Related papers (2020-04-30T05:31:31Z)
- Improved Natural Language Generation via Loss Truncation [29.676561106319173]
We show that distinguishability serves as a principled and robust alternative for handling invalid references.
We propose loss truncation, which adaptively removes high loss examples during training.
We show this is as easy to optimize as log loss and tightly bounds distinguishability under noise.
arXiv Detail & Related papers (2020-04-30T05:31:31Z)