On the Benefits of Fine-Grained Loss Truncation: A Case Study on
Factuality in Summarization
- URL: http://arxiv.org/abs/2403.05788v1
- Date: Sat, 9 Mar 2024 04:20:26 GMT
- Title: On the Benefits of Fine-Grained Loss Truncation: A Case Study on
Factuality in Summarization
- Authors: Lorenzo Jaime Yu Flores, Arman Cohan
- Abstract summary: Loss Truncation (LT) is an approach that modifies the standard log loss to adaptively remove noisy examples during training.
We show that LT alone yields a considerable number of hallucinated entities on various datasets.
We propose a fine-grained NLL loss and fine-grained data cleaning strategies, and observe improvements in hallucination reduction across some datasets.
- Score: 25.282499952331094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text summarization and simplification are among the most widely used
applications of AI. However, models developed for such tasks are often prone to
hallucination, which can result from training on unaligned data. One efficient
approach to address this issue is Loss Truncation (LT) (Kang and Hashimoto,
2020), which modifies the standard log loss to adaptively remove noisy
examples during training. However, we find that LT alone yields a considerable
number of hallucinated entities on various datasets. We study the behavior of
the underlying losses between factual and non-factual examples, to understand
and refine the performance of LT. We demonstrate that LT's performance is
limited when the underlying assumption that noisy targets have higher NLL loss
is not satisfied, and find that word-level NLL among entities provides a better
signal for distinguishing factuality. We then leverage this to propose a
fine-grained NLL loss and fine-grained data cleaning strategies, and observe
improvements in hallucination reduction across some datasets. Our work is
available at https://github.com/yale-nlp/fine-grained-lt.