Investigating Hallucinations in Pruned Large Language Models for
Abstractive Summarization
- URL: http://arxiv.org/abs/2311.09335v2
- Date: Mon, 29 Jan 2024 17:59:30 GMT
- Title: Investigating Hallucinations in Pruned Large Language Models for
Abstractive Summarization
- Authors: George Chrysostomou, Zhixue Zhao, Miles Williams, Nikolaos Aletras
- Abstract summary: Pruning is a technique that reduces model size by removing redundant weights, enabling more efficient sparse inference.
This paper provides an empirical study across five summarization datasets, two state-of-the-art pruning methods, and five instruction-tuned LLMs.
Surprisingly, we find that hallucinations from pruned LLMs are less prevalent than those from the original models.
- Score: 41.02676611256742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable performance of generative large language models (LLMs)
on abstractive summarization, they face two significant challenges: their
considerable size and tendency to hallucinate. Hallucinations are concerning
because they erode reliability and raise safety issues. Pruning is a technique
that reduces model size by removing redundant weights, enabling more efficient
sparse inference. Pruned models yield downstream task performance comparable to
the original, making them ideal alternatives when operating on a limited
budget. However, the effect that pruning has upon hallucinations in abstractive
summarization with LLMs has yet to be explored. In this paper, we provide an
extensive empirical study across five summarization datasets, two
state-of-the-art pruning methods, and five instruction-tuned LLMs.
Surprisingly, we find that hallucinations from pruned LLMs are less prevalent
than those from the original models. Our analysis suggests that pruned models tend to
depend more on the source document for summary generation. This leads to a
higher lexical overlap between the generated summary and the source document,
which could be a reason for the reduction in hallucination risk.
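The lexical-overlap observation in the abstract can be illustrated with a small sketch. The abstract does not say which overlap metric the authors use, so the function below is an assumption: it computes the fraction of summary n-grams that also appear in the source document. The names `tokenize`, `ngrams`, and `overlap_ratio` are hypothetical and do not come from the paper.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_ratio(summary, source, n=1):
    """Fraction of summary n-grams that also occur in the source.

    This is an illustrative proxy for lexical overlap, not the
    paper's metric. Returns 0.0 when the summary has no n-grams.
    """
    summary_ngrams = ngrams(tokenize(summary), n)
    if not summary_ngrams:
        return 0.0
    source_ngrams = set(ngrams(tokenize(source), n))
    hits = sum(1 for gram in summary_ngrams if gram in source_ngrams)
    return hits / len(summary_ngrams)


if __name__ == "__main__":
    source = "The cat sat on the mat near the door."
    extractive = "The cat sat on the mat."
    abstractive = "A dog slept on the sofa."
    print(overlap_ratio(extractive, source))   # every summary word is in the source
    print(overlap_ratio(abstractive, source))  # only a few summary words are in the source
```

Under this proxy, a summary that copies more wording from the source scores closer to 1, which matches the direction of the effect the abstract describes for pruned models. High overlap alone does not establish that a summary is free of hallucinations.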
Related papers
- ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models [65.12177400764506]
Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications.
Current hallucination detection and mitigation datasets are limited in domains and sizes.
This paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset.
arXiv Detail & Related papers (2024-07-05T17:56:38Z)
- Unfamiliar Finetuning Examples Control How Language Models Hallucinate [75.03210107477157]
Large language models are known to hallucinate when faced with unfamiliar queries.
We find that unfamiliar examples in the models' finetuning data are crucial in shaping these errors.
Our work further investigates RL finetuning strategies for improving the factuality of long-form model generations.
arXiv Detail & Related papers (2024-03-08T18:28:13Z)
- Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z)
- Correction with Backtracking Reduces Hallucination in Summarization [30.827500697135118]
We introduce a simple yet efficient technique, CoBa, to reduce hallucination in abstractive summarization.
The approach is based on two steps: hallucination detection and mitigation.
The results show that CoBa is effective and efficient in reducing hallucination, and offers great adaptability and flexibility.
arXiv Detail & Related papers (2023-10-24T20:48:11Z)
- Hallucination Reduction in Long Input Text Summarization [2.6745438139282283]
Hallucination in text summarization poses significant obstacles to the accuracy and reliability of the generated summaries.
We have incorporated the techniques of data filtering and joint entity and summary generation (JAENS) in the fine-tuning of the Longformer-Decoder (LED) model.
Our experiments show that the fine-tuned LED model performs well in generating the paper abstract.
arXiv Detail & Related papers (2023-09-28T18:22:16Z)
- Summarization is (Almost) Dead [49.360752383801305]
We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
arXiv Detail & Related papers (2023-09-18T08:13:01Z)
- Detecting and Preventing Hallucinations in Large Vision Language Models [4.7264116948935975]
M-HalDetect is the first multi-modal hallucination detection dataset for detailed image descriptions.
We train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling.
We find that our reward model generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57% respectively.
arXiv Detail & Related papers (2023-08-11T21:35:20Z)
- Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search [54.286450484332505]
We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source.
We present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations.
arXiv Detail & Related papers (2022-03-16T07:13:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.