Do Compressed LLMs Forget Knowledge? An Experimental Study with
Practical Implications
- URL: http://arxiv.org/abs/2310.00867v3
- Date: Fri, 16 Feb 2024 18:39:45 GMT
- Title: Do Compressed LLMs Forget Knowledge? An Experimental Study with
Practical Implications
- Authors: Duc N.M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang
Wang
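- Abstract summary: Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks.
We propose two conjectures on the nature of the damage: one is that certain knowledge is forgotten (or erased) after compression; the other is that knowledge is internally displaced and can be recovered with input-side augmentation such as prompting.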
We introduce a variant called Inference-time Dynamic Prompting (IDP) that can effectively increase prompt diversity without incurring any inference overhead.
- Score: 63.29358103217275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing Large Language Models (LLMs) often leads to reduced performance,
especially for knowledge-intensive tasks. In this work, we dive into how
compression damages LLMs' inherent knowledge and the possible remedies. We
start by proposing two conjectures on the nature of the damage: one is certain
knowledge being forgotten (or erased) after LLM compression, hence
necessitating the compressed model to (re)learn from data with additional
parameters; the other presumes that knowledge is internally displaced and hence
one requires merely "inference re-direction" with input-side augmentation such
as prompting, to recover the knowledge-related performance. Extensive
experiments are then designed to (in)validate the two conjectures. We observe
the promise of prompting in comparison to model tuning; we further unlock
prompting's potential by introducing a variant called Inference-time Dynamic
Prompting (IDP), that can effectively increase prompt diversity without
incurring any inference overhead. Our experiments consistently suggest that
compared to the classical re-training alternatives such as LoRA, prompting with
IDP leads to better or comparable post-compression performance recovery, while
saving the extra parameter size by 21x and reducing inference latency by 60%.
Our experiments hence strongly endorse the conjecture of "knowledge displaced"
over "knowledge forgotten", and shed light on a new efficient mechanism to
restore compressed LLM performance. We additionally visualize and analyze the
different attention and activation patterns between prompted and re-trained
models, demonstrating they achieve performance recovery in two different
regimes.
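As a rough illustration of what "extra parameter size" means when comparing LoRA with prompting, the sketch below counts the added trainable parameters for each approach. All dimensions (`d_model`, `num_layers`, `rank`, `prompt_tokens`, and the choice of two adapted matrices per layer) are hypothetical values chosen for illustration, and the prompt is assumed to be a soft prompt added only at the input embedding layer. This is not the paper's configuration and is not meant to reproduce its 21x figure.

```python
def lora_extra_params(d_in, d_out, rank, num_matrices):
    """LoRA adds two low-rank factors per adapted weight matrix:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return num_matrices * rank * (d_in + d_out)


def soft_prompt_extra_params(num_tokens, d_model):
    """An input-level soft prompt adds one d_model-sized vector per prompt token."""
    return num_tokens * d_model


# Hypothetical configuration (assumed for illustration only).
d_model = 4096           # hidden size
num_layers = 32          # transformer layers
adapted_per_layer = 2    # e.g. two attention projections adapted per layer
rank = 8                 # LoRA rank
prompt_tokens = 100      # soft prompt length

lora = lora_extra_params(d_model, d_model, rank, num_layers * adapted_per_layer)
prompt = soft_prompt_extra_params(prompt_tokens, d_model)
ratio = lora / prompt

print(f"LoRA extra parameters:        {lora:,}")
print(f"Soft prompt extra parameters: {prompt:,}")
print(f"LoRA / prompt ratio:          {ratio:.2f}x")
```

Under these assumed settings the prompt adds roughly a tenth of the parameters that LoRA does; the actual ratio depends on the rank, the number of adapted matrices, and the prompt length.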
Related papers
- Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning [18.283963879468466]
Large language models (LLMs) have demonstrated remarkable capabilities across various tasks but still face challenges such as hallucinations.
We propose a novel approach called uncertainty-sensitive tuning to improve models' capability to recognize the boundaries of their knowledge.
We show that our proposed uncertainty-sensitive tuning method significantly improves the performance of the Llama2-chat-7B model.
arXiv Detail & Related papers (2024-06-14T14:56:04Z)
- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [50.508669199496474]
We develop a ReAct-style LLM agent with the ability to reason and act upon external knowledge.
We refine the agent through a ReST-like method that iteratively trains on previous trajectories.
Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model.
arXiv Detail & Related papers (2023-12-15T18:20:15Z)
- The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models [11.156816338995503]
Compressing large language models (LLMs) provides faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
arXiv Detail & Related papers (2023-12-01T22:27:12Z)
- R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face their challenges.
Previous instruction tuning methods force the model to complete a sentence no matter whether the model knows the knowledge or not.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
arXiv Detail & Related papers (2023-11-16T08:45:44Z)
- Forgetting before Learning: Utilizing Parametric Arithmetic for
Knowledge Updating in Large Language Models [53.52344131257681]
We propose a new paradigm for fine-tuning called F-Learning, which employs parametric arithmetic to facilitate the forgetting of old knowledge and learning of new knowledge.
Experimental results on two publicly available datasets demonstrate that our proposed F-Learning can obviously improve the knowledge updating performance of both full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2023-11-14T09:12:40Z)
- PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine [24.888093229577965]
We propose a simple, universal, and automatic method named PREFER to address the stated limitations.
Our PREFER achieves state-of-the-art performance in multiple types of tasks by a significant margin.
arXiv Detail & Related papers (2023-08-23T09:46:37Z)
- Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM
Inference with Transferable Prompt [96.24800696597707]
We introduce a new perspective to optimize this trade-off by prompting compressed models.
We propose a soft prompt learning method where we expose the compressed model to the prompt learning process.
Our experimental analysis suggests our soft prompt strategy greatly improves the performance of the 8x compressed LLaMA-7B model.
arXiv Detail & Related papers (2023-05-17T20:45:13Z)
- Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning [47.904127007515925]
We study a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction.
We prove that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic approximation guarantees as their counterparts.
Notably, these are the first finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling.
arXiv Detail & Related papers (2023-01-03T04:09:38Z)
- Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.