The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models
- URL: http://arxiv.org/abs/2312.00960v1
- Date: Fri, 1 Dec 2023 22:27:12 GMT
- Title: The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models
- Authors: Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan,
Frederic Sala
- Abstract summary: Compressing large language models (LLMs) provides faster inference and smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compressing large language models (LLMs), often consisting of billions of
parameters, provides faster inference and smaller memory footprints, and enables
local deployment. Two standard compression techniques are pruning and
quantization, with the former eliminating redundant connections in model layers
and the latter representing model parameters with fewer bits. The key tradeoff
is between the degree of compression and the impact on the quality of the
compressed model. Existing research on LLM compression primarily focuses on
performance in terms of general metrics like perplexity or downstream task
accuracy. More fine-grained metrics, such as those measuring parametric
knowledge, remain significantly underexplored. To help bridge this gap, we
present a comprehensive analysis across multiple model families (ENCODER,
ENCODER-DECODER, and DECODER) using the LAMA and LM-HARNESS benchmarks in order
to systematically quantify the effect of commonly employed compression
techniques on model performance. A particular focus is on tradeoffs involving
parametric knowledge, with the goal of providing practitioners with practical
insights to help make informed decisions on compression. We release our
codebase to enable further research.
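
To make the two techniques named in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's method or codebase: it assumes unstructured magnitude pruning (zeroing the smallest-magnitude weights) and symmetric uniform quantization with one scale for the whole list, and the function names magnitude_prune, quantize_symmetric, and dequantize are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_symmetric(weights, bits):
    """Map weights to signed integers representable in `bits` bits.

    Returns (integer codes, scale). One scale is shared by all weights,
    which is an assumption of this sketch.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -0.1, 0.03, -0.8, 0.2]
    pruned = magnitude_prune(w, sparsity=0.4)
    codes, scale = quantize_symmetric(w, bits=8)
    print("pruned:   ", pruned)
    print("codes:    ", codes)
    print("restored: ", dequantize(codes, scale))
```

The tradeoff the abstract describes shows up directly in the two parameters: a higher sparsity removes more connections, and a lower bits value coarsens the grid onto which each weight is rounded, so both increase the reconstruction error of the stored weights.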