Do Emergent Abilities Exist in Quantized Large Language Models: An
Empirical Study
- URL: http://arxiv.org/abs/2307.08072v2
- Date: Wed, 26 Jul 2023 04:15:48 GMT
- Title: Do Emergent Abilities Exist in Quantized Large Language Models: An
Empirical Study
- Authors: Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang
Li, Bolin Ding, Ji-Rong Wen
- Abstract summary: This work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models.
Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation.
To improve the performance of low-bit models, we conduct two special experiments: (1) fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning.
- Score: 90.34226812493083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their superior performance, Large Language Models (LLMs) require
significant computational resources for deployment and use. To overcome this
issue, quantization methods have been widely applied to reduce the memory
footprint of LLMs and to increase the inference speed. However, a major
challenge is that low-bit quantization methods often lead to performance
degradation. It is important to understand how quantization impacts the
capacity of LLMs. Unlike previous studies that focus on overall
performance, this work investigates the impact of quantization on
emergent abilities, which are important characteristics that distinguish
LLMs from small language models. Specifically, we examine the abilities of
in-context learning, chain-of-thought reasoning, and instruction-following in
quantized LLMs. Our empirical experiments show that these emergent abilities
still exist in 4-bit quantization models, while 2-bit models encounter severe
performance degradation on tests of these abilities. To improve the
performance of low-bit models, we conduct two special experiments: (1)
fine-grained impact analysis that studies which components (or substructures)
are more sensitive to quantization, and (2) performance compensation through
model fine-tuning. Our work derives a series of important findings for
understanding the impact of quantization on emergent abilities, and sheds light
on the possibilities of extremely low-bit quantization for LLMs.
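As a concrete reference point for the bit widths discussed above, the sketch below is a minimal, illustrative assumption (not the paper's actual quantization pipeline): it applies plain round-to-nearest per-row weight quantization at 8, 4, and 2 bits and reports the resulting weight error, mirroring the trend that 4-bit models stay close to the original while 2-bit models degrade sharply.

```python
import torch

def quantize_rtn(weight: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Round-to-nearest (RTN) asymmetric quantization, one scale per output row.

    Returns the dequantized weights so the quantization error can be inspected
    directly; a deployed kernel would instead store the integer codes plus the
    per-row scale and zero-point.
    """
    qmin, qmax = 0, 2 ** n_bits - 1
    w_min = weight.min(dim=1, keepdim=True).values
    w_max = weight.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-w_min / scale)
    codes = torch.clamp(torch.round(weight / scale) + zero_point, qmin, qmax)
    return (codes - zero_point) * scale

# Toy weight matrix standing in for one linear layer of an LLM (assumed shape).
torch.manual_seed(0)
w = torch.randn(8, 16)
for bits in (8, 4, 2):
    err = (quantize_rtn(w, bits) - w).abs().mean().item()
    print(f"{bits}-bit RTN: mean abs weight error = {err:.4f}")
```

Practical low-bit methods such as GPTQ add error-compensating weight updates on top of this naive baseline, which is one reason 4-bit models can retain emergent abilities while 2-bit models still struggle.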
Related papers
- Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning [104.27224674122313]
Fine-tuning MLLMs has become a common practice to improve performance on specific downstream tasks.
To balance the trade-off between generalization and specialization, we propose measuring the parameter importance for both pre-trained and fine-tuning distributions.
arXiv Detail & Related papers (2024-11-17T01:16:37Z) - Scaling laws for post-training quantized large language models [41.78467383320145]
Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size.
The quality of LLMs after post-training compression remains highly unpredictable, often requiring case-by-case validation in practice.
arXiv Detail & Related papers (2024-10-15T23:34:22Z) - Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation [70.22782550540714]
We introduce a Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW.
arXiv Detail & Related papers (2024-08-07T12:42:09Z) - Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners [17.43650511873449]
Large Language Models (LLMs) showcase remarkable performance and robust deductive capabilities, yet their expansive size complicates deployment and raises environmental concerns due to substantial resource consumption.
We have developed innovative methods that enhance the performance of quantized LLMs, particularly in low-bit settings.
Our methods consistently deliver state-of-the-art results across various quantization scenarios and offer deep theoretical insights into the quantization process, elucidating the potential of quantized models for widespread application.
arXiv Detail & Related papers (2024-07-22T09:45:16Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - When Quantization Affects Confidence of Large Language Models? [4.338589334157708]
We show that 4-bit GPTQ quantization results in a decrease in confidence regarding true labels, with varying impacts observed across different language models.
We propose an explanation for quantization loss based on confidence levels, indicating that quantization disproportionately affects samples where the full model exhibited low confidence levels in the first place.
arXiv Detail & Related papers (2024-05-01T16:58:28Z) - What Makes Quantization for Large Language Models Hard? An Empirical
Study from the Lens of Perturbation [55.153595212571375]
Quantization is a technique for improving the memory and computational efficiency of large language models (LLMs).
We propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs.
We conduct experiments with various artificial perturbations to explore their impact on LLM performance.
arXiv Detail & Related papers (2024-03-11T03:42:51Z) - A Comprehensive Evaluation of Quantization Strategies for Large Language Models [42.03804933928227]
Increasing the number of parameters in large language models (LLMs) usually improves performance in downstream tasks but raises compute and memory costs.
Quantization techniques, which reduce the bits needed for model weights or activations with minimal performance loss, have become popular.
We propose a structured evaluation framework consisting of three critical dimensions: (1) knowledge & capacity, (2) alignment, and (3) efficiency.
arXiv Detail & Related papers (2024-02-26T17:45:36Z) - Beyond Task Performance: Evaluating and Reducing the Flaws of Large
Multimodal Models with In-Context Learning [105.77733287326308]
We evaluate 10 recent open-source LMMs from 3B up to 80B parameter scale, on 5 different axes: hallucinations, abstention, compositionality, explainability, and instruction following.
We explore the training-free in-context learning (ICL) as a solution, and study how it affects these limitations.
Based on our ICL study, we push ICL further and propose new multimodal ICL variants such as Multitask-ICL, Chain-of-Hindsight-ICL, and Self-Correcting-ICL.
arXiv Detail & Related papers (2023-10-01T12:02:59Z)