Is Quantization a Deal-breaker? Empirical Insights from Large Code Models
- URL: http://arxiv.org/abs/2507.09665v1
- Date: Sun, 13 Jul 2025 14:58:19 GMT
- Title: Is Quantization a Deal-breaker? Empirical Insights from Large Code Models
- Authors: Saima Afrin, Bowen Xu, Antonio Mastropaolo
- Abstract summary: We apply Activation-aware Weight Quantization (AWQ) to two widely used code models, CodeLlama and DeepSeekCoder, to generate Java and Python code. Our findings reveal that quantization is a robust technique that not only preserves functional correctness, but also retains key qualitative code attributes sought after by developers.
- Score: 7.182449176083625
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The growing scale of large language models (LLMs) not only demands extensive computational resources but also raises environmental concerns due to their increasing carbon footprint. Model quantization emerges as an effective approach that can reduce the resource demands of LLMs by decreasing parameter precision without substantially affecting performance (e.g., 16 bit to 4 bit). While recent studies have established quantization as a promising approach for optimizing large code models (LCMs), a specialized subset of LLMs tailored for automated software engineering, their findings offer only limited insights into its practical implications. Specifically, current investigations focus only on the functional correctness of the code generated by quantized models, neglecting how quantization impacts critical aspects of code quality such as reliability, maintainability, and security. To bridge this gap, our study investigates the effects of quantization on the qualitative aspects of automatically generated code. We apply Activation-aware Weight Quantization (AWQ) to two widely used code models, CodeLlama and DeepSeekCoder, to generate Java and Python code. Using state-of-the-art static analysis tools, we evaluate software quality metrics and static features including cyclomatic complexity, cognitive complexity, and lines of code. Our findings reveal that quantization is a robust technique that not only preserves functional correctness, but also retains key qualitative code attributes sought after by developers, such as maintainability and structural simplicity.
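To make the described setup concrete, the sketch below walks through the two stages the abstract outlines: 4-bit AWQ quantization of a code model, followed by measuring static features of generated code. It assumes the AutoAWQ package for quantization and radon for cyclomatic complexity and lines of code; the paper's exact tooling, checkpoints, and quantization settings are not specified here and may differ.

```python
# Stage 1: 4-bit Activation-aware Weight Quantization (AWQ) of a code model.
# Assumes the AutoAWQ package (pip install autoawq); all settings are illustrative.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "codellama/CodeLlama-7b-hf"      # assumed checkpoint, not necessarily the paper's
quantized_dir = "codellama-7b-awq-4bit"
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model.quantize(tokenizer, quant_config=quant_config)   # calibrates on AutoAWQ's default dataset
model.save_quantized(quantized_dir)
tokenizer.save_pretrained(quantized_dir)

# Stage 2: static features of a generated Python snippet.
# radon stands in here for the static analysis tools used in the study.
from radon.complexity import cc_visit
from radon.raw import analyze

generated_code = '''
def fizzbuzz(n):
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out
'''

raw = analyze(generated_code)                 # raw metrics: LOC, logical LOC, comments
for block in cc_visit(generated_code):        # cyclomatic complexity per function/class
    print(f"{block.name}: cyclomatic complexity = {block.complexity}")
print(f"LOC = {raw.loc}, logical LOC = {raw.lloc}")
```

Comparing these metrics for code produced by the full-precision and the AWQ-quantized model is the kind of quality comparison the abstract describes; cognitive complexity would require an additional tool.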
Related papers
- Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality.
We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
- Is Compression Really Linear with Code Intelligence? [60.123628177110206]
Format Annealing is a lightweight, transparent training methodology designed to assess the intrinsic capabilities of pre-trained models equitably.
Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and bits-per-character (BPC).
Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
arXiv Detail & Related papers (2025-05-16T16:59:14Z)
- Quantizing Large Language Models for Code Generation: A Differentiated Replication [51.85505914274633]
Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, in automatically implementing requirements described in natural language.
LLMs pose significant challenges related to their memory (and, consequently, carbon) footprint.
The new frontier for LLM quantization is 4-bit precision, resulting in an average memory footprint reduction of 70%.
arXiv Detail & Related papers (2025-03-10T09:26:08Z)
- Quantize What Counts: Bit Allocation Insights Informed by Spectral Gaps in Keys and Values [57.54443445583921]
We provide two novel theorems aimed at enhancing KV quantization methods.
Our first theorem, termed Key-Value Norm Disparity, states that the key weight matrices by nature carry richer information.
Our second theorem, Key-Driven Quantization, posits that prioritizing the quantization precision of keys over values induces significant improvements to the overall quantization performance.
arXiv Detail & Related papers (2025-02-20T22:24:27Z)
- Resource-Efficient & Effective Code Summarization [3.512140256677132]
GreenAI techniques, such as QLoRA, offer a promising path for dealing with large models' sustainability.
Our study evaluates two state-of-the-art CLMs across two programming languages: Python and Java.
Results show that QLoRA enables efficient fine-tuning of CLMs for code summarization. A minimal QLoRA configuration sketch is given after this list.
arXiv Detail & Related papers (2025-02-05T21:06:30Z)
- Precision or Peril: Evaluating Code Quality from Quantized Large Language Models [0.5249805590164902]
Quantization has emerged as a way to mitigate the memory overhead of Large Language Models.
This study aims to evaluate the current code generation capabilities of smaller LLMs using various metrics.
arXiv Detail & Related papers (2024-11-16T01:31:29Z)
- Quantum Machine Learning Architecture Search via Deep Reinforcement Learning [8.546707309430593]
We introduce deep reinforcement learning to explore proficient QML model architectures tailored for supervised learning tasks.
Our methodology involves training an RL agent to devise policies that facilitate the discovery of QML models without a predetermined ansatz.
Our proposed method successfully identifies VQC architectures capable of achieving high classification accuracy while minimizing gate depth.
arXiv Detail & Related papers (2024-07-29T16:20:51Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- QuantEase: Optimization-based Quantization for Language Models [17.333778751252392]
This work introduces a layer-wise Post-Training Quantization (PTQ) framework for Large Language Models (LLMs), building on recent advances in LLM compression.
Our coordinate descent (CD) based approach features straightforward updates, relying solely on vector operations.
We also explore an outlier-aware variant, retaining significant weights (outliers) at full precision.
arXiv Detail & Related papers (2023-09-05T01:39:09Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
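As a concrete companion to the code-summarization entry above, here is a minimal QLoRA-style setup: 4-bit NF4 quantization of the frozen base weights plus trainable low-rank adapters, using Hugging Face transformers, bitsandbytes, and peft. The model name, adapter rank, and target modules are illustrative assumptions, not the configuration used in that study.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/deepseek-coder-6.7b-base"  # illustrative choice of code model

# 4-bit NF4 quantization for the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Low-rank adapters trained on top of the quantized base (the "LoRA" part);
# rank and target modules are assumed values for illustration only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

From here, a standard fine-tuning loop on a code-summarization dataset would update only the adapter parameters while the 4-bit base model stays frozen, which is what makes the approach resource-efficient.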
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.