Greener yet Powerful: Taming Large Code Generation Models with
Quantization
- URL: http://arxiv.org/abs/2303.05378v1
- Date: Thu, 9 Mar 2023 16:25:51 GMT
- Title: Greener yet Powerful: Taming Large Code Generation Models with
Quantization
- Authors: Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray,
Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun,
Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia,
Bing Xiang
- Abstract summary: Large pretrained deep learning models have substantially pushed the boundary of code generation.
Despite their great power, the huge number of model parameters poses a significant challenge to adopting them in a regular software development environment.
Model compression is a promising approach to address these challenges.
- Score: 47.734976584580224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ML-powered code generation aims to help developers write code more
productively by intelligently generating code blocks based on natural language
prompts. Recently, large pretrained deep learning models have substantially
pushed the boundary of code generation and achieved impressive performance.
Despite their great power, the huge number of model parameters poses a
significant challenge to adopting them in a regular software development
environment, where a developer might use a standard laptop or mid-size server
to develop her code. Such large models incur significant resource usage (in
terms of memory, latency, and dollars) as well as a substantial carbon
footprint. Model compression is a promising approach to address these
challenges. Several techniques have been proposed to compress large pretrained
models, typically for vision or textual data. Among the available compression
techniques, we identify quantization as the most applicable to the code
generation task, since it does not incur significant retraining cost. Because
quantization represents model parameters with lower-bit integers (e.g., int8),
both model size and runtime latency benefit from such a representation. We
extensively study the impact of quantized models on code generation tasks
across three dimensions: (i) resource usage and carbon footprint, (ii)
accuracy, and (iii) robustness. Through systematic experiments, we find a
quantization recipe that can run even a 6B-parameter model on a regular laptop
without significant accuracy or robustness degradation. We further find that
the recipe is readily applicable to the code summarization task as well.
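To make the core idea concrete, below is a minimal sketch of post-training dynamic int8 quantization with PyTorch's `quantize_dynamic` API. It is an illustration only, not necessarily the exact recipe evaluated in the paper; the model name and prompt are placeholder assumptions, and any causal code LM from the Hugging Face hub would serve the same purpose.

```python
# Minimal sketch: post-training dynamic int8 quantization of a pretrained
# code generation model. Model name and prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # assumption: any causal code LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Swap fp32 nn.Linear layers for int8-weight versions; activations are
# quantized on the fly at inference time, so no retraining is required.
# Int8 weights take roughly 1/4 the memory of fp32 (and 1/2 of fp16), which
# is what makes running a multi-billion-parameter model on a laptop plausible.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = quantized.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Dynamic quantization runs entirely on CPU, which matches the laptop setting discussed above; static or weight-only schemes would be set up differently.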
Related papers
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Scaling Laws Behind Code Understanding Model [4.846512516189021]
We study the scaling law for the code understanding task by varying training data, model size, and computing resources.
We train a large-scale code understanding model named CoLSBERT with 1.5B parameters on a large dataset using more computing resources, which outperforms previous work by a large margin.
arXiv Detail & Related papers (2024-02-20T08:31:42Z)
- Model Compression and Efficient Inference for Large Language Models: A Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masked language modeling and the structural aspect of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation [32.178931149612644]
Model Improvement via Neuron Targeting (MINT) is a novel approach for repairing code Language Models (LMs).
MINT is effective, efficient, and reliable, capable of correcting a neural model by patching a minimum number of neurons.
arXiv Detail & Related papers (2023-12-08T20:28:08Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand through using contextual data improves the performance of pre-trained code language models for the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- ReCode: Robustness Evaluation of Code Generation Models [90.10436771217243]
We propose ReCode, a comprehensive robustness evaluation benchmark for code generation models.
We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format.
With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt.
arXiv Detail & Related papers (2022-12-20T14:11:31Z)
- NatGen: Generative pre-training by "Naturalizing" source code [18.410818213965918]
We propose a new pre-training objective, "Naturalizing" of source code.
Unlike natural language, code's bimodal, dual-channel nature allows us to generate semantically equivalent code at scale.
We fine-tune our model on three generative Software Engineering tasks to achieve state-of-the-art performance rivaling CodeT5.
arXiv Detail & Related papers (2022-06-15T15:08:29Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
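The APPS entry above scores models by the fraction of hidden test cases their generated programs pass. As an illustration of how such a pass rate can be computed, here is a minimal sketch; the test-case field names and the absence of sandboxing are simplifying assumptions, not the benchmark's official harness.

```python
# Minimal sketch of an APPS-style pass rate: run a candidate program against
# each test case's stdin and compare its stdout with the expected output.
import subprocess

def pass_rate(program_src: str, test_cases: list[dict]) -> float:
    """test_cases: [{"input": "...", "output": "..."}] (assumed format)."""
    passed = 0
    for case in test_cases:
        try:
            result = subprocess.run(
                ["python3", "-c", program_src],
                input=case["input"],
                capture_output=True,
                text=True,
                timeout=5,  # guard against non-terminating generations
            )
            if result.stdout.strip() == case["output"].strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a timeout counts as a failed test case
    return passed / len(test_cases) if test_cases else 0.0

# Toy example: a one-line solution checked against two test cases.
solution = "print(sum(map(int, input().split())))"
tests = [{"input": "1 2\n", "output": "3"}, {"input": "5 7\n", "output": "12"}]
print(f"pass rate: {pass_rate(solution, tests):.0%}")
```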
This list is automatically generated from the titles and abstracts of the papers on this site.