Greener yet Powerful: Taming Large Code Generation Models with
Quantization
- URL: http://arxiv.org/abs/2303.05378v1
- Date: Thu, 9 Mar 2023 16:25:51 GMT
- Title: Greener yet Powerful: Taming Large Code Generation Models with
Quantization
- Authors: Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray,
Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun,
Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia,
Bing Xiang
- Abstract summary: Large pretrained deep learning models have substantially pushed the boundary of code generation.
Despite their great power, the huge number of model parameters poses a significant challenge to adopting them in a regular software development environment.
Model compression is a promising approach to address these challenges.
- Score: 47.734976584580224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ML-powered code generation aims to help developers write code more
productively by intelligently generating code blocks based on natural language
prompts. Recently, large pretrained deep learning models have substantially
pushed the boundary of code generation and achieved impressive performance.
Despite their great power, the huge number of model parameters poses a
significant challenge to adopting them in a regular software development
environment, where a developer might use a standard laptop or mid-size server
to develop her code. Such large models incur significant resource usage (in
terms of memory, latency, and dollars) as well as a substantial carbon
footprint. Model compression is a promising approach to address these
challenges. Several techniques have been proposed to compress large pretrained
models, typically for vision or textual data. Among the available compression
techniques, we identify quantization as the most applicable to the code
generation task, since it does not incur significant retraining cost. Because
quantization represents model parameters with lower-bit integers (e.g., int8),
both model size and runtime latency benefit from such a representation. We
extensively study the impact of quantized models on code generation tasks
across three dimensions: (i) resource usage and carbon footprint, (ii)
accuracy, and (iii) robustness. Through systematic experiments, we find a
quantization recipe that can run even a 6B-parameter model on a regular laptop
without significant accuracy or robustness degradation. We further find that
the recipe is readily applicable to the code summarization task as well.
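To make the core idea concrete, below is a minimal sketch of post-training dynamic int8 quantization with PyTorch's `quantize_dynamic` API. It is an illustration only, not necessarily the exact recipe evaluated in the paper; the model name and prompt are placeholder assumptions, and any causal code LM from the Hugging Face hub would serve the same purpose.

```python
# Minimal sketch: post-training dynamic int8 quantization of a pretrained
# code generation model. Model name and prompt are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # assumption: any causal code LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Swap fp32 nn.Linear layers for int8-weight versions; activations are
# quantized on the fly at inference time, so no retraining is required.
# Int8 weights take roughly 1/4 the memory of fp32 (and 1/2 of fp16), which
# is what makes running a multi-billion-parameter model on a laptop plausible.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = quantized.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Dynamic quantization runs entirely on CPU, which matches the laptop setting discussed above; static or weight-only schemes would be set up differently.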
Related papers
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- Scaling Laws Behind Code Understanding Model [4.846512516189021]
We study the scaling law for the code understanding task by varying training data, model size, and computing resources.
We train a large-scale code understanding model named CoLSBERT with 1.5B parameters on a large dataset using more computing resources, which outperforms previous work by a large margin.
arXiv Detail & Related papers (2024-02-20T08:31:42Z)
- Model Compression and Efficient Inference for Large Language Models: A Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z)
- Code Representation Learning At Scale [75.04686476303436]
We fuel code representation learning with a vast amount of code data via a two-stage pretraining scheme.
We first train the encoders via a mix that leverages both randomness in masked language modeling and the structural aspect of programming languages.
We then enhance the representations via contrastive learning with hard negatives and hard positives constructed in an unsupervised manner.
arXiv Detail & Related papers (2024-02-02T22:19:15Z)
- Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation [32.178931149612644]
Model Improvement via Neuron Targeting (MINT) is a novel approach for repairing code Language Models (LMs).
MINT is effective, efficient, and reliable, capable of correcting a neural model by patching a minimum number of neurons.
arXiv Detail & Related papers (2023-12-08T20:28:08Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand through using contextual data improves the performance of pre-trained code language models for the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- Speculative Decoding with Big Little Decoder [108.95187338417541]
Big Little Decoder (BiLD) is a framework that can improve inference efficiency and latency for a wide range of text generation applications.
On an NVIDIA T4 GPU, our framework achieves a speedup of up to 2.12x with minimal generation quality degradation.
Our framework is fully plug-and-play and can be applied without any modifications in the training process or model architecture.
arXiv Detail & Related papers (2023-02-15T18:55:29Z)
- ReCode: Robustness Evaluation of Code Generation Models [90.10436771217243]
We propose ReCode, a comprehensive robustness evaluation benchmark for code generation models.
We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format.
With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt.
arXiv Detail & Related papers (2022-12-20T14:11:31Z)
- NatGen: Generative pre-training by "Naturalizing" source code [18.410818213965918]
We propose a new pre-training objective, "Naturalizing" of source code.
Unlike natural language, code's bimodal, dual-channel nature allows us to generate semantically equivalent code at scale.
We fine-tune our model on three generative Software Engineering tasks to achieve state-of-the-art performance rivaling CodeT5.
arXiv Detail & Related papers (2022-06-15T15:08:29Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
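The APPS entry above scores models by the fraction of hidden test cases their generated programs pass. As an illustration of how such a pass rate can be computed, here is a minimal sketch; the test-case field names and the absence of sandboxing are simplifying assumptions, not the benchmark's official harness.

```python
# Minimal sketch of an APPS-style pass rate: run a candidate program against
# each test case's stdin and compare its stdout with the expected output.
import subprocess

def pass_rate(program_src: str, test_cases: list[dict]) -> float:
    """test_cases: [{"input": "...", "output": "..."}] (assumed format)."""
    passed = 0
    for case in test_cases:
        try:
            result = subprocess.run(
                ["python3", "-c", program_src],
                input=case["input"],
                capture_output=True,
                text=True,
                timeout=5,  # guard against non-terminating generations
            )
            if result.stdout.strip() == case["output"].strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a timeout counts as a failed test case
    return passed / len(test_cases) if test_cases else 0.0

# Toy example: a one-line solution checked against two test cases.
solution = "print(sum(map(int, input().split())))"
tests = [{"input": "1 2\n", "output": "3"}, {"input": "5 7\n", "output": "12"}]
print(f"pass rate: {pass_rate(solution, tests):.0%}")
```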
This list is automatically generated from the titles and abstracts of the papers on this site.