Greening Large Language Models of Code
- URL: http://arxiv.org/abs/2309.04076v3
- Date: Fri, 12 Jan 2024 02:17:47 GMT
- Title: Greening Large Language Models of Code
- Authors: Jieke Shi, Zhou Yang, Hong Jin Kang, Bowen Xu, Junda He, David Lo
- Abstract summary: Avatar is a novel approach that crafts a deployable model from a large language model of code.
The key idea of Avatar is to formulate the optimization of language models as a multi-objective configuration tuning problem.
We use Avatar to produce optimized models with a small size (3 MB), which is 160$\times$ smaller than the original large models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models of code have shown remarkable effectiveness across
various software engineering tasks. Despite the availability of many cloud
services built upon these powerful models, there remain several scenarios where
developers cannot take full advantage of them, stemming from factors such as
restricted or unreliable internet access, institutional privacy policies that
prohibit external transmission of code to third-party vendors, and more.
Therefore, developing a compact, efficient, and yet energy-saving model for
deployment on developers' devices becomes essential.
To this aim, we propose Avatar, a novel approach that crafts a deployable
model from a large language model of code by optimizing it in terms of model
size, inference latency, energy consumption, and carbon footprint while
maintaining a comparable level of effectiveness. The key idea of Avatar is to
formulate the optimization of language models as a multi-objective
configuration tuning problem and solve it with the help of a Satisfiability
Modulo Theories (SMT) solver and a tailored optimization algorithm. The SMT
solver is used to form an appropriate configuration space, while the
optimization algorithm identifies the Pareto-optimal set of configurations for
training the optimized models using knowledge distillation. We evaluate Avatar
with two popular language models of code, i.e., CodeBERT and GraphCodeBERT, on
two popular tasks, i.e., vulnerability prediction and clone detection. We use
Avatar to produce optimized models with a small size (3 MB), which is
160$\times$ smaller than the original large models. On the two tasks, the
optimized models significantly reduce the energy consumption (up to 184$\times$
less), carbon footprint (up to 157$\times$ less), and inference latency (up to
76$\times$ faster), with only a negligible loss in effectiveness (1.67\% on
average).
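The Pareto-optimal configuration step described above can be sketched as a non-dominated filter over candidate configurations. This is an illustrative simplification, not Avatar's actual tuning algorithm; the configuration names and objective values below are invented, and all objectives are treated as minimized.

```python
# Illustrative sketch (not Avatar's implementation): keep only the
# Pareto-optimal (non-dominated) candidate configurations.

def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(configs):
    """configs: dict mapping config name -> tuple of objective values (all minimized)."""
    return {
        name: objs
        for name, objs in configs.items()
        if not any(dominates(other, objs)
                   for other_name, other in configs.items()
                   if other_name != name)
    }

# Hypothetical candidates: (size_mb, latency_ms, energy_j, effectiveness_loss)
candidates = {
    "cfg_a": (3, 12, 0.5, 0.02),
    "cfg_b": (3, 15, 0.6, 0.02),  # dominated by cfg_a (worse latency, energy)
    "cfg_c": (8, 10, 0.4, 0.01),
    "cfg_d": (2, 20, 0.9, 0.05),
}
front = pareto_front(candidates)  # cfg_b is filtered out
```

A real tuner would draw the candidate set from a constrained configuration space (Avatar uses an SMT solver for this) rather than from a hand-written dict.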
Related papers
- DiaBlo: Diagonal Blocks Are Sufficient For Finetuning [5.615105036691153]
We present DiaBlo, a PEFT approach that updates only the diagonal blocks of selected model weight matrices. Unlike Low Rank Adaptation (LoRA) and its variants, DiaBlo eliminates the need for low-rank matrix products. This design leads to stable and robust convergence while maintaining memory efficiency and training speed comparable to LoRA.
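The diagonal-block idea can be illustrated with a small mask: only entries inside the square blocks along the main diagonal are trainable, and everything off those blocks stays frozen. This is an assumed reading of the summary above, with invented dimensions, not code from the DiaBlo paper.

```python
# Illustrative sketch: a mask selecting the diagonal blocks of a square
# weight matrix; only masked entries would receive gradient updates.

def diagonal_block_mask(dim, block):
    """Boolean mask for the block-sized diagonal blocks of a dim x dim matrix."""
    return [[(i // block) == (j // block) for j in range(dim)]
            for i in range(dim)]

mask = diagonal_block_mask(dim=4, block=2)
# Two 2x2 diagonal blocks -> 8 trainable entries out of 16.
trainable = sum(cell for row in mask for cell in row)
```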
arXiv Detail & Related papers (2025-06-03T13:47:59Z) - EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models [64.18350535770357]
We propose an automatic pruning method for large vision-language models to enhance the efficiency of multimodal reasoning.
Our approach only leverages a small number of samples to search for the desired pruning policy.
We conduct extensive experiments on the ScienceQA, Vizwiz, MM-vet, and LLaVA-Bench datasets for the task of visual question answering.
arXiv Detail & Related papers (2025-03-19T16:07:04Z) - Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing [9.217991144854851]
Mixture-of-Experts (MoE) models have become a dominant model architecture.
We study optimized MoE model deployment and distributed inference serving on a serverless platform.
Our designs reduce the billed cost of all MoE layers by at least 75.67% compared to CPU clusters.
arXiv Detail & Related papers (2025-01-09T15:29:33Z) - Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that uses Newton's method to compute a numerical importance score for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z) - Model Fusion through Bayesian Optimization in Language Model Fine-Tuning [16.86812534268461]
Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains.
We introduce a novel model fusion technique that optimizes both the desired metric and the loss through multi-objective Bayesian optimization.
Experiments across various downstream tasks show considerable performance improvements using our Bayesian optimization-guided method.
arXiv Detail & Related papers (2024-11-11T04:36:58Z) - Measuring Code Efficiency Optimization Capabilities with ACEOB [7.4056083791645495]
We conduct an in-depth analysis of "code patterns" in the model training dataset, meticulously exploring human-written code.
We introduce the Automatic Code Efficiency Optimization Benchmark (ACEOB), which consists of 95,359 pairs of efficient-inefficient code.
To our knowledge, ACEOB is the first dataset specifically targeting Python code efficiency optimization.
arXiv Detail & Related papers (2024-08-23T10:10:37Z) - Fine-Tuning and Deploying Large Language Models Over Edges: Issues and Approaches [64.42735183056062]
Large language models (LLMs) have transitioned from specialized models to versatile foundation models.
LLMs exhibit impressive zero-shot ability; however, they require fine-tuning on local datasets and significant resources for deployment.
arXiv Detail & Related papers (2024-08-20T09:42:17Z) - Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting [7.146498833443095]
Concept Distillation (CD) is an automatic prompt optimization technique for enhancing weaker models on complex tasks.
CD involves: (1) collecting mistakes made by weak models with a base prompt (initialization), (2) using a strong model to generate reasons for these mistakes and create rules/concepts for weak models (induction), and (3) filtering these rules based on validation set performance.
We evaluated CD on NL2Code and mathematical reasoning tasks, observing significant performance boosts for small and weaker language models.
arXiv Detail & Related papers (2024-08-18T05:37:48Z) - Decoding-Time Language Model Alignment with Multiple Objectives [116.42095026960598]
Existing methods primarily focus on optimizing LMs for a single reward function, limiting their adaptability to varied objectives.
Here, we propose multi-objective decoding (MOD), a decoding-time algorithm that outputs the next token from a linear combination of predictions.
We show why existing approaches can be sub-optimal even in natural settings and obtain optimality guarantees for our method.
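The linear-combination idea above can be sketched in a few lines: mix each objective-specific model's next-token distribution with chosen weights and pick the highest-scoring token. This is an assumed simplification of MOD for illustration; the vocabulary, distributions, and weights below are invented.

```python
# Illustrative sketch: decoding-time combination of per-objective
# next-token distributions (a simplified view of multi-objective decoding).

def combine_and_pick(distributions, weights):
    """distributions: per-model probability lists over the same vocabulary;
    weights: mixing coefficients. Returns the argmax token index of the mixture."""
    vocab = len(distributions[0])
    mixed = [sum(w * d[i] for w, d in zip(weights, distributions))
             for i in range(vocab)]
    return max(range(vocab), key=lambda i: mixed[i])

# Hypothetical 4-token vocabulary, two reward-specific models.
helpful = [0.1, 0.6, 0.2, 0.1]
harmless = [0.3, 0.1, 0.5, 0.1]
token = combine_and_pick([helpful, harmless], weights=[0.6, 0.4])
```

Shifting the weights toward one model steers decoding toward that objective without retraining either model.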
arXiv Detail & Related papers (2024-06-27T02:46:30Z) - Diffusion Model for Data-Driven Black-Box Optimization [54.25693582870226]
We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization.
We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons.
Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
arXiv Detail & Related papers (2024-03-20T00:41:12Z) - Model Compression and Efficient Inference for Large Language Models: A Survey [20.199282252344396]
Large language models have two prominent characteristics compared to smaller models.
The most notable aspect of large models is the very high cost associated with model finetuning or training.
Large models emphasize versatility and generalization rather than performance on a single task.
arXiv Detail & Related papers (2024-02-15T06:58:30Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models, augmenting Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training [17.556432199389615]
Slapo is a schedule language that decouples the execution of a tensor-level operator from its arithmetic definition.
We show that Slapo can improve training throughput by up to 2.92x on a single machine with 8 NVIDIA V100 GPUs.
arXiv Detail & Related papers (2023-02-16T00:34:53Z) - A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes [54.83802872236367]
We propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
The proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model.
The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss.
arXiv Detail & Related papers (2022-04-13T04:15:51Z) - Conservative Objective Models for Effective Offline Model-Based Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
Conservative Objective Models (COMs) are simple to implement and outperform a number of existing methods on a wide range of model-based optimization (MBO) problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Bayesian Optimization for Selecting Efficient Machine Learning Models [53.202224677485525]
We present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency.
Experiments on model selection for recommendation tasks indicate that models selected this way significantly improve model training efficiency.
arXiv Detail & Related papers (2020-08-02T02:56:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.