Exploring and Evaluating Personalized Models for Code Generation
- URL: http://arxiv.org/abs/2208.13928v1
- Date: Mon, 29 Aug 2022 23:28:46 GMT
- Title: Exploring and Evaluating Personalized Models for Code Generation
- Authors: Andrei Zlotchevski, Dawn Drain, Alexey Svyatkovskiy, Colin Clement,
Neel Sundaresan, Michele Tufano
- Abstract summary: We evaluate transformer model fine-tuning for personalization.
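We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters; and (iii) prefix tuning, which keeps model parameters frozen but optimizes a small project-specific prefix vector.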
We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
- Score: 9.25440316608194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Transformer models have achieved state-of-the-art status for Natural
Language Understanding tasks and are increasingly becoming the baseline model
architecture for modeling source code. Transformers are usually pre-trained on
large unsupervised corpora, learning token representations and transformations
relevant to modeling generally available text, and are then fine-tuned on a
particular downstream task of interest. While fine-tuning is a tried-and-true
method for adapting a model to a new domain -- for example, question-answering
on a given topic -- generalization remains an ongoing challenge. In this
paper, we explore and evaluate transformer model fine-tuning for
personalization. In the context of generating unit tests for Java methods, we
evaluate learning to personalize to a specific software project using several
personalization techniques. We consider three key approaches: (i) custom
fine-tuning, which allows all the model parameters to be tuned; (ii)
lightweight fine-tuning, which freezes most of the model's parameters, allowing
tuning of the token embeddings and softmax layer only or the final layer alone;
(iii) prefix tuning, which keeps model parameters frozen, but optimizes a small
project-specific prefix vector. Each of these techniques offers a trade-off in
total compute cost and predictive performance, which we evaluate by code and
task-specific metrics, training time, and total computational operations. We
compare these fine-tuning strategies for code generation and discuss the
potential generalization and cost benefits of each in various deployment
scenarios.
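As a rough illustration of the cost side of this trade-off, the sketch below counts how many parameters each strategy would leave trainable on a toy transformer. Everything in it is an assumption made for illustration: the `ParamGroup` inventory, the model sizes, the reading of "final layer" as the last transformer block, and the single-matrix prefix are hypothetical simplifications, not the paper's model or its reported numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamGroup:
    """A named block of parameters with its element count (hypothetical inventory)."""
    name: str
    count: int


def toy_model(vocab=1000, d_model=64, n_layers=4):
    """Build a hypothetical parameter inventory for a small transformer."""
    groups = [ParamGroup("embeddings", vocab * d_model)]
    for i in range(n_layers):
        # Rough per-block size: attention (4 * d^2) plus feed-forward (8 * d^2).
        groups.append(ParamGroup(f"layer.{i}", 12 * d_model * d_model))
    groups.append(ParamGroup("softmax", vocab * d_model))
    return groups


def trainable_params(groups, strategy, d_model=64, prefix_len=20):
    """Return the number of parameters updated under a given strategy."""
    if strategy == "custom":
        # (i) Custom fine-tuning: every parameter is tuned.
        return sum(g.count for g in groups)
    if strategy == "embeddings_softmax":
        # (ii) Lightweight: only token embeddings and the softmax layer.
        return sum(g.count for g in groups if g.name in ("embeddings", "softmax"))
    if strategy == "final_layer":
        # (ii) Lightweight: only the last transformer block (assumed reading).
        layers = [g for g in groups if g.name.startswith("layer.")]
        return layers[-1].count
    if strategy == "prefix":
        # (iii) Prefix tuning: the model is frozen; only a small prefix is learned.
        # Simplified here to one prefix_len x d_model matrix.
        return prefix_len * d_model
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    model = toy_model()
    total = trainable_params(model, "custom")
    for name in ("custom", "embeddings_softmax", "final_layer", "prefix"):
        n = trainable_params(model, name)
        print(f"{name:20s} {n:8d} trainable ({100 * n / total:.1f}% of model)")
```

Trainable-parameter count is only a proxy: the abstract evaluates the strategies by code and task-specific metrics, training time, and total computational operations, none of which this sketch measures.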
Related papers
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient
Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Evaluating Representations with Readout Model Switching [19.907607374144167]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z) - Prototypical Fine-tuning: Towards Robust Performance Under Varying Data
Sizes [47.880781811936345]
We propose a novel framework for fine-tuning pretrained language models (LMs).
Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
arXiv Detail & Related papers (2022-11-24T14:38:08Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and show that such a simple model trained with a RepOptimizer, referred to as RepOpt-VGG, performs on par with recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Model-agnostic and Scalable Counterfactual Explanations via
Reinforcement Learning [0.5729426778193398]
We propose a deep reinforcement learning approach that transforms the optimization procedure into an end-to-end learnable process.
Our experiments on real-world data show that our method is model-agnostic, relying only on feedback from model predictions.
arXiv Detail & Related papers (2021-06-04T16:54:36Z)