Automating Code-Related Tasks Through Transformers: The Impact of
Pre-training
- URL: http://arxiv.org/abs/2302.04048v1
- Date: Wed, 8 Feb 2023 13:37:33 GMT
- Title: Automating Code-Related Tasks Through Transformers: The Impact of
Pre-training
- Authors: Rosalia Tufano, Luca Pascarella, Gabriele Bavota
- Abstract summary: We study the impact of pre-training objectives on the performance of transformers when automating code-related tasks.
We pre-train 32 transformers using both (i) generic pre-training objectives usually adopted in software engineering (SE) literature; and (ii) pre-training objectives tailored to specific code-related tasks.
- Score: 15.129062963782005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have gained popularity in the software engineering (SE)
literature. These deep learning models are usually pre-trained through a
self-supervised objective, meant to provide the model with basic knowledge
about a language of interest (e.g., Java). A classic pre-training objective is
the masked language model (MLM), in which a percentage of tokens from the input
(e.g., a Java method) is masked, with the model in charge of predicting them.
Once pre-trained, the model is then fine-tuned to support the specific
downstream task of interest (e.g., code summarization). While there is evidence
suggesting the boost in performance provided by pre-training, little is known
about the impact of the specific pre-training objective(s) used. Indeed, MLM is
just one of the possible pre-training objectives and recent work from the
natural language processing field suggests that pre-training objectives tailored
for the specific downstream task of interest may substantially boost the
model's performance. In this study, we focus on the impact of pre-training
objectives on the performance of transformers when automating code-related
tasks. We start with a systematic literature review aimed at identifying the
pre-training objectives used in SE. Then, we pre-train 32 transformers using
both (i) generic pre-training objectives usually adopted in SE; and (ii)
pre-training objectives tailored to the specific code-related tasks subject of
our experimentation, namely bug-fixing, code summarization, and code completion.
We also compare the pre-trained models with non-pre-trained ones. Our results show
that: (i) pre-training helps in boosting performance only if the amount of
fine-tuning data available is small; (ii) the MLM objective is usually
sufficient to maximize the prediction performance of the model, even when
comparing it with pre-training objectives specialized for the downstream task
at hand.
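To make the MLM objective concrete, the following is a minimal, illustrative Python sketch of how a masked-language-model training example can be built from a Java method: a fraction of the tokens is hidden behind a mask symbol and the model is trained to recover them. The whitespace tokenization, the 15% mask rate, and the <MASK> symbol are simplifying assumptions for illustration, not the exact setup used in the paper.

```python
import random

MASK = "<MASK>"      # placeholder symbol; hypothetical, for illustration only
MASK_RATE = 0.15     # fraction of tokens to hide (a commonly used value)

def make_mlm_example(code: str, mask_rate: float = MASK_RATE, seed: int = 0):
    """Build one MLM training pair from a code snippet.

    Returns (masked_tokens, targets): targets[i] holds the original token for
    every masked position and None elsewhere (positions excluded from the loss).
    """
    rng = random.Random(seed)
    tokens = code.split()  # naive whitespace tokenization, for illustration
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(MASK)   # hide the token from the model
            targets.append(tok)   # ...and ask the model to predict it
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

# Toy Java method used as pre-training input
java_method = "public int sum(int a, int b) { return a + b; }"
masked_tokens, targets = make_mlm_example(java_method)
print(" ".join(masked_tokens))
print([t for t in targets if t is not None])
```

Once pre-trained in this way, the same model would be fine-tuned on task-specific input/output pairs, e.g., buggy method to fixed method for bug fixing.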
Related papers
- How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks? [9.515532265294187]
Self-supervised pre-training has proven highly effective for many computer vision tasks.
It remains unclear under which conditions pre-trained models offer significant advantages over training from scratch.
arXiv Detail & Related papers (2024-09-27T08:15:14Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation [77.47617360812023]
We extend the recently proposed MAE style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continually trained based on RetroMAE and contrastive learning.
arXiv Detail & Related papers (2022-07-30T14:34:55Z)
- Frustratingly Simple Pretraining Alternatives to Masked Language Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements of MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z)
- Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z)
- Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation [79.0866650271659]
The common process for utilizing PrLMs is to first pre-train on large-scale general corpora with task-independent LM training objectives, and then fine-tune on task datasets with task-specific training objectives.
We introduce task-specific pre-training on in-domain, task-related corpora with task-specific objectives.
This procedure is placed between the original two stages to enhance the model's understanding of specific tasks.
arXiv Detail & Related papers (2020-09-10T16:46:46Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning (a simplified sketch of the selective-masking idea is given after this list).
We show that our method can achieve comparable or even better performance at less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
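As a companion to the "Train No Evil" entry above, the following is a minimal sketch of the general idea behind selective masking: instead of masking tokens uniformly at random as plain MLM does, masking is biased toward tokens judged relevant to the downstream task. The keyword-based relevance heuristic and all identifiers below are hypothetical simplifications for illustration, not the scoring method proposed in that paper.

```python
import random

MASK = "<MASK>"  # placeholder symbol; hypothetical, for illustration only

# Hypothetical task-relevance heuristic: tokens that look related to the
# downstream task (toy bug-fixing keywords here) get a higher mask rate.
TASK_KEYWORDS = {"null", "return", "if", "throw", "catch"}

def selective_mask(code: str, base_rate: float = 0.05,
                   task_rate: float = 0.40, seed: int = 0):
    """Mask task-relevant tokens more aggressively than the remaining ones."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in code.split():  # naive whitespace tokenization, for illustration
        rate = task_rate if tok in TASK_KEYWORDS else base_rate
        if rng.random() < rate:
            masked.append(MASK)
            targets.append(tok)   # token the model must reconstruct
        else:
            masked.append(tok)
            targets.append(None)  # not part of the pre-training loss
    return masked, targets

snippet = "if ( x == null ) { throw new IllegalArgumentException ( ) ; } return x ;"
masked_tokens, targets = selective_mask(snippet)
print(" ".join(masked_tokens))
```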
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.