Towards Efficient Fine-tuning of Pre-trained Code Models: An
Experimental Study and Beyond
- URL: http://arxiv.org/abs/2304.05216v1
- Date: Tue, 11 Apr 2023 13:34:13 GMT
- Title: Towards Efficient Fine-tuning of Pre-trained Code Models: An
Experimental Study and Beyond
- Authors: Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei
Zhang, Hongbin Sun
- Abstract summary: Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing.
- Score: 52.656743602538825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, fine-tuning pre-trained code models such as CodeBERT on downstream
tasks has achieved great success in many software testing and analysis tasks.
While effective and prevalent, fine-tuning the pre-trained parameters incurs a
large computational cost. In this paper, we conduct an extensive experimental
study to explore what happens to layer-wise pre-trained representations and
their encoded code knowledge during fine-tuning. We then propose efficient
alternatives to fine-tune the large pre-trained code model based on the above
findings. Our experimental study shows that (1) lexical, syntactic and
structural properties of source code are encoded in the lower, intermediate,
and higher layers, respectively, while the semantic property spans across the
entire model. (2) The process of fine-tuning preserves most of the code
properties; in particular, the basic code properties captured by the lower
and intermediate layers remain intact. Furthermore, we find that the
representations of the top two layers change the most during fine-tuning for
various downstream tasks. (3) Based on the above findings, we
propose Telly to efficiently fine-tune pre-trained code models via layer
freezing. Extensive experimental results on five diverse downstream tasks
demonstrate that the number of trainable parameters and the corresponding
training time are greatly reduced, while performance remains comparable or
better. The replication package, including source code, datasets, and an
online appendix, is available at: https://github.com/DeepSoftwareAnalytics/Telly.
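
The layer-freezing idea can be illustrated with a minimal sketch, assuming a HuggingFace Transformers setup and the public microsoft/codebert-base checkpoint; this illustrates the general technique, not the authors' exact Telly implementation (see the replication package above for that). All but the top two encoder layers are frozen before fine-tuning, so the optimizer only updates the layers whose representations change most.

```python
# Minimal sketch of layer freezing for efficient fine-tuning (illustrative only;
# the authors' actual Telly implementation is in the replication package above).
from transformers import RobertaModel

model = RobertaModel.from_pretrained("microsoft/codebert-base")  # 12 encoder layers

NUM_FROZEN_LAYERS = 10  # freeze layers 0-9 so only the top two layers are fine-tuned

# Freeze the embedding layer and the lower encoder layers.
for param in model.embeddings.parameters():
    param.requires_grad = False
for layer in model.encoder.layer[:NUM_FROZEN_LAYERS]:
    for param in layer.parameters():
        param.requires_grad = False

# Only parameters that still require gradients are updated during fine-tuning.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```

A task-specific head and optimizer would then be added as usual; because the frozen parameters need no gradients, the backward pass and optimizer updates for the lower layers are skipped, which is where the savings in training cost come from.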
Related papers
- Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models [55.45444773200529]
Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination.
Recent work has focused on decoding techniques to improve factuality during inference.
arXiv Detail & Related papers (2024-04-14T19:45:35Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study [4.438873396405334]
We aim to answer whether making code easier to understand by adding contextual data improves the performance of pre-trained code language models on the task of code completion.
For comments, we find that the models perform better in the presence of multi-line comments.
arXiv Detail & Related papers (2023-04-24T17:09:14Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence [33.438384268490815]
In this paper, we empirically evaluate the usage and effect of prompt tuning in code intelligence tasks.
Our results show that prompt tuning consistently outperforms fine-tuning in all three tasks.
Our results suggest that instead of fine-tuning, we could adapt prompt tuning for code intelligence tasks to achieve better performance.
arXiv Detail & Related papers (2022-07-24T07:29:17Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations.
For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name.
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
- What do pre-trained code models know about code? [9.60966128833701]
We use diagnostic tasks called probes to investigate pre-trained code models.
BERT (pre-trained on English), CodeBERT and CodeBERTa (pre-trained on source code and natural language documentation), and GraphCodeBERT (pre-trained on source code with data flow) are investigated.
arXiv Detail & Related papers (2021-08-25T16:20:17Z)
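
As a generic illustration of the probing methodology mentioned in the last entry above, a probe is typically a small classifier trained on frozen layer-wise representations to test which code property a given layer encodes. The sketch below is a common setup under assumed choices (the layer index, the property labels, and the mean-pooling are hypothetical), not the exact diagnostic tasks of that paper.

```python
# Generic probing sketch: train a linear classifier on frozen layer-wise
# representations of a pre-trained code model (illustrative only).
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base", output_hidden_states=True)
encoder.eval()  # the encoder stays frozen; only the probe is trained

LAYER = 6        # hypothetical: probe an intermediate layer (hidden_states[0] is the embedding output)
NUM_CLASSES = 2  # hypothetical property labels (e.g., a binary syntactic property)
probe = nn.Linear(encoder.config.hidden_size, NUM_CLASSES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def layer_embedding(code_snippet: str) -> torch.Tensor:
    """Mean-pooled hidden state of the chosen layer for one code snippet."""
    inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden_states = encoder(**inputs).hidden_states  # embeddings + one entry per layer
    return hidden_states[LAYER].mean(dim=1)  # shape: (1, hidden_size)

# Toy training step on a single hypothetical labeled example.
x = layer_embedding("def add(a, b): return a + b")
y = torch.tensor([1])
optimizer.zero_grad()
loss = loss_fn(probe(x), y)
loss.backward()
optimizer.step()
```

The probe's accuracy, compared across layers, indicates where a property is most strongly encoded; this is the kind of layer-wise evidence the main paper builds on.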