The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder
Models for More Efficient Code Classification
- URL: http://arxiv.org/abs/2305.04940v2
- Date: Mon, 11 Sep 2023 17:56:10 GMT
- Title: The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder
Models for More Efficient Code Classification
- Authors: Anastasiia Grishina and Max Hort and Leon Moonen
- Abstract summary: Training deep NLP models requires significant computational resources.
We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-trained transformer model.
- Score: 7.205265729540538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of modern Natural Language Processing (NLP) techniques has
been shown to be beneficial for software engineering tasks, such as
vulnerability detection and type inference. However, training deep NLP models
requires significant computational resources. This paper explores techniques
that aim to make the best use of the resources and information available in
these models.
We propose a generic approach, EarlyBIRD, to build composite representations
of code from the early layers of a pre-trained transformer model. We
empirically investigate the viability of this approach on the CodeBERT model by
comparing the performance of 12 strategies for creating composite
representations with the standard practice of only using the last encoder
layer.
Our evaluation on four datasets shows that several early-layer combinations
yield better performance on defect detection, and some combinations improve
multi-class classification. More specifically, we obtain an average
improvement of +2 in detection accuracy on Devign with only 3 out of 12 layers
of CodeBERT and a 3.3x speed-up of fine-tuning. These findings show that early
layers can be used to obtain better results with the same resources, as well
as to reduce resource usage during fine-tuning and inference.
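To make the core idea concrete, below is a minimal sketch of building a composite code representation from early encoder layers, assuming the Hugging Face transformers API. The mean-of-CLS pooling over the first k layers is one illustrative choice, not necessarily one of the paper's 12 strategies, and the function name is hypothetical.

```python
# Minimal sketch: combine CLS embeddings from early CodeBERT layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def early_composite(code: str, num_layers: int = 3) -> torch.Tensor:
    """Average the CLS embeddings of the first `num_layers` encoder layers."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # For 12-layer CodeBERT, out.hidden_states holds 13 tensors of shape
    # (batch, seq_len, hidden): index 0 is the token embeddings,
    # indices 1..12 are the encoder layers.
    early = out.hidden_states[1 : num_layers + 1]
    cls_per_layer = torch.stack([h[:, 0, :] for h in early])  # (k, 1, hidden)
    return cls_per_layer.mean(dim=0).squeeze(0)               # (hidden,)

vec = early_composite("int add(int a, int b) { return a + b; }")
```

A small classification head (e.g. a single linear layer over this vector) would then be fine-tuned for defect detection. One plausible way to realize the reported 3.3x speed-up, not detailed in the abstract, is to also truncate the encoder so the unused upper layers are never computed, e.g. `model.encoder.layer = model.encoder.layer[:3]` for RoBERTa-style models such as CodeBERT.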
Related papers
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods, such as LoRA, and use in-batch negatives unless well-constructed hard negatives are available (a sketch of the in-batch negatives objective appears after this list).
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing (see the layer-freezing sketch after this list).
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of prompting techniques, we introduce a new pre-training method that boosts query-based end-to-end instance segmentation (QEIS) models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
- Towards a learning-based performance modeling for accelerating Deep Neural Networks [1.1549572298362785]
We begin an investigation of predictive models based on machine learning techniques to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all of the convolution operators manually selected by the library.
arXiv Detail & Related papers (2022-12-09T18:28:07Z)
- Boosting the Efficiency of Parametric Detection with Hierarchical Neural Networks [4.1410005218338695]
We propose the Hierarchical Detection Network (HDN), a novel approach to efficient detection.
The network is trained using a novel loss function that simultaneously encodes the goals of statistical accuracy and efficiency.
We show how training a three-layer HDN using a two-layer model can further boost both accuracy and efficiency.
arXiv Detail & Related papers (2022-07-23T19:23:00Z)
- Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning [41.07029317930986]
We propose a variance-sensitive class of models that operates in a low-label regime.
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier.
We further extend this approach to a transductive learning setting, proposing Transductive CNAPS.
arXiv Detail & Related papers (2022-01-13T18:59:02Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve out-of-domain (OOD) detection performance.
This paper proposes to train a model with only in-domain (IND) data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study [0.8497188292342053]
This paper provides a benchmark study of deep learning algorithms for rotating machinery intelligent diagnosis.
We integrate the whole evaluation codes into a code library and release this code library to the public for better development of this field.
Through this work, we release a unified code framework for comparing and testing models fairly and quickly, emphasize the importance of open-source code, provide baseline accuracies (a lower bound) to avoid spurious improvements, and discuss potential future directions for this field.
arXiv Detail & Related papers (2020-03-06T17:24:43Z)
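As referenced in the dense-encoder entry above, here is a hedged sketch of the in-batch negatives objective commonly used to train dense retrieval encoders; the function name and tensor shapes are illustrative assumptions, not taken from that paper.

```python
# Sketch of in-batch negatives: each query's positive passage acts as a
# negative for every other query in the batch.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """q, p: (batch, dim) query and positive-passage embeddings."""
    scores = q @ p.T                  # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))  # diagonal entries are the positives
    return F.cross_entropy(scores, labels)
```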
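And as referenced in the Telly entry above, a minimal sketch of fine-tuning with frozen lower layers. Which layers Telly actually freezes is not stated in the summary, so freezing the embeddings plus the first 8 of 12 encoder layers here is purely illustrative.

```python
# Sketch of fine-tuning a code model with its lower layers frozen.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

# Frozen parameters receive no gradient updates during fine-tuning.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:8]:  # illustrative choice
    for param in layer.parameters():
        param.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```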
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.