The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder
Models for More Efficient Code Classification
- URL: http://arxiv.org/abs/2305.04940v2
- Date: Mon, 11 Sep 2023 17:56:10 GMT
- Title: The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder
Models for More Efficient Code Classification
- Authors: Anastasiia Grishina and Max Hort and Leon Moonen
- Abstract summary: Training deep NLP models requires significant computational resources.
We propose a generic approach, EarlyBIRD, to build composite representations of code from the early layers of a pre-trained transformer model.
- Score: 7.205265729540538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of modern Natural Language Processing (NLP) techniques has
been shown to be beneficial for software engineering tasks, such as
vulnerability detection and type inference. However, training deep NLP models
requires significant computational resources. This paper explores techniques
that aim to make the best use of the resources and information available in
these models.
We propose a generic approach, EarlyBIRD, to build composite representations
of code from the early layers of a pre-trained transformer model. We
empirically investigate the viability of this approach on the CodeBERT model by
comparing the performance of 12 strategies for creating composite
representations with the standard practice of only using the last encoder
layer.
Our evaluation on four datasets shows that several early-layer combinations
yield better performance on defect detection, and some combinations improve
multi-class classification. More specifically, we obtain an average
improvement of +2 in detection accuracy on Devign with only 3 out of 12 layers
of CodeBERT and a 3.3x speed-up of fine-tuning. These findings show that early
layers can be used to obtain better results with the same resources, as well
as to reduce resource usage during fine-tuning and inference.
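To make the core idea concrete, below is a minimal sketch of building a composite code representation from early encoder layers, assuming the Hugging Face transformers API. The mean-of-CLS pooling over the first k layers is one illustrative choice, not necessarily one of the paper's 12 strategies, and the function name is hypothetical.

```python
# Minimal sketch: combine CLS embeddings from early CodeBERT layers.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def early_composite(code: str, num_layers: int = 3) -> torch.Tensor:
    """Average the CLS embeddings of the first `num_layers` encoder layers."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # For 12-layer CodeBERT, out.hidden_states holds 13 tensors of shape
    # (batch, seq_len, hidden): index 0 is the token embeddings,
    # indices 1..12 are the encoder layers.
    early = out.hidden_states[1 : num_layers + 1]
    cls_per_layer = torch.stack([h[:, 0, :] for h in early])  # (k, 1, hidden)
    return cls_per_layer.mean(dim=0).squeeze(0)               # (hidden,)

vec = early_composite("int add(int a, int b) { return a + b; }")
```

A small classification head (e.g. a single linear layer over this vector) would then be fine-tuned for defect detection. One plausible way to realize the reported 3.3x speed-up, not detailed in the abstract, is to also truncate the encoder so the unused upper layers are never computed, e.g. `model.encoder.layer = model.encoder.layer[:3]` for RoBERTa-style models such as CodeBERT.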
Related papers
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: train on MSMARCO with parameter-efficient methods, such as LoRA, and use in-batch negatives unless well-constructed hard negatives are available (a sketch of the in-batch negatives objective appears after this list).
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
- Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond [52.656743602538825]
Fine-tuning pre-trained code models incurs a large computational cost.
We conduct an experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning.
We propose Telly to efficiently fine-tune pre-trained code models via layer freezing (see the layer-freezing sketch after this list).
arXiv Detail & Related papers (2023-04-11T13:34:13Z)
- Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of prompting techniques, we introduce a new pre-training method that boosts query-based end-to-end instance segmentation (QEIS) models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z)
- Towards a learning-based performance modeling for accelerating Deep Neural Networks [1.1549572298362785]
We begin an investigation of predictive models based on machine learning techniques to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on a Midgard-based ARM Mali GPU show that our predictive model outperforms all of the convolution operators manually selected by the library.
arXiv Detail & Related papers (2022-12-09T18:28:07Z)
- Boosting the Efficiency of Parametric Detection with Hierarchical Neural Networks [4.1410005218338695]
We propose the Hierarchical Detection Network (HDN), a novel approach to efficient detection.
The network is trained using a novel loss function that simultaneously encodes the goals of statistical accuracy and efficiency.
We show how training a three-layer HDN using a two-layer model can further boost both accuracy and efficiency.
arXiv Detail & Related papers (2022-07-23T19:23:00Z)
- Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning [41.07029317930986]
We propose a variance-sensitive class of models that operates in a low-label regime.
The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier.
We further extend this approach to a transductive learning setting, proposing Transductive CNAPS.
arXiv Detail & Related papers (2022-01-13T18:59:02Z)
- AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve out-of-domain (OOD) detection performance.
This paper proposes to train a model with only in-domain (IND) data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We achieve new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Deep Learning Algorithms for Rotating Machinery Intelligent Diagnosis: An Open Source Benchmark Study [0.8497188292342053]
This paper provides a benchmark study of deep learning algorithms for rotating machinery intelligent diagnosis.
We integrate the whole evaluation codes into a code library and release this code library to the public for better development of this field.
Through this work, we release a unified code framework for comparing and testing models fairly and quickly, emphasize the importance of open-source code, provide baseline accuracies (a lower bound) to avoid spurious improvements, and discuss potential future directions for this field.
arXiv Detail & Related papers (2020-03-06T17:24:43Z)
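As referenced in the dense-encoder entry above, here is a hedged sketch of the in-batch negatives objective commonly used to train dense retrieval encoders; the function name and tensor shapes are illustrative assumptions, not taken from that paper.

```python
# Sketch of in-batch negatives: each query's positive passage acts as a
# negative for every other query in the batch.
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """q, p: (batch, dim) query and positive-passage embeddings."""
    scores = q @ p.T                  # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0))  # diagonal entries are the positives
    return F.cross_entropy(scores, labels)
```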
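And as referenced in the Telly entry above, a minimal sketch of fine-tuning with frozen lower layers. Which layers Telly actually freezes is not stated in the summary, so freezing the embeddings plus the first 8 of 12 encoder layers here is purely illustrative.

```python
# Sketch of fine-tuning a code model with its lower layers frozen.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

# Frozen parameters receive no gradient updates during fine-tuning.
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for layer in model.roberta.encoder.layer[:8]:  # illustrative choice
    for param in layer.parameters():
        param.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```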
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.