Empirical Study on Transformer-based Techniques for Software Engineering
- URL: http://arxiv.org/abs/2310.00399v1
- Date: Sat, 30 Sep 2023 14:45:22 GMT
- Title: Empirical Study on Transformer-based Techniques for Software Engineering
- Authors: Yan Xiao, Xinyue Zuo, Lei Xue, Kailong Wang, Jin Song Dong, and Ivan Beschastnikh
- Abstract summary: We review the existing literature, examine the suitability of model architectures for different tasks, and look at the generalization ability of models on different datasets.
We conduct experiments on the top-4 most targeted software engineering tasks that we found in our literature survey: Code Summarization, Bug Fixing, Bug Detection, and Code Search.
- Score: 12.973997150227198
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many Transformer-based pre-trained models for code have been developed and
applied to code-related tasks. In this paper, we review the existing
literature, examine the suitability of model architectures for different tasks,
and look at the generalization ability of models on different datasets, and
their resource consumption.
We examine three representative pre-trained models for code: CodeBERT,
CodeGPT, and CodeT5, and conduct experiments on the top-4 most targeted
software engineering tasks that we found in our literature survey: Code
Summarization, Bug Fixing, Bug Detection, and Code Search. In our study, we
showcase the capability of decoder-only models (CodeGPT) for specific
generation tasks under state-of-the-art evaluation metrics and contest the
common belief that the encoder-decoder architecture is optimal for
general-purpose coding tasks. Additionally, we found that the most frequently
used models are not necessarily the most suitable for certain applications and
that developers' needs are not adequately addressed by current research. We
also found that the benchmark and the frequent dataset for Bug Fixing and Code
Summarization both fail to enable models to generalize to other datasets for
the same task (the frequent dataset is the dataset used most frequently in
the literature other than the benchmark). We use statistical
testing to support our conclusions from experiments. Finally, CodeBERT is
highly efficient for understanding tasks, whereas CodeT5's efficiency for
generation tasks is in doubt, as the highest resource consumption does not
guarantee consistently better performance across metrics. We also discuss
the numerous practical issues in advancing future research on transformer-based
models for code-related tasks.
Related papers
- CodeXEmbed: A Generalist Embedding Model Family for Multilingual and Multi-task Code Retrieval [103.116634967815]
We introduce CodeXEmbed, a family of large-scale code embedding models ranging from 400M to 7B parameters.
Our novel training pipeline unifies multiple programming languages and transforms various code-related tasks into a common retrieval framework.
Our 7B model sets a new state-of-the-art (SOTA) in code retrieval, outperforming the previous leading model, Voyage-Code, by over 20% on CoIR benchmark.
arXiv Detail & Related papers (2024-11-19T16:54:45Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models [36.266383541354294]
This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks.
Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks.
Third, to solve the tasks, the models must use complex data science programming languages to perform intricate data processing and derive the answers.
arXiv Detail & Related papers (2024-10-09T18:00:05Z)
- INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers [7.255653248042546]
We use a framework to define 15 probing tasks that exercise surface, syntactic, structural and semantic characteristics of source code.
We probe 8 pre-trained source code models, as well as a natural language model (BERT) as our baseline.
We find that models that incorporate some structural information (such as GraphCodeBERT) have a better representation of source code characteristics.
arXiv Detail & Related papers (2023-12-08T15:21:54Z)
- LLM-Assisted Code Cleaning For Training Accurate Code Generators [53.087019724256606]
We investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system.
We build a novel data-cleaning pipeline that uses these principles to transform existing programs.
We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B improves the performance by up to 30% compared to fine-tuning on the original dataset.
arXiv Detail & Related papers (2023-11-25T02:45:50Z)
- CodeExp: Explanatory Code Document Generation [94.43677536210465]
Existing code-to-text generation models produce only high-level summaries of code.
We conduct a human study to identify the criteria for high-quality explanatory docstrings for code.
We present a multi-stage fine-tuning strategy and baseline models for the task.
arXiv Detail & Related papers (2022-11-25T18:05:44Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Probing Pretrained Models of Source Code [14.904366372190943]
General pretrained models have been shown to outperform task-specific models in many applications.
We show that pretrained models of code indeed contain information about code syntactic structure and correctness, the notions of identifiers, data flow and correctness, and natural language naming.
arXiv Detail & Related papers (2022-02-16T10:26:14Z)
- What do pre-trained code models know about code? [9.60966128833701]
We use diagnostic tasks called probes to investigate pre-trained code models.
BERT (pre-trained on English), CodeBERT and CodeBERTa (pre-trained on source code, and natural language documentation), and GraphCodeBERT (pre-trained on source code with dataflow) are investigated.
arXiv Detail & Related papers (2021-08-25T16:20:17Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
In order to achieve a better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.