On the Impact of Multiple Source Code Representations on Software
Engineering Tasks -- An Empirical Study
- URL: http://arxiv.org/abs/2106.10918v5
- Date: Sun, 24 Dec 2023 17:24:51 GMT
- Title: On the Impact of Multiple Source Code Representations on Software
Engineering Tasks -- An Empirical Study
- Authors: Karthik Chandra Swarna, Noble Saji Mathews, Dheeraj Vagavolu, Sridhar
Chimalakonda
- Abstract summary: We modify an AST path-based approach to accept multiple representations as input to an attention-based model.
We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection.
- Score: 4.049850026698639
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Efficiently representing source code is crucial for various software
engineering tasks such as code classification and clone detection. Existing
approaches primarily use Abstract Syntax Tree (AST), and only a few focus on
semantic graphs such as Control Flow Graph (CFG) and Program Dependency Graph
(PDG), which contain information about source code that AST does not. Even
though some works have tried to utilize multiple representations, they do not
provide any insights into the costs and benefits of using multiple
representations. The primary goal of this paper is to discuss the implications
of utilizing multiple code representations, specifically AST, CFG, and PDG. We
modify an AST path-based approach to accept multiple representations as input
to an attention-based model. We do this to measure the impact of additional
representations (such as CFG and PDG) over AST. We evaluate our approach on
three tasks: Method Naming, Program Classification, and Clone Detection. Our
approach increases the performance on these tasks by 11% (F1), 15.7%
(Accuracy), and 9.3% (F1), respectively, over the baseline. In addition to the
effect on performance, we discuss timing overheads incurred with multiple
representations. We envision this work providing researchers with a lens to
evaluate combinations of code representations for various tasks.
Related papers
- FastGAS: Fast Graph-based Annotation Selection for In-Context Learning [53.17606395275021]
In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts.
Existing methods have proposed to select a subset of unlabeled examples for annotation.
We propose a graph-based selection method, FastGAS, designed to efficiently identify high-quality instances.
arXiv Detail & Related papers (2024-06-06T04:05:54Z)
- Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We? [23.52632194060246]
Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering.
The abstract syntax tree (AST), a fundamental code feature, illustrates the syntactic information of the source code and has been widely used in code representation learning.
We compare the performance of models trained with code token sequence (Token for short) based code representation and AST-based code representation on three popular types of code-related tasks.
arXiv Detail & Related papers (2023-12-01T08:37:27Z)
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take the source image, user guidance, and previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
- A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction [4.572330678291241]
We develop a unified active learning framework specializing in software performance prediction.
We investigate the impact of using different levels of information for active and passive learning.
Our approach aims to improve the return on investment in AI models for different software performance prediction tasks.
arXiv Detail & Related papers (2023-04-06T14:00:48Z)
- Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction [88.6585431949086]
We propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction.
We regard the visual representation as a pluggable visual prefix to guide the textual representation toward error-insensitive forecasting decisions.
Experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T02:10:55Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- Learning Program Semantics with Code Representations: An Empirical Study [22.953964699210296]
Program semantics learning is core and fundamental to various code intelligence tasks.
We categorize current mainstream code representation techniques into four categories.
We evaluate their performance on three diverse and popular code intelligence tasks.
arXiv Detail & Related papers (2022-03-22T14:51:44Z)
- Multi-View Graph Representation for Programming Language Processing: An Investigation into Algorithm Detection [35.81014952109471]
This paper proposes a multi-view graph (MVG) program representation method.
MVG pays more attention to code semantics and simultaneously includes both data flow and control flow as multiple views.
In experiments, MVG outperforms previous methods significantly.
arXiv Detail & Related papers (2022-02-25T03:35:45Z)
- GN-Transformer: Fusing Sequence and Graph Representation for Improved Code Summarization [0.0]
We propose a novel method, GN-Transformer, to learn end-to-end on a fused sequence and graph modality.
The proposed methods achieve state-of-the-art performance in two code summarization datasets and across three automatic code summarization metrics.
arXiv Detail & Related papers (2021-11-17T02:51:37Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
A Visual Transformer models non-local visual concept dependencies between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- A Transformer-based Approach for Source Code Summarization [86.08359401867577]
We learn code representation for summarization by modeling the pairwise relationship between code tokens.
We show that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
arXiv Detail & Related papers (2020-05-01T23:29:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.