Prompt Learning for Multi-Label Code Smell Detection: A Promising
Approach
- URL: http://arxiv.org/abs/2402.10398v1
- Date: Fri, 16 Feb 2024 01:50:46 GMT
- Title: Prompt Learning for Multi-Label Code Smell Detection: A Promising
Approach
- Authors: Haiyang Liu, Yang Zhang, Vidya Saikrishna, Quanquan Tian, Kun Zheng
- Abstract summary: Code smells indicate potential software quality problems, so developers can identify refactoring opportunities by detecting code smells.
We propose \textit{PromptSmell}, a novel approach based on prompt learning for detecting multi-label code smells.
- Score: 6.74877139507271
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Code smells indicate the potential problems of software quality so that
developers can identify refactoring opportunities by detecting code smells.
State-of-the-art approaches leverage heuristics, machine learning, and deep
learning to detect code smells. However, existing approaches have not fully
explored the potential of large language models (LLMs). In this paper, we
propose \textit{PromptSmell}, a novel approach based on prompt learning for
detecting multi-label code smell. Firstly, code snippets are acquired by
traversing abstract syntax trees. Combining code snippets with natural language
prompts and mask tokens, \textit{PromptSmell} constructs the input of LLMs.
Secondly, to detect multi-label code smell, we leverage a label combination
approach by converting a multi-label problem into a multi-classification
problem. A customized answer space is added to the word list of pre-trained
language models, and the probability distribution of intermediate answers is
obtained by predicting the words at the mask positions. Finally, the
intermediate answers are mapped to the target class labels by a verbalizer as
the final classification result. We evaluate the effectiveness of
\textit{PromptSmell} by answering six research questions. The experimental
results demonstrate that \textit{PromptSmell} obtains an improvement of 11.17\%
in $precision_{w}$ and 7.4\% in $F1_{w}$ compared to existing approaches.
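The two core steps described in the abstract, converting multi-label detection into a multi-class problem via label combination and mapping mask-position predictions to labels through a verbalizer, can be sketched as follows. This is a minimal illustration assuming a label-powerset encoding and a toy verbalizer; the smell names, answer words, and probabilities are invented for the example, not taken from the paper:

```python
from itertools import combinations

# Label-combination step: treat every subset of smell labels as one class
# (the "label powerset" idea), turning multi-label into multi-class.
SMELLS = ["complex_method", "long_parameter_list"]

def label_powerset(smells):
    """Enumerate every subset of smell labels as a distinct class."""
    classes = []
    for r in range(len(smells) + 1):
        for combo in combinations(smells, r):
            classes.append(frozenset(combo))
    return classes

# Toy verbalizer: maps answer words predicted at the mask position back to
# target label sets. Answer words here are illustrative assumptions.
VERBALIZER = {
    "clean": frozenset(),
    "complex": frozenset({"complex_method"}),
    "long": frozenset({"long_parameter_list"}),
    "both": frozenset({"complex_method", "long_parameter_list"}),
}

def classify(mask_probs):
    """Pick the most probable answer word at the mask position and map it
    to the final label set via the verbalizer."""
    best_word = max(mask_probs, key=mask_probs.get)
    return VERBALIZER[best_word]
```

Here `mask_probs` stands in for the probability distribution a pre-trained language model produces over answer words at the mask position; the powerset step is what lets a single multi-class decision cover all label combinations.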
Related papers
- A Novel Taxonomy and Classification Scheme for Code Smell Interactions [2.6597689982591044]
This study presents a novel taxonomy and a proposed classification scheme for the possible code smell interactions.
Experiments have been carried out using several popular machine learning (ML) models.
Results primarily show the presence of code smell interactions, namely inter-smell detection within a domain.
arXiv Detail & Related papers (2025-04-25T16:24:11Z) - Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A major public concern regarding the training of large language models (LLMs) is whether they abuse copyrighted online text.
Previous membership inference methods may be misled by similar examples in vast amounts of training data.
We propose an alternative insert-and-detection methodology, advocating that web users and content platforms employ unique identifiers.
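The insert-and-detection idea can be sketched in a few lines: a platform embeds a unique identifier in its text, and later probes whether a model reproduces it. The hashing scheme and marker format below are assumptions for illustration, not the paper's design:

```python
import hashlib

def make_identifier(doc_id, secret):
    """Derive a unique, hard-to-guess identifier for a document."""
    return hashlib.sha256(f"{secret}:{doc_id}".encode()).hexdigest()[:16]

def embed(text, marker):
    """Insert the identifier into the published text."""
    return f"{text}\n<!-- uid:{marker} -->"

def detect(generated_text, marker):
    """Check whether model output leaks the embedded identifier,
    suggesting the marked document was in the training data."""
    return f"uid:{marker}" in generated_text
```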
arXiv Detail & Related papers (2024-03-23T06:36:32Z) - Multi-Label Knowledge Distillation [86.03990467785312]
We propose a novel multi-label knowledge distillation method.
On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems.
On the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings.
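The per-label decomposition mentioned above is the standard one-vs-rest view of multi-label learning: each label gets an independent sigmoid output, and the loss sums per-label binary cross-entropies. A minimal sketch with illustrative names (this is the generic decomposition, not the paper's distillation loss):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Sum of per-label binary cross-entropy terms: each label is treated
    as its own binary classification problem."""
    loss = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss
```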
arXiv Detail & Related papers (2023-08-12T03:19:08Z) - TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation [53.974228542090046]
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
Existing approaches utilizing CLIP's text and patch embeddings to generate semantic masks often misidentify input pixels from unseen classes.
We propose TagCLIP (Trusty-aware guided CLIP) to address this issue.
arXiv Detail & Related papers (2023-04-15T12:52:23Z) - DACOS-A Manually Annotated Dataset of Code Smells [4.753388560240438]
We present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets.
The dataset targets three kinds of code smells at different granularity: multifaceted abstraction, complex method, and long parameter list.
We have developed TagMan, a web application to help annotators view and mark the snippets one-by-one and record the provided annotations.
arXiv Detail & Related papers (2023-03-15T16:13:40Z) - Addressing Leakage in Self-Supervised Contextualized Code Retrieval [3.693362838682697]
We address contextualized code retrieval, the search for code snippets helpful to fill gaps in a partial input program.
Our approach facilitates a large-scale self-supervised contrastive training by splitting source code randomly into contexts and targets.
To combat leakage between the two, we suggest a novel approach based on mutual identifier masking, dedentation, and the selection of syntax-aligned targets.
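Mutual identifier masking can be illustrated with a small sketch: identifiers that appear in both the context and the target are replaced by a placeholder, so a retrieval model cannot match the pair through shallow name overlap. The regex and placeholder token are assumptions, not the paper's implementation:

```python
import re

# Crude identifier pattern, purely for illustration (it also matches
# keywords, which a real implementation would filter out).
IDENT = re.compile(r"[A-Za-z_]\w*")

def mutual_identifier_mask(context, target, placeholder="MASKED"):
    """Replace identifiers occurring in BOTH halves with a placeholder."""
    shared = set(IDENT.findall(context)) & set(IDENT.findall(target))
    mask = lambda s: IDENT.sub(
        lambda m: placeholder if m.group() in shared else m.group(), s)
    return mask(context), mask(target)
```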
arXiv Detail & Related papers (2022-04-17T12:58:38Z) - Enhancing Semantic Code Search with Multimodal Contrastive Learning and
Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z) - Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z) - Pre-trained Token-replaced Detection Model as Few-shot Learner [31.40447168356879]
We propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA.
A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models.
arXiv Detail & Related papers (2022-03-07T09:47:53Z) - CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained
Model [23.947178895479464]
We propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model.
In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST).
We also introduce two novel pre-training objectives. One is to predict the edges between nodes in the abstract syntax tree, and the other is to predict the types of code tokens.
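The edge-prediction objective can be illustrated by enumerating the parent-child edges of an AST, which are the pairs a model would be trained to predict. The use of Python's `ast` module here is purely illustrative:

```python
import ast

def ast_edges(source):
    """Return (parent_type, child_type) pairs for every edge in the AST
    of the given source code."""
    tree = ast.parse(source)
    return [
        (type(parent).__name__, type(child).__name__)
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    ]
```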
arXiv Detail & Related papers (2021-08-10T10:08:21Z) - Label Mask for Multi-Label Text Classification [6.742627397194543]
We propose a Label Mask multi-label text classification model (LM-MTC), inspired by the idea of cloze questions in language models.
On this basis, we assign a different token to each potential label and randomly mask the token with a certain probability to build a label-based Masked Language Model (MLM).
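The label-masking idea can be sketched as follows: one token per potential label is appended to the input, and each label token is masked with some probability so the model learns to recover it as a cloze task. The token names and masking probability are illustrative assumptions:

```python
import random

def build_label_masked_input(text_tokens, labels, p_mask=0.5, rng=None):
    """Append one token per potential label, masking each label token
    with probability p_mask (fixed seed for reproducibility)."""
    rng = rng or random.Random(0)
    label_tokens = [f"[L_{lab}]" for lab in labels]
    masked = [t if rng.random() >= p_mask else "[MASK]" for t in label_tokens]
    return text_tokens + ["[SEP]"] + masked
```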
arXiv Detail & Related papers (2021-06-18T11:54:33Z) - LabelEnc: A New Intermediate Supervision Method for Object Detection [78.74368141062797]
We propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.
The key idea is to introduce a novel label encoding function, mapping the ground-truth labels into latent embedding.
Experiments show our method improves a variety of detection systems by around 2% on COCO dataset.
arXiv Detail & Related papers (2020-07-07T08:55:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.