RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring
- URL: http://arxiv.org/abs/2305.17708v1
- Date: Sun, 28 May 2023 12:29:39 GMT
- Title: RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring
- Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong
Ji
- Abstract summary: We study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
- Score: 57.8069006460087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Refactoring is an indispensable practice of improving the quality and
maintainability of source code in software evolution. Rename refactoring is the
most frequently performed refactoring that suggests a new name for an
identifier to enhance readability when the identifier is poorly named. However,
most existing works only identify renaming activities between two versions of
source code, while few works express concern about how to suggest a new name.
In this paper, we study automatic rename refactoring on variable names, which
is considered more challenging than other rename refactoring activities. We
first point out the connections between rename refactoring and various
prevalent learning paradigms and the difference between rename refactoring and
general text generation in natural language processing. Based on our
observations, we propose RefBERT, a two-stage pre-trained framework for rename
refactoring on variable names. RefBERT first predicts the number of sub-tokens
in the new name and then generates sub-tokens accordingly. Several techniques,
including constrained masked language modeling, contrastive learning, and the
bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic
rename refactoring on variable names. Through extensive experiments on our
constructed refactoring datasets, we show that the generated variable names of
RefBERT are more accurate and meaningful than those produced by the existing
method.
Related papers
- Context-Enhanced LLM-Based Framework for Automatic Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z)
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
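The kind of shared abstraction ReGAL aims to discover can be shown with a toy, hypothetical example: two programs that repeat the same drawing steps become shorter and easier to predict once the steps are factored into one reusable helper (the LOGO-style commands below are invented for illustration):

```python
# Illustrative only: a shared helper of the sort ReGAL would extract.
# Before abstraction, each program would inline its own drawing loop.
def draw_polygon(sides: int, length: int) -> list[str]:
    """Hypothetical shared helper: emit turtle-style commands for a polygon."""
    commands = []
    for _ in range(sides):
        commands.append(f"forward {length}")
        commands.append(f"turn {360 // sides}")
    return commands

# After abstraction, each program is a single call to the shared library.
square = draw_polygon(4, 10)
triangle = draw_polygon(3, 10)
print(len(square), len(triangle))
```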
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
- RefSearch: A Search Engine for Refactoring [1.5519338281670214]
RefSearch enables users to search for refactoring cases through a user-friendly query language.
The system collects refactoring instances using two detectors and provides a web interface for querying and browsing the cases.
arXiv Detail & Related papers (2023-08-28T03:04:47Z)
- Software refactoring and rewriting: from the perspective of code transformations [0.0]
We can borrow ideas from micropass/nanopass compilers.
By treating the procedure of software refactoring as composing code transformations, we can often obtain representations of the process short enough that their correctness can be analysed manually.
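The micropass/nanopass idea can be sketched as composing many tiny, independently checkable source transformations. The two passes below are simplistic, hypothetical examples, not transformations from the paper:

```python
# A micropass-style pipeline: each pass is a small string-to-string code
# transformation that is easy to verify in isolation; the refactoring as a
# whole is just their composition.
from functools import reduce

def rename_tmp(src: str) -> str:
    # Pass 1: give the vague name `tmp` a clearer name.
    return src.replace("tmp", "total")

def strip_trailing_ws(src: str) -> str:
    # Pass 2: normalize whitespace at the end of each line.
    return "\n".join(line.rstrip() for line in src.splitlines())

def compose(*passes):
    """Run passes left to right, like a nanopass compiler pipeline."""
    return lambda src: reduce(lambda acc, p: p(acc), passes, src)

refactor = compose(rename_tmp, strip_trailing_ws)
print(refactor("tmp = 0  \ntmp += 1"))
```

Because each pass is tiny and does one thing, its correctness can be argued (or tested) in isolation, which is exactly the manual-analysis benefit the summary describes.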
arXiv Detail & Related papers (2023-08-12T17:11:54Z)
- How Fragile is Relation Extraction under Entity Replacements? [70.34001923252711]
Relation extraction (RE) aims to extract the relations between entity names from the textual context.
Existing work has found that RE models memorize entity name patterns to make RE predictions while ignoring the textual context.
This motivates us to raise the question: are RE models robust to entity replacements?
arXiv Detail & Related papers (2023-05-22T23:53:32Z)
- Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual developer intervention to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z)
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning [84.70916463298109]
VarCLR is a new approach for learning semantic representations of variable names.
Variable name similarity is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs while maximizing the distance between dissimilar ones.
We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT.
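The contrastive objective behind this kind of training can be sketched in miniature: pull embeddings of semantically similar variable names (e.g. "cnt" and "count") together and push dissimilar ones apart. The InfoNCE-style loss below is a generic formulation, and the tiny hand-made vectors stand in for outputs of a real encoder:

```python
# Minimal InfoNCE sketch for variable-name embeddings. Embeddings here are
# hand-made 2-d vectors, not outputs of a pre-trained model like BERT.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax score of the positive pair among all candidate pairs."""
    scores = [dot(anchor, positive) / temperature]
    scores += [dot(anchor, neg) / temperature for neg in negatives]
    log_denom = math.log(sum(math.exp(s) for s in scores))
    return log_denom - scores[0]  # low when the positive pair dominates

anchor    = [1.0, 0.0]   # e.g. embedding of "cnt"
positive  = [0.9, 0.1]   # e.g. embedding of "count" (similar meaning)
negatives = [[0.0, 1.0]] # e.g. embedding of "filename" (dissimilar)

loss = info_nce(anchor, positive, negatives)
print(loss)
```

Minimizing this loss drives similar names toward nearby points in embedding space, which is the representation property the summary attributes to VarCLR.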
arXiv Detail & Related papers (2021-12-05T18:40:32Z)
- How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)
- Toward the Automatic Classification of Self-Affirmed Refactoring [22.27416971215152]
Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages.
We propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the common quality improvement categories.
Our model is able to accurately classify commits, outperforming the pattern-based and random approaches, and allowing the discovery of 40 more relevant SAR patterns.
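The two-step structure can be sketched with a hypothetical keyword-based classifier: step 1 decides whether a commit message self-affirms refactoring at all; step 2 buckets the positives into coarse quality-improvement categories. The cue lists are invented, and the actual paper uses supervised learning rather than keyword rules:

```python
# Hypothetical two-step commit classifier, illustrating the pipeline shape
# only; the paper trains supervised models instead of matching keywords.
REFACTOR_CUES = ("refactor", "clean up", "restructure")
CATEGORY_CUES = {
    "Code Smell Resolution": ("duplicate", "smell", "dead code"),
    "Internal QA": ("readability", "maintainability"),
    "External QA": ("performance", "usability"),
}

def classify_commit(message: str) -> str:
    msg = message.lower()
    # Step 1: is this a self-affirmed refactoring commit at all?
    if not any(cue in msg for cue in REFACTOR_CUES):
        return "Not refactoring"
    # Step 2: assign a quality-improvement category.
    for category, cues in CATEGORY_CUES.items():
        if any(cue in msg for cue in cues):
            return category
    return "Unclassified refactoring"

print(classify_commit("Refactor parser to remove duplicate code"))
```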
arXiv Detail & Related papers (2020-09-19T18:35:21Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
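The iterative loop with a class-name filter can be shown with a toy sketch. The candidate pool, class labels, and the class-name check are all made up for illustration; in the paper the check is performed by probing a language model rather than by a lookup:

```python
# Toy iterative set expansion with a class-name filter against semantic
# drift. The class-label lookup stands in for a language-model probe.
CLASS_NAME = "us_state"
# Hypothetical candidate pool: (entity, class it truly belongs to).
POOL = [("texas", "us_state"), ("ohio", "us_state"),
        ("toronto", "city"), ("utah", "us_state")]

def expand(seeds: set[str], rounds: int = 2) -> set[str]:
    entities = set(seeds)
    for _ in range(rounds):
        for entity, entity_class in POOL:
            # Reject candidates whose class would drift the set's semantics.
            if entity_class == CLASS_NAME:
                entities.add(entity)
    return entities

print(sorted(expand({"california"})))
```

Without the class-name check, an ambiguous candidate like "toronto" would enter the set and bias every later round, which is the accumulative-error problem the summary describes.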
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.