RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring
- URL: http://arxiv.org/abs/2305.17708v1
- Date: Sun, 28 May 2023 12:29:39 GMT
- Title: RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename
Refactoring
- Authors: Hao Liu, Yanlin Wang, Zhao Wei, Yong Xu, Juhong Wang, Hui Li, Rongrong
Ji
- Abstract summary: We study automatic rename refactoring on variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring on variable names.
We show that the generated variable names of RefBERT are more accurate and meaningful than those produced by the existing method.
- Score: 57.8069006460087
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Refactoring is an indispensable practice of improving the quality and
maintainability of source code in software evolution. Rename refactoring is the
most frequently performed refactoring that suggests a new name for an
identifier to enhance readability when the identifier is poorly named. However,
most existing works only identify renaming activities between two versions of
source code, while few works express concern about how to suggest a new name.
In this paper, we study automatic rename refactoring on variable names, which
is considered more challenging than other rename refactoring activities. We
first point out the connections between rename refactoring and various
prevalent learning paradigms and the difference between rename refactoring and
general text generation in natural language processing. Based on our
observations, we propose RefBERT, a two-stage pre-trained framework for rename
refactoring on variable names. RefBERT first predicts the number of sub-tokens
in the new name and then generates sub-tokens accordingly. Several techniques,
including constrained masked language modeling, contrastive learning, and the
bag-of-tokens loss, are incorporated into RefBERT to tailor it for automatic
rename refactoring on variable names. Through extensive experiments on our
constructed refactoring datasets, we show that the generated variable names of
RefBERT are more accurate and meaningful than those produced by the existing
method.
Related papers
- Context-Enhanced LLM-Based Framework for Automatic Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z)
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
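The kind of shared abstraction ReGAL aims to discover can be shown with a toy, hypothetical example: two programs that repeat the same drawing steps become shorter and easier to predict once the steps are factored into one reusable helper (the LOGO-style commands below are invented for illustration):

```python
# Illustrative only: a shared helper of the sort ReGAL would extract.
# Before abstraction, each program would inline its own drawing loop.
def draw_polygon(sides: int, length: int) -> list[str]:
    """Hypothetical shared helper: emit turtle-style commands for a polygon."""
    commands = []
    for _ in range(sides):
        commands.append(f"forward {length}")
        commands.append(f"turn {360 // sides}")
    return commands

# After abstraction, each program is a single call to the shared library.
square = draw_polygon(4, 10)
triangle = draw_polygon(3, 10)
print(len(square), len(triangle))
```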
arXiv Detail & Related papers (2024-01-29T18:45:30Z)
- RefSearch: A Search Engine for Refactoring [1.5519338281670214]
RefSearch enables users to search for refactoring cases through a user-friendly query language.
The system collects refactoring instances using two detectors and provides a web interface for querying and browsing the cases.
arXiv Detail & Related papers (2023-08-28T03:04:47Z)
- Software refactoring and rewriting: from the perspective of code transformations [0.0]
We can borrow ideas from micropass/nanopass compilers.
By treating the procedure of software refactoring as composing code transformations, we can often obtain representations of the process short enough that their correctness can be analysed manually.
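The micropass/nanopass idea can be sketched as composing many tiny, independently checkable source transformations. The two passes below are simplistic, hypothetical examples, not transformations from the paper:

```python
# A micropass-style pipeline: each pass is a small string-to-string code
# transformation that is easy to verify in isolation; the refactoring as a
# whole is just their composition.
from functools import reduce

def rename_tmp(src: str) -> str:
    # Pass 1: give the vague name `tmp` a clearer name.
    return src.replace("tmp", "total")

def strip_trailing_ws(src: str) -> str:
    # Pass 2: normalize whitespace at the end of each line.
    return "\n".join(line.rstrip() for line in src.splitlines())

def compose(*passes):
    """Run passes left to right, like a nanopass compiler pipeline."""
    return lambda src: reduce(lambda acc, p: p(acc), passes, src)

refactor = compose(rename_tmp, strip_trailing_ws)
print(refactor("tmp = 0  \ntmp += 1"))
```

Because each pass is tiny and does one thing, its correctness can be argued (or tested) in isolation, which is exactly the manual-analysis benefit the summary describes.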
arXiv Detail & Related papers (2023-08-12T17:11:54Z)
- How Fragile is Relation Extraction under Entity Replacements? [70.34001923252711]
Relation extraction (RE) aims to extract the relations between entity names from the textual context.
Existing work has found that RE models memorize entity name patterns to make RE predictions while ignoring the textual context.
This motivates us to raise the question: are RE models robust to entity replacements?
arXiv Detail & Related papers (2023-05-22T23:53:32Z)
- Do code refactorings influence the merge effort? [80.1936417993664]
Multiple contributors frequently change the source code in parallel to implement new features, fix bugs, refactor existing code, and make other changes.
These simultaneous changes need to be merged into the same version of the source code.
Studies show that 10 to 20 percent of all merge attempts result in conflicts, which require manual developer intervention to complete the process.
arXiv Detail & Related papers (2023-05-10T13:24:59Z)
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning [84.70916463298109]
VarCLR is a new approach for learning semantic representations of variable names.
Variable name similarity is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs while maximizing the distance between dissimilar ones.
We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT.
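The contrastive objective behind this kind of training can be sketched in miniature: pull embeddings of semantically similar variable names (e.g. "cnt" and "count") together and push dissimilar ones apart. The InfoNCE-style loss below is a generic formulation, and the tiny hand-made vectors stand in for outputs of a real encoder:

```python
# Minimal InfoNCE sketch for variable-name embeddings. Embeddings here are
# hand-made 2-d vectors, not outputs of a pre-trained model like BERT.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax score of the positive pair among all candidate pairs."""
    scores = [dot(anchor, positive) / temperature]
    scores += [dot(anchor, neg) / temperature for neg in negatives]
    log_denom = math.log(sum(math.exp(s) for s in scores))
    return log_denom - scores[0]  # low when the positive pair dominates

anchor    = [1.0, 0.0]   # e.g. embedding of "cnt"
positive  = [0.9, 0.1]   # e.g. embedding of "count" (similar meaning)
negatives = [[0.0, 1.0]] # e.g. embedding of "filename" (dissimilar)

loss = info_nce(anchor, positive, negatives)
print(loss)
```

Minimizing this loss drives similar names toward nearby points in embedding space, which is the representation property the summary attributes to VarCLR.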
arXiv Detail & Related papers (2021-12-05T18:40:32Z)
- How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into 3 categories, namely, Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract patterns that developers regularly use to describe their smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)
- Toward the Automatic Classification of Self-Affirmed Refactoring [22.27416971215152]
Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages.
We propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the common quality improvement categories.
Our model is able to accurately classify commits, outperforming the pattern-based and random approaches, and allowing the discovery of 40 more relevant SAR patterns.
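The two-step structure can be sketched with a hypothetical keyword-based classifier: step 1 decides whether a commit message self-affirms refactoring at all; step 2 buckets the positives into coarse quality-improvement categories. The cue lists are invented, and the actual paper uses supervised learning rather than keyword rules:

```python
# Hypothetical two-step commit classifier, illustrating the pipeline shape
# only; the paper trains supervised models instead of matching keywords.
REFACTOR_CUES = ("refactor", "clean up", "restructure")
CATEGORY_CUES = {
    "Code Smell Resolution": ("duplicate", "smell", "dead code"),
    "Internal QA": ("readability", "maintainability"),
    "External QA": ("performance", "usability"),
}

def classify_commit(message: str) -> str:
    msg = message.lower()
    # Step 1: is this a self-affirmed refactoring commit at all?
    if not any(cue in msg for cue in REFACTOR_CUES):
        return "Not refactoring"
    # Step 2: assign a quality-improvement category.
    for category, cues in CATEGORY_CUES.items():
        if any(cue in msg for cue in cues):
            return category
    return "Unclassified refactoring"

print(classify_commit("Refactor parser to remove duplicate code"))
```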
arXiv Detail & Related papers (2020-09-19T18:35:21Z)
- Empower Entity Set Expansion via Language Model Probing [58.78909391545238]
Existing set expansion methods bootstrap the seed entity set by adaptively selecting context features and extracting new entities.
A key challenge for entity set expansion is to avoid selecting ambiguous context features which will shift the class semantics and lead to accumulative errors in later iterations.
We propose a novel iterative set expansion framework that leverages automatically generated class names to address the semantic drift issue.
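The iterative loop with a class-name filter can be shown with a toy sketch. The candidate pool, class labels, and the class-name check are all made up for illustration; in the paper the check is performed by probing a language model rather than by a lookup:

```python
# Toy iterative set expansion with a class-name filter against semantic
# drift. The class-label lookup stands in for a language-model probe.
CLASS_NAME = "us_state"
# Hypothetical candidate pool: (entity, class it truly belongs to).
POOL = [("texas", "us_state"), ("ohio", "us_state"),
        ("toronto", "city"), ("utah", "us_state")]

def expand(seeds: set[str], rounds: int = 2) -> set[str]:
    entities = set(seeds)
    for _ in range(rounds):
        for entity, entity_class in POOL:
            # Reject candidates whose class would drift the set's semantics.
            if entity_class == CLASS_NAME:
                entities.add(entity)
    return entities

print(sorted(expand({"california"})))
```

Without the class-name check, an ambiguous candidate like "toronto" would enter the set and bias every later round, which is the accumulative-error problem the summary describes.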
arXiv Detail & Related papers (2020-04-29T00:09:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.