Toward the Automatic Classification of Self-Affirmed Refactoring
- URL: http://arxiv.org/abs/2009.09279v1
- Date: Sat, 19 Sep 2020 18:35:21 GMT
- Title: Toward the Automatic Classification of Self-Affirmed Refactoring
- Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni
- Abstract summary: Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages.
We propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the common quality improvement categories.
Our model is able to accurately classify SAR commits, outperforming the pattern-based and random classifier approaches, and allowing the discovery of 40 more relevant SAR patterns.
- Score: 22.27416971215152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how
developers document their refactoring activities in commit messages, i.e.,
developers' explicit documentation of refactoring operations intentionally
introduced during a code change. In our previous study, we manually identified
refactoring patterns and defined three main common quality improvement
categories, namely internal quality attributes, external quality attributes,
and code smells, considering only refactoring-related commits.
However, this approach heavily depends on the manual inspection of commit
messages. In this paper, we propose a two-step approach to first identify
whether a commit describes developer-related refactoring events, then to
classify it according to the refactoring common quality improvement categories.
Specifically, we combine the N-Gram TF-IDF feature selection with binary and
multiclass classifiers to build a new model to automate the classification of
refactorings based on their quality improvement categories. We challenge our
model using a total of 2,867 commit messages extracted from well-engineered
open-source Java projects. Our findings show that (1) our model is able to
accurately classify SAR commits, outperforming the pattern-based and random
classifier approaches, and allowing the discovery of 40 more relevant SAR
patterns, and (2) our model reaches an F-measure of up to 90% even with a
relatively small training dataset.
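To make the described pipeline concrete, here is a minimal sketch of the two-step idea, assuming scikit-learn; the tiny inline training sets, label names, and the LinearSVC choice are illustrative placeholders, not the authors' actual dataset or configuration:

```python
# A rough sketch of the two-step approach described above, assuming
# scikit-learn. The inline training examples and label names are
# hypothetical, not the study's 2,867 commit messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Step 1: binary classifier -- does this commit message describe refactoring?
sar_detector = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # N-Gram TF-IDF features
    ("clf", LinearSVC()),
])
sar_detector.fit(
    ["refactor user service to remove duplication",
     "extract helper method from parser",
     "add login endpoint",
     "fix null pointer exception on startup"],
    ["SAR", "SAR", "non-SAR", "non-SAR"],
)

# Step 2: multiclass classifier -- which quality improvement category?
category_classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LinearSVC()),
])
category_classifier.fit(
    ["simplify method to improve readability and cohesion",
     "restructure module to ease future maintainability",
     "remove god class and split responsibilities"],
    ["internal_quality", "external_quality", "code_smell"],
)

# Only commits flagged as SAR in step 1 reach the multiclass step.
msg = "refactor parser to reduce code duplication"
if sar_detector.predict([msg])[0] == "SAR":
    print(category_classifier.predict([msg])[0])
```

Keeping the two classifiers as separate pipelines mirrors the paper's framing: the binary detector first filters out non-refactoring commits, so the multiclass step only ever assigns a quality improvement category to SAR commits.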
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Context-Enhanced LLM-Based Framework for Automatic Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z)
- Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
arXiv Detail & Related papers (2024-06-07T04:52:46Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- State of Refactoring Adoption: Better Understanding Developer Perception of Refactoring [5.516979718589074]
We aim to explore how developers document their refactoring activities during the software life cycle.
We call such activity Self-Affirmed Refactoring (SAR), which indicates developers' documentation of their refactoring activities.
We propose an approach to identify whether a commit describes developer-related refactoring events and to classify it according to the common quality improvement categories.
arXiv Detail & Related papers (2023-06-09T16:38:20Z)
- RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring [57.8069006460087]
We study automatic rename refactoring of variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring of variable names.
We show that the variable names generated by RefBERT are more accurate and meaningful than those produced by the existing method.
arXiv Detail & Related papers (2023-05-28T12:29:39Z)
- Self-supervised Detransformation Autoencoder for Representation Learning in Open Set Recognition [0.0]
We propose a self-supervision method, Detransformation Autoencoder (DTAE), for the open set recognition problem.
Our proposed self-supervision method achieves significant gains in detecting the unknown class and classifying the known classes.
arXiv Detail & Related papers (2021-05-28T02:45:57Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into three categories, namely Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract the patterns that developers regularly use to describe code smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)
- Open-Set Recognition with Gaussian Mixture Variational Autoencoders [91.3247063132127]
At inference time, open-set classification either assigns a sample to one of the known classes seen during training or rejects it as unknown.
We train our model to cooperatively learn reconstruction and perform class-based clustering in the latent space.
Our model achieves more accurate and robust open-set classification results, with an average F1 improvement of 29.5%.
arXiv Detail & Related papers (2020-06-03T01:15:19Z)
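As a generic illustration of the open-set decision rule this entry alludes to, here is a minimal sketch of threshold-based rejection; the softmax scoring and the 0.9 threshold are assumptions for illustration, not this paper's GMM-VAE latent-space method:

```python
# Minimal sketch of a generic open-set decision rule: accept the most
# likely known class only when the model is confident enough, otherwise
# reject the sample as "unknown". Softmax scoring and the threshold are
# illustrative assumptions, not the GMM-VAE method of the paper above.
import numpy as np

def open_set_predict(logits: np.ndarray, threshold: float = 0.9) -> int:
    """Return the predicted class index, or -1 for 'unknown'."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else -1

# A confident sample is classified; an ambiguous one is rejected.
print(open_set_predict(np.array([8.0, 0.5, 0.1])))  # -> 0 (known class)
print(open_set_predict(np.array([1.0, 0.9, 0.8])))  # -> -1 (unknown)
```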