Toward the Automatic Classification of Self-Affirmed Refactoring
- URL: http://arxiv.org/abs/2009.09279v1
- Date: Sat, 19 Sep 2020 18:35:21 GMT
- Title: Toward the Automatic Classification of Self-Affirmed Refactoring
- Authors: Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni
- Abstract summary: Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages.
We propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the common quality improvement categories.
Our model is able to accurately classify SAR commits, outperforming the pattern-based and random classifier approaches, and allowing the discovery of 40 more relevant SAR patterns.
- Score: 22.27416971215152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how
developers document their refactoring activities in commit messages, i.e.,
developers' explicit documentation of refactoring operations intentionally
introduced during a code change. In our previous study, we manually identified
refactoring patterns and defined three main common quality improvement
categories, namely internal quality attributes, external quality attributes,
and code smells, considering only refactoring-related commits.
However, this approach heavily depends on the manual inspection of commit
messages. In this paper, we propose a two-step approach to first identify
whether a commit describes developer-related refactoring events, then to
classify it according to the refactoring common quality improvement categories.
Specifically, we combine the N-Gram TF-IDF feature selection with binary and
multiclass classifiers to build a new model to automate the classification of
refactorings based on their quality improvement categories. We challenge our
model using a total of 2,867 commit messages extracted from well-engineered
open-source Java projects. Our findings show that (1) our model is able to
accurately classify SAR commits, outperforming the pattern-based and random
classifier approaches, and allowing the discovery of 40 more relevant SAR
patterns, and (2) our model reaches an F-measure of up to 90% even with a
relatively small training dataset.
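To make the described pipeline concrete, here is a minimal sketch of the two-step idea, assuming scikit-learn; the tiny inline training sets, label names, and the LinearSVC choice are illustrative placeholders, not the authors' actual dataset or configuration:

```python
# A rough sketch of the two-step approach described above, assuming
# scikit-learn. The inline training examples and label names are
# hypothetical, not the study's 2,867 commit messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Step 1: binary classifier -- does this commit message describe refactoring?
sar_detector = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # N-Gram TF-IDF features
    ("clf", LinearSVC()),
])
sar_detector.fit(
    ["refactor user service to remove duplication",
     "extract helper method from parser",
     "add login endpoint",
     "fix null pointer exception on startup"],
    ["SAR", "SAR", "non-SAR", "non-SAR"],
)

# Step 2: multiclass classifier -- which quality improvement category?
category_classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LinearSVC()),
])
category_classifier.fit(
    ["simplify method to improve readability and cohesion",
     "restructure module to ease future maintainability",
     "remove god class and split responsibilities"],
    ["internal_quality", "external_quality", "code_smell"],
)

# Only commits flagged as SAR in step 1 reach the multiclass step.
msg = "refactor parser to reduce code duplication"
if sar_detector.predict([msg])[0] == "SAR":
    print(category_classifier.predict([msg])[0])
```

Keeping the two classifiers as separate pipelines mirrors the paper's framing: the binary detector first filters out non-refactoring commits, so the multiclass step only ever assigns a quality improvement category to SAR commits.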
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- Context-Enhanced LLM-Based Framework for Automatic Test Refactoring [10.847400457238423]
Test smells arise from poor design practices and insufficient domain knowledge.
We propose UTRefactor, a context-enhanced, LLM-based framework for automatic test refactoring in Java projects.
We evaluate UTRefactor on 879 tests from six open-source Java projects, reducing the number of test smells from 2,375 to 265, achieving an 89% reduction.
arXiv Detail & Related papers (2024-09-25T08:42:29Z)
- Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
arXiv Detail & Related papers (2024-06-07T04:52:46Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- Generative Multi-modal Models are Good Class-Incremental Learners [51.5648732517187]
We propose a novel generative multi-modal model (GMM) framework for class-incremental learning.
Our approach directly generates labels for images using an adapted generative model.
Under the Few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting.
arXiv Detail & Related papers (2024-03-27T09:21:07Z)
- State of Refactoring Adoption: Better Understanding Developer Perception of Refactoring [5.516979718589074]
We aim to explore how developers document their refactoring activities during the software life cycle.
We call such activity Self-Affirmed Refactoring (SAR), which indicates developers' documentation of their refactoring activities.
We propose an approach to identify whether a commit describes developer-related refactoring events and to classify it according to the common quality improvement categories.
arXiv Detail & Related papers (2023-06-09T16:38:20Z)
- RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring [57.8069006460087]
We study automatic rename refactoring of variable names, which is considered more challenging than other rename refactoring activities.
We propose RefBERT, a two-stage pre-trained framework for rename refactoring of variable names.
We show that the variable names generated by RefBERT are more accurate and meaningful than those produced by the existing method.
arXiv Detail & Related papers (2023-05-28T12:29:39Z)
- Self-supervised Detransformation Autoencoder for Representation Learning in Open Set Recognition [0.0]
We propose a self-supervision method, Detransformation Autoencoder (DTAE), for the open set recognition problem.
Our proposed self-supervision method achieves significant gains in detecting the unknown class and classifying the known classes.
arXiv Detail & Related papers (2021-05-28T02:45:57Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- How We Refactor and How We Document it? On the Use of Supervised Machine Learning Algorithms to Classify Refactoring Documentation [25.626914797750487]
Refactoring is the art of improving the design of a system without altering its external behavior.
This study categorizes commits into three categories, namely Internal QA, External QA, and Code Smell Resolution, along with the traditional BugFix and Functional categories.
To better understand our classification results, we analyzed commit messages to extract the patterns that developers regularly use to describe code smells.
arXiv Detail & Related papers (2020-10-26T20:33:17Z)
- Open-Set Recognition with Gaussian Mixture Variational Autoencoders [91.3247063132127]
At inference time, open-set classification either assigns a sample to one of the known classes seen during training or rejects it as unknown.
We train our model to cooperatively learn reconstruction and perform class-based clustering in the latent space.
Our model achieves more accurate and robust open-set classification results, with an average F1 improvement of 29.5%.
arXiv Detail & Related papers (2020-06-03T01:15:19Z)
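As a generic illustration of the open-set decision rule this entry alludes to, here is a minimal sketch of threshold-based rejection; the softmax scoring and the 0.9 threshold are assumptions for illustration, not this paper's GMM-VAE latent-space method:

```python
# Minimal sketch of a generic open-set decision rule: accept the most
# likely known class only when the model is confident enough, otherwise
# reject the sample as "unknown". Softmax scoring and the threshold are
# illustrative assumptions, not the GMM-VAE method of the paper above.
import numpy as np

def open_set_predict(logits: np.ndarray, threshold: float = 0.9) -> int:
    """Return the predicted class index, or -1 for 'unknown'."""
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else -1

# A confident sample is classified; an ambiguous one is rejected.
print(open_set_predict(np.array([8.0, 0.5, 0.1])))  # -> 0 (known class)
print(open_set_predict(np.array([1.0, 0.9, 0.8])))  # -> -1 (unknown)
```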