The Impact of Using Regression Models to Build Defect Classifiers
- URL: http://arxiv.org/abs/2202.06157v1
- Date: Sat, 12 Feb 2022 22:12:55 GMT
- Title: The Impact of Using Regression Models to Build Defect Classifiers
- Authors: Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E.
Hassan
- Abstract summary: It is common practice to discretize continuous defect counts into defective and non-defective classes.
We compare the performance and interpretation of defect classifiers built using both approaches.
- Score: 13.840006058766766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: It is common practice to discretize continuous defect counts into defective
and non-defective classes and use them as a target variable when building
defect classifiers (discretized classifiers). However, this discretization of
continuous defect counts leads to information loss that might affect the
performance and interpretation of defect classifiers. Another possible approach
to building defect classifiers is to use regression models and then discretize
the predicted defect counts into defective and non-defective classes
(regression-based classifiers).
In this paper, we compare the performance and interpretation of defect
classifiers that are built using both approaches (i.e., discretized classifiers
and regression-based classifiers) across six commonly used machine learning
classifiers (i.e., linear/logistic regression, random forest, KNN, SVM, CART,
and neural networks) and 17 datasets. We find that: i) Random forest based
classifiers outperform other classifiers (best AUC) for both classifier
building approaches; ii) In contrast to common practice, building a defect
classifier using discretized defect counts (i.e., discretized classifiers) does
not always lead to better performance.
Hence we suggest that future defect classification studies should consider
building regression-based classifiers (in particular when the defective ratio
of the modeled dataset is low). Moreover, we suggest that both approaches for
building defect classifiers should be explored, so the best-performing
classifier can be used when determining the most influential features.
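The two classifier-building approaches in the abstract can be illustrated with a deliberately simplified sketch. This is not the paper's actual pipeline (which covers six learners and 17 datasets); it is a hypothetical k-nearest-neighbors toy in pure numpy, where the only difference between the two approaches is whether defect counts are discretized before learning or after prediction. The feature values, counts, and the 0.5 threshold are all illustrative assumptions.

```python
import numpy as np

def knn_neighbors(X_train, x, k):
    """Indices of the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(d)[:k]

def discretized_classify(X_train, counts, x, k=3):
    """Discretized classifier: binarize the counts FIRST, then learn/vote."""
    labels = (counts > 0).astype(int)          # information loss happens here
    idx = knn_neighbors(X_train, x, k)
    return int(labels[idx].mean() >= 0.5)      # majority vote on binary labels

def regression_classify(X_train, counts, x, k=3, threshold=0.5):
    """Regression-based classifier: predict a count, THEN discretize it."""
    idx = knn_neighbors(X_train, x, k)
    predicted_count = counts[idx].mean()       # regression on raw counts
    return int(predicted_count >= threshold)   # discretize the prediction

# Toy training data: one feature (e.g. module size) and raw defect counts.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
c = np.array([0, 0, 0, 1, 3, 5])

x_new = np.array([10.5])
print(discretized_classify(X, c, x_new))   # -> 1 (defective)
print(regression_classify(X, c, x_new))    # -> 1 (defective)
```

On skewed data the two can disagree: the regression-based variant retains the magnitude of the counts, which is one reason the paper recommends it when the defective ratio of a dataset is low.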
Related papers
- Fixed Random Classifier Rearrangement for Continual Learning [0.5439020425819]
In visual classification scenario, neural networks inevitably forget the knowledge of old tasks after learning new ones.
We propose a continual learning algorithm named Fixed Random Rearrangement (FRCR)
arXiv Detail & Related papers (2024-02-23T09:43:58Z)
- Deep Imbalanced Regression via Hierarchical Classification Adjustment [50.19438850112964]
Regression tasks in computer vision are often formulated into classification by quantizing the target space into classes.
The majority of training samples lie in a head range of target values, while a minority of samples span a usually larger tail range.
We propose to construct hierarchical classifiers for solving imbalanced regression tasks.
Our novel hierarchical classification adjustment (HCA) for imbalanced regression shows superior results on three diverse tasks.
arXiv Detail & Related papers (2023-10-26T04:54:39Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for anomaly detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Generative Robust Classification [3.4773470589069477]
Training adversarially robust discriminative (i.e., softmax) classification has been the dominant approach to robust classification.
We investigate using adversarial training (AT)-based generative models.
We find it straightforward to apply advanced data augmentation to achieve better robustness in our approach.
arXiv Detail & Related papers (2022-12-14T15:33:11Z)
- On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios [7.592094566354553]
Defect prediction aims at identifying software components that are likely to cause faults before a software is made available to the end-user.
Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction.
We investigate whether learning from one class only is sufficient to produce effective defect prediction models in two other scenarios.
arXiv Detail & Related papers (2022-02-24T12:57:14Z)
- Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show uncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
- Robustifying Binary Classification to Adversarial Perturbation [45.347651499585055]
In this paper we consider the problem of binary classification with adversarial perturbations.
We introduce a generalization to the max-margin classifier which takes into account the power of the adversary in manipulating the data.
Under some mild assumptions on the loss function, we theoretically show that gradient descent converges to the RM classifier in its direction.
arXiv Detail & Related papers (2020-10-29T07:20:37Z)
- Predicting Classification Accuracy When Adding New Unobserved Classes [8.325327265120283]
We study how a classifier's performance can be used to extrapolate its expected accuracy on a larger, unobserved set of classes.
We formulate a robust neural-network-based algorithm, "CleaneX", which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes.
arXiv Detail & Related papers (2020-10-28T14:37:25Z)
- Classification with Rejection Based on Cost-sensitive Classification [83.50402803131412]
We propose a novel method of classification with rejection by an ensemble of learners.
Experimental results demonstrate the usefulness of our proposed approach in clean, noisy, and positive-unlabeled classification.
arXiv Detail & Related papers (2020-10-22T14:05:05Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.