On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios
- URL: http://arxiv.org/abs/2202.12074v2
- Date: Sat, 23 Mar 2024 10:24:47 GMT
- Title: On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios
- Authors: Rebecca Moussa, Danielle Azar, Federica Sarro,
- Abstract summary: Defect prediction aims at identifying software components that are likely to cause faults before a software is made available to the end-user.
Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction.
We investigate whether learning from one class only is sufficient to produce effective defect prediction model in two other different scenarios.
- Score: 7.592094566354553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Defect prediction aims at identifying software components that are likely to cause faults before a software is made available to the end-user. To date, this task has been modeled as a two-class classification problem, however its nature also allows it to be formulated as a one-class classification task. Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction, however it is not effective when employed at a finer granularity (i.e., commit-level defect prediction). In this paper, we further investigate whether learning from one class only is sufficient to produce effective defect prediction model in two other different scenarios (i.e., granularity), namely cross-version and cross-project defect prediction models, as well as replicate the previous work at within-project granularity for completeness. Our empirical results confirm that OCSVM performance remain low at different granularity levels, that is, it is outperformed by the two-class Random Forest (RF) classifier for both cross-version and cross-project defect prediction. While, we cannot conclude that OCSVM is the best classifier, our results still show interesting findings. While OCSVM does not outperform RF, it still achieves performance superior to its two-class counterpart (i.e., SVM) as well as other two-class classifiers studied herein. We also observe that OCSVM is more suitable for both cross-version and cross-project defect prediction, rather than for within-project defect prediction, thus suggesting it performs better with heterogeneous data. We encourage further research on one-class classifiers for defect prediction as these techniques may serve as an alternative when data about defective modules is scarce or not available.
Related papers
- Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD)
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - Characterizing the Optimal 0-1 Loss for Multi-class Classification with
a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset.
We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z) - RF+clust for Leave-One-Problem-Out Performance Prediction [0.9281671380673306]
We study leave-one-problem-out (LOPO) performance prediction.
We analyze whether standard random forest (RF) model predictions can be improved by calibrating them with a weighted average of performance values.
arXiv Detail & Related papers (2023-01-23T16:14:59Z) - Parametric Classification for Generalized Category Discovery: A Baseline
Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z) - The Impact of Using Regression Models to Build Defect Classifiers [13.840006058766766]
It is common practice to discretize continuous defect counts into defective and non-defective classes.
We compare the performance and interpretation of defect classifiers built using both approaches.
arXiv Detail & Related papers (2022-02-12T22:12:55Z) - Is the Performance of My Deep Network Too Good to Be True? A Direct
Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance.
We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show emphuncertainty of the classes.
Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
arXiv Detail & Related papers (2022-02-01T13:22:26Z) - Score-Based Generative Classifiers [9.063815952852783]
Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST.
Previous results have suggested a trade-off between the likelihood of the data and classification accuracy.
We show that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models.
arXiv Detail & Related papers (2021-10-01T15:05:33Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated ssian mixture model.
Experimental results demonstrate that CCVR state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Model Rectification via Unknown Unknowns Extraction from Deployment
Samples [8.0497115494227]
We propose a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way.
RTSCV extracts unknown unknowns (u.u.s)
We show that RTSCV consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-08T11:46:19Z) - Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the natural non-uniformly distributed data in realworld scenarios.
While moderns perform well on populated classes, its performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z) - An Unsupervised Learning Classifier with Competitive Error Performance [0.0]
The model is based on the incremental execution of small step shift and rotation operations upon selected discriminative hyperplanes.
When applied, in conjunction with a selected feature extractor, to a subset of the ImageNet dataset benchmark, it yields 6.2 % Top 3 probability of error.
This result may also be contrasted with popular unsupervised learning schemes such as k-Means which is shown to be practically useless on same dataset.
arXiv Detail & Related papers (2018-06-25T11:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.