Related papers: On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios

URL: http://arxiv.org/abs/2202.12074v2
Date: Sat, 23 Mar 2024 10:24:47 GMT
Title: On The Effectiveness of One-Class Support Vector Machine in Different Defect Prediction Scenarios
Authors: Rebecca Moussa, Danielle Azar, Federica Sarro
Abstract summary: Defect prediction aims at identifying software components that are likely to cause faults before the software is made available to the end user. Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction. We investigate whether learning from one class only is sufficient to produce an effective defect prediction model in two other scenarios.
Score: 7.592094566354553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Defect prediction aims at identifying software components that are likely to cause faults before the software is made available to the end user. To date, this task has been modeled as a two-class classification problem; however, its nature also allows it to be formulated as a one-class classification task. Previous studies show that One-Class Support Vector Machine (OCSVM) can outperform two-class classifiers for within-project defect prediction; however, it is not effective when employed at a finer granularity (i.e., commit-level defect prediction). In this paper, we further investigate whether learning from one class only is sufficient to produce an effective defect prediction model in two other scenarios (i.e., granularities), namely cross-version and cross-project defect prediction, and we replicate the previous work at within-project granularity for completeness. Our empirical results confirm that OCSVM performance remains low at different granularity levels; that is, it is outperformed by the two-class Random Forest (RF) classifier for both cross-version and cross-project defect prediction. While we cannot conclude that OCSVM is the best classifier, our results still show interesting findings. Although OCSVM does not outperform RF, it still achieves performance superior to its two-class counterpart (i.e., SVM) as well as the other two-class classifiers studied herein. We also observe that OCSVM is more suitable for cross-version and cross-project defect prediction than for within-project defect prediction, suggesting that it performs better with heterogeneous data. We encourage further research on one-class classifiers for defect prediction, as these techniques may serve as an alternative when data about defective modules is scarce or not available.
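Illustrative sketch (not from the paper): the three scenarios named in the abstract differ mainly in where the training and test data come from. The Python below shows one common reading of those splits on toy data. The `Module` record, the project names, and all function names are hypothetical. The paper's actual datasets and validation procedure are not described in the abstract, and the within-project split here is a plain hold-out used only for illustration. The `one_class_view` helper assumes the single training class is the non-defective one, which the abstract's closing remark suggests but does not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    """Hypothetical record for one software module in a labelled dataset."""
    project: str
    version: int
    name: str
    defective: bool


Split = Tuple[List[Module], List[Module]]


def within_project_split(modules: List[Module], project: str,
                         version: int, n_train: int) -> Split:
    """Train and test on the same project and version (simple hold-out)."""
    pool = [m for m in modules
            if m.project == project and m.version == version]
    return pool[:n_train], pool[n_train:]


def cross_version_split(modules: List[Module], project: str,
                        train_version: int, test_version: int) -> Split:
    """Train on one version of a project, test on another version of it."""
    train = [m for m in modules
             if m.project == project and m.version == train_version]
    test = [m for m in modules
            if m.project == project and m.version == test_version]
    return train, test


def cross_project_split(modules: List[Module], source: str,
                        target: str) -> Split:
    """Train on all data from one project, test on a different project."""
    train = [m for m in modules if m.project == source]
    test = [m for m in modules if m.project == target]
    return train, test


def one_class_view(train: List[Module]) -> List[Module]:
    """Keep only non-defective modules (assumed single training class)."""
    return [m for m in train if not m.defective]


# Toy data, invented for this sketch.
MODULES = [
    Module("alpha", 1, "a.py", False),
    Module("alpha", 1, "b.py", True),
    Module("alpha", 1, "c.py", False),
    Module("alpha", 2, "a.py", False),
    Module("alpha", 2, "d.py", True),
    Module("beta", 1, "x.py", False),
    Module("beta", 1, "y.py", True),
]

wp_train, wp_test = within_project_split(MODULES, "alpha", 1, n_train=2)
cv_train, cv_test = cross_version_split(MODULES, "alpha", 1, 2)
cp_train, cp_test = cross_project_split(MODULES, "alpha", "beta")

for label, (train, test) in {
    "within-project": (wp_train, wp_test),
    "cross-version": (cv_train, cv_test),
    "cross-project": (cp_train, cp_test),
}.items():
    print(label, "train:", len(train), "test:", len(test),
          "one-class train:", len(one_class_view(train)))
```

A two-class learner such as RF would be fitted on the full training list with its labels, whereas a one-class learner would see only the output of `one_class_view`. The test list keeps both classes in either case.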

Related papers

Bug Destiny Prediction in Large Open-Source Software Repositories through Sentiment Analysis and BERT Topic Modeling [3.481985817302898]
We leverage features available before a bug is resolved to enhance predictive accuracy. Our methodology incorporates sentiment analysis to derive both an emotionality score and a sentiment classification. Results demonstrate that sentiment analysis serves as a valuable predictor of a bug's eventual outcome.
arXiv Detail & Related papers (2025-04-22T15:18:14Z)
Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD). By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder. MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z)
Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z)
RF+clust for Leave-One-Problem-Out Performance Prediction [0.9281671380673306]
We study leave-one-problem-out (LOPO) performance prediction. We analyze whether standard random forest (RF) model predictions can be improved by calibrating them with a weighted average of performance values.
arXiv Detail & Related papers (2023-01-23T16:14:59Z)
Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
The Impact of Using Regression Models to Build Defect Classifiers [13.840006058766766]
It is common practice to discretize continuous defect counts into defective and non-defective classes. We compare the performance and interpretation of defect classifiers built using both approaches.
arXiv Detail & Related papers (2022-02-12T22:12:55Z)
Is the Performance of My Deep Network Too Good to Be True? A Direct Approach to Estimating the Bayes Error in Binary Classification [86.32752788233913]
In classification problems, the Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance. We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show uncertainty of the classes. Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data.
arXiv Detail & Related papers (2022-02-01T13:22:26Z)
Score-Based Generative Classifiers [9.063815952852783]
Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST. Previous results have suggested a trade-off between the likelihood of the data and classification accuracy. We show that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models.
arXiv Detail & Related papers (2021-10-01T15:05:33Z)
No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in real-world federated systems is learning with non-IID data. We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model. Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
Model Rectification via Unknown Unknowns Extraction from Deployment Samples [8.0497115494227]
We propose a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way. RTSCV extracts unknown unknowns (u.u.s). We show that RTSCV consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-08T11:46:19Z)
Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the naturally non-uniform distribution of data in real-world scenarios. While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes. Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
An Unsupervised Learning Classifier with Competitive Error Performance [0.0]
The model is based on the incremental execution of small step shift and rotation operations upon selected discriminative hyperplanes. When applied, in conjunction with a selected feature extractor, to a subset of the ImageNet dataset benchmark, it yields a 6.2% Top-3 probability of error. This result may also be contrasted with popular unsupervised learning schemes such as k-Means, which is shown to be practically useless on the same dataset.
arXiv Detail & Related papers (2018-06-25T11:12:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.