A Novel Multiple Ensemble Learning Models Based on Different Datasets
for Software Defect Prediction
- URL: http://arxiv.org/abs/2008.13114v1
- Date: Sun, 30 Aug 2020 08:01:39 GMT
- Title: A Novel Multiple Ensemble Learning Models Based on Different Datasets
for Software Defect Prediction
- Authors: Ali Nawaz, Attique Ur Rehman, Muhammad Abbas
- Abstract summary: This paper proposes ensemble learning models and performs a comparative analysis among KNN, Decision Tree, SVM, and Naïve Bayes on different datasets.
The classification accuracy of the ensemble model trained on CM1 is 98.56%, on KM2 it is 98.18%, and on PC1 it is 99.27%.
- Score: 3.6095388702618414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software testing is one of the important ways to ensure software
quality. Testing is found to account for more than 50% of the overall project
cost, so effective and efficient testing should consume as few project
resources as possible. It is therefore important to construct a procedure that
not only performs testing efficiently but also minimizes the use of project
resources. The goal of software testing is to find the maximum number of
defects in the software system: the more defects found, the more effective the
testing. Different techniques have been proposed to detect defects in software
while making good use of resources and achieving good results. Since the world
is continuously moving toward data-driven approaches for important decisions,
in this paper we perform a machine learning analysis on publicly available
datasets and try to achieve the maximum accuracy. The main focus of the paper
is to apply different machine learning techniques to these datasets and
determine which technique produces the best results. In particular, we propose
ensemble learning models and perform a comparative analysis among KNN, Decision
Tree, SVM, and Naïve Bayes on different datasets, demonstrating that the
ensemble method outperforms the other methods in terms of accuracy, precision,
recall, and F1-score. The classification accuracy of the ensemble model trained
on CM1 is 98.56%, on KM2 it is 98.18%, and on PC1 it is 99.27%. This shows that
the ensemble is a more effective method for defect prediction than the other
techniques.
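As a rough illustration of the kind of comparison described above, the sketch below trains the four base classifiers and a soft-voting ensemble over them with scikit-learn and reports accuracy, precision, recall, and F1-score. It is not the authors' exact pipeline: the function name `compare_models` and the feature matrix `X` / defect labels `y` (e.g. loaded from CM1, KM2, or PC1) are assumptions for illustration.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_models(X, y):
    # X: software metric features, y: binary defect labels (assumed to be
    # loaded by the caller from a dataset such as CM1, KM2, or PC1).
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    base = [
        ("knn", KNeighborsClassifier()),
        ("dtree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("nb", GaussianNB()),
    ]
    # Soft voting averages the base learners' predicted class probabilities.
    models = base + [("ensemble", VotingClassifier(base, voting="soft"))]
    for name, model in models:
        pred = model.fit(X_tr, y_tr).predict(X_te)
        acc = accuracy_score(y_te, pred)
        p, r, f1, _ = precision_recall_fscore_support(
            y_te, pred, average="binary", zero_division=0)
        print(f"{name:9s} acc={acc:.4f} p={p:.4f} r={r:.4f} f1={f1:.4f}")
```

A soft-voting ensemble is one common way to combine heterogeneous classifiers; the paper's exact combination rule may differ.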
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
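As a very rough sketch only, the snippet below shows one generic way to combine two objectives' gradients after normalizing them, which is the general flavor of a "normalized gradient difference"; the precise NGDiff update and its adaptive learning rate are defined in the paper and will differ in detail. The function name and the ascent/descent convention here are assumptions.

```python
import numpy as np

def normalized_gradient_difference(grad_retain, grad_forget, eps=1e-12):
    # Illustrative only: normalize each objective's gradient so neither
    # dominates the update, then take their difference as the direction
    # (descend on the retain loss while ascending on the forget loss).
    gr = grad_retain / (np.linalg.norm(grad_retain) + eps)
    gf = grad_forget / (np.linalg.norm(grad_forget) + eps)
    return gr - gf
```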
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - An Empirical Study of the Impact of Test Strategies on Online Optimization for Ensemble-Learning Defect Prediction [2.547631669143471]
We employ bandit algorithms (BA), an online optimization method, to select the highest-accuracy ensemble method.
We used six popular defect prediction datasets, four ensemble learning methods such as bagging, and three test strategies such as testing positive-prediction modules first (PF).
Our results show that when BA is applied with PF, prediction accuracy improved on average and the number of defects found increased by 7% on at least five of the six datasets.
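For readers unfamiliar with bandit-based selection, the sketch below shows a generic epsilon-greedy bandit choosing among already-trained ensemble models as modules are tested, rewarding an arm when its prediction turns out to be correct. The specific bandit algorithm, reward definition, and names used here are illustrative assumptions, not necessarily the BA variant evaluated in the paper.

```python
import random

def epsilon_greedy_select(models, module_stream, epsilon=0.1):
    """Each arm is a trained ensemble model; the reward is 1 when its
    prediction for the tested module is correct. `module_stream` yields
    (features, true_label) pairs in the order modules are tested."""
    counts = {name: 0 for name in models}
    values = {name: 0.0 for name in models}
    for features, truth in module_stream:
        if random.random() < epsilon:
            name = random.choice(list(models))   # explore a random arm
        else:
            name = max(values, key=values.get)   # exploit the best arm so far
        reward = 1.0 if models[name].predict([features])[0] == truth else 0.0
        counts[name] += 1
        values[name] += (reward - values[name]) / counts[name]  # running mean
    return max(values, key=values.get)
```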
arXiv Detail & Related papers (2024-09-10T07:06:50Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
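A minimal sketch of the LSA-plus-classifier idea, assuming scikit-learn: TF-IDF followed by truncated SVD is the standard way to compute latent semantic analysis features, and a linear SVM stands in for the classifiers listed above. The helper name and the use of PROMISE_exp texts and labels as inputs are assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_lsa_classifier(n_components=100):
    # TF-IDF followed by truncated SVD is the usual implementation of LSA;
    # the resulting dense topic features feed a linear SVM.
    return make_pipeline(
        TfidfVectorizer(stop_words="english"),
        TruncatedSVD(n_components=n_components, random_state=0),
        LinearSVC(),
    )

# Usage (requirement_texts and cwe_labels are assumed to come from PROMISE_exp):
# clf = build_lsa_classifier().fit(requirement_texts, cwe_labels)
```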
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - Efficient human-in-loop deep learning model training with iterative
refinement and statistical result validation [0.0]
We demonstrate a method for creating segmentations, a necessary part of data cleaning for ultrasound imaging machine learning pipelines.
We propose a four-step method to leverage automatically generated training data and fast human visual checks to improve model accuracy while keeping the time/effort and cost low.
The method is demonstrated on a cardiac ultrasound segmentation task, removing background data, including static PHI.
arXiv Detail & Related papers (2023-04-03T13:56:01Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
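A small numpy sketch of the ATC idea as summarized here: choose the confidence threshold on labeled source data so that the fraction of source examples above it matches the source accuracy, then report the fraction of unlabeled target examples above that threshold as the predicted target accuracy. Function and argument names are illustrative.

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """source_conf: model confidences on labeled source data,
    source_correct: 0/1 correctness of those predictions,
    target_conf: model confidences on unlabeled target data."""
    source_acc = np.mean(source_correct)
    # Pick t so that roughly source_acc of the source confidences lie above it.
    t = np.quantile(source_conf, 1.0 - source_acc)
    # Predicted target accuracy = fraction of target examples above t.
    return float(np.mean(target_conf > t))
```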
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Detecting Errors and Estimating Accuracy on Unlabeled Data with
Self-training Ensembles [38.23896575179384]
We propose a principled and practically effective framework that simultaneously addresses the two tasks.
On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
arXiv Detail & Related papers (2021-06-29T21:32:51Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up
Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
Besides, SIMPLE formulates human detection and pose estimation as a unified point learning framework so they complement each other in a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z) - The Integrity of Machine Learning Algorithms against Software Defect
Prediction [0.0]
This report analyses the performance of the Online Sequential Extreme Learning Machine (OS-ELM) proposed by Liang et al.
OS-ELM trains faster than conventional deep neural networks and it always converges to the globally optimal solution.
The analysis is carried out on three projects from the NASA group: KC1, PC4, and PC3.
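For context, the sketch below shows the basic batch extreme learning machine that OS-ELM builds on: hidden-layer weights are random and fixed, and only the output weights are solved in closed form by least squares. The online, chunk-by-chunk recursive update that defines OS-ELM is omitted, and the class and parameter names are illustrative.

```python
import numpy as np

class BasicELM:
    """Minimal batch ELM for illustration; OS-ELM updates the output weights
    recursively as new chunks of data arrive instead of refitting from scratch."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden-layer weights and biases are random and never trained.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)        # hidden-layer activations
        self.beta = np.linalg.pinv(H) @ y       # least-squares output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```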
arXiv Detail & Related papers (2020-09-05T17:26:56Z)