The Integrity of Machine Learning Algorithms against Software Defect
Prediction
- URL: http://arxiv.org/abs/2009.02571v1
- Date: Sat, 5 Sep 2020 17:26:56 GMT
- Title: The Integrity of Machine Learning Algorithms against Software Defect
Prediction
- Authors: Param Khakhar and Rahul Kumar Dubey, Senior Member, IEEE
- Abstract summary: This report analyses the performance of the Online Sequential Extreme Learning Machine (OS-ELM) proposed by Liang et al.
OS-ELM trains faster than conventional deep neural networks and it always converges to the globally optimal solution.
The analysis is carried out on three NASA projects: KC1, PC4, and PC3.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increased computerization of recent years has resulted in the
production of a wide variety of software, and measures need to be taken to
ensure that the produced software is not defective. Many researchers have
worked in this area and have developed Machine Learning-based approaches that
predict whether a piece of software is defective. The issue cannot be resolved
simply by using conventional classifiers, because the dataset is highly
imbalanced: the number of defective samples is far smaller than the number of
non-defective samples. More sophisticated methods are therefore required; the
methods developed by researchers can be broadly classified into
resampling-based methods, cost-sensitive learning-based methods, and ensemble
learning. This report analyses the performance of the Online Sequential
Extreme Learning Machine (OS-ELM), proposed by Liang et al., against several
classifiers such as Logistic Regression, Support Vector Machine, Random
Forest, and Naïve Bayes after oversampling the data. OS-ELM trains faster than
conventional deep neural networks and always converges to the globally optimal
solution. A comparison is performed on the original dataset as well as the
oversampled dataset. The oversampling technique used is Cluster-based
Over-Sampling with Noise Filtering, which outperforms several state-of-the-art
oversampling techniques. The analysis is carried out on three projects from
the NASA group: KC1, PC4, and PC3. The metrics used for measurement are recall
and balanced accuracy. OS-ELM scores higher than the other classifiers in both
scenarios.
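As a rough illustration of the mechanics described above, here is a minimal numpy sketch of OS-ELM (random fixed hidden layer, output weights updated by recursive least squares), evaluated with the abstract's two metrics on toy imbalanced data. The hidden size, activation, and data are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

class OSELM:
    """Online Sequential Extreme Learning Machine (after Liang et al.):
    a single-hidden-layer network whose input weights are random and fixed;
    only the output weights are trained, via recursive least squares."""

    def __init__(self, n_inputs, n_hidden=40):
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random input weights (never trained)
        self.b = rng.normal(size=n_hidden)              # random hidden biases
        self.beta = None                                # output weights (trained)
        self.P = None                                   # inverse covariance for RLS updates

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def init_fit(self, X0, y0):
        """Initial batch: ordinary (ridge-stabilized) least squares."""
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + 1e-6 * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ y0

    def partial_fit(self, X, y):
        """Sequential chunk: recursive least-squares update, no retraining
        from scratch -- this is why OS-ELM trains fast."""
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P -= self.P @ H.T @ K @ H @ self.P
        self.beta += self.P @ H.T @ (y - H @ self.beta)

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0.5).astype(int)

# Toy imbalanced "defect" data: ~10% positive class.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 1.3).astype(float)

model = OSELM(n_inputs=5)
model.init_fit(X[:100], y[:100])
for i in range(100, 200, 20):           # feed the rest chunk-by-chunk
    model.partial_fit(X[i:i + 20], y[i:i + 20])

pred = model.predict(X)
recall = ((pred == 1) & (y == 1)).sum() / max((y == 1).sum(), 1)
tnr = ((pred == 0) & (y == 0)).sum() / max((y == 0).sum(), 1)
balanced_accuracy = 0.5 * (recall + tnr)  # mean of per-class recalls
```

Recall and balanced accuracy are computed by hand here to make their definitions explicit; on a real run one would use a library metric implementation.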
Related papers
- Generalized Oversampling for Learning from Imbalanced datasets and
Associated Theory [0.0]
In supervised learning, real-world imbalanced datasets are frequently encountered.
We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates.
We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
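GOLIATH itself is more involved, but its core idea, drawing synthetic minority samples from a kernel density estimate, can be sketched simply: sampling from a Gaussian KDE amounts to picking a real point and adding kernel-scaled noise. The bandwidth and toy data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def kde_oversample(X_min, n_new, bandwidth=0.3):
    """Draw synthetic minority samples from a Gaussian KDE fitted on the
    minority class: choose a real minority point, perturb with kernel noise."""
    idx = rng.integers(0, len(X_min), size=n_new)
    return X_min[idx] + bandwidth * rng.normal(size=(n_new, X_min.shape[1]))

X_min = rng.normal(loc=2.0, size=(20, 3))   # 20 minority samples
X_new = kde_oversample(X_min, n_new=80)     # synthetic points to rebalance
```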
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly improves and outperforms the state-of-the-art methods on retrieval performances by 3%-6%.
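The IAA framework is not specified in this summary; one plausible reading of "estimate intra-class variations and generate adaptive synthetic samples" can be sketched as per-class noise augmentation scaled by each class's own spread (the function and its parameters below are hypothetical, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(6)

def intra_class_augment(X, y, n_per_class=10, scale=0.5):
    """Per-class augmentation with noise scaled to that class's estimated
    spread: tight classes get small perturbations, diffuse classes larger ones."""
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        std = Xc.std(axis=0) + 1e-8                    # intra-class variation
        idx = rng.integers(len(Xc), size=n_per_class)  # anchors to perturb
        Xs.append(Xc[idx] + scale * std * rng.normal(size=(n_per_class, X.shape[1])))
        ys.append(np.full(n_per_class, c))
    return np.vstack(Xs), np.concatenate(ys)

X = rng.normal(size=(60, 4))
y = rng.integers(0, 3, size=60)
X_aug, y_aug = intra_class_augment(X, y)
```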
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
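AutoSMOTE's contribution is the automated search over sampling decisions; the underlying SMOTE step it builds on is standard and easy to sketch: a synthetic point is a random interpolation between a minority sample and one of its minority-class nearest neighbours. The toy data and `k` below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def smote(X_min, n_new, k=5):
    """Classic SMOTE: synthetic = x + u * (neighbour - x), with u ~ U(0, 1)
    and the neighbour drawn from x's k nearest minority-class points."""
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbours, skipping self
        j = rng.choice(nn)
        u = rng.random()
        out.append(X_min[i] + u * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = rng.normal(size=(15, 4))
X_syn = smote(X_min, n_new=30)
```

Because each synthetic point is a convex combination of two real minority points, it never leaves the coordinate-wise bounding box of the minority class.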
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
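StreaMRAK's multi-resolution streaming scheme is beyond a short sketch, but the memory problem it targets is easy to show: plain KRR needs the full n×n kernel matrix. A standard memory-light stand-in (not StreaMRAK itself) is a landmark/Nyström ridge solve on m ≪ n points; the kernel, bandwidth, and data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Noisy 1-D regression problem: y = sin(x) + noise, n = 500 points.
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# m = 50 landmarks instead of the full 500x500 kernel matrix.
landmarks = X[rng.choice(500, size=50, replace=False)]
K = rbf(landmarks, landmarks)
Knm = rbf(X, landmarks)

# Nystrom-style ridge solve: (Knm^T Knm + lam * K) alpha = Knm^T y
lam = 1e-2
alpha = np.linalg.solve(Knm.T @ Knm + lam * K, Knm.T @ y)
pred = Knm @ alpha
mse = ((pred - y) ** 2).mean()
```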
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea, sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- DPER: Efficient Parameter Estimation for Randomly Missing Data [0.24466725954625884]
We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set.
Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
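The single-pass flavour of this approach can be sketched with available-case estimates of the mean and covariance under random missingness. This is only a simple proxy in the spirit of DPER's closed-form, non-iterative estimation, not the paper's exact estimators; the toy data is an assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# Complete Gaussian data, then knock out ~20% of entries at random (MCAR).
X = rng.multivariate_normal([0.0, 1.0, -1.0], np.eye(3), size=400)
mask = rng.random(X.shape) < 0.2
X_miss = X.copy()
X_miss[mask] = np.nan

# Closed-form, single-pass estimates: no EM-style iteration over the data.
mu_hat = np.nanmean(X_miss, axis=0)           # per-feature available-case mean
cov_hat = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        # Use only rows where BOTH features i and j are observed.
        ok = ~np.isnan(X_miss[:, i]) & ~np.isnan(X_miss[:, j])
        cov_hat[i, j] = np.mean(
            (X_miss[ok, i] - mu_hat[i]) * (X_miss[ok, j] - mu_hat[j]))
```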
arXiv Detail & Related papers (2021-06-06T16:37:48Z)
- A Novel Resampling Technique for Imbalanced Dataset Optimization [1.0323063834827415]
Classification of rare events is a common problem in many domains, such as fraudulent transactions, malware traffic analysis, and network intrusion detection.
We develop two versions of Generative Silhouette Resampling 1-Nearest Neighbour (G1Nos) oversampling algorithms for dealing with class imbalance problem.
arXiv Detail & Related papers (2020-12-30T17:17:08Z)
- A Method for Handling Multi-class Imbalanced Data by Geometry based Information Sampling and Class Prioritized Synthetic Data Generation (GICaPS) [15.433936272310952]
This paper looks into the problem of handling imbalanced data in a multi-label classification problem.
Two novel methods are proposed that exploit the geometric relationship between the feature vectors.
The efficacy of the proposed methods is analyzed by solving a generic multi-class recognition problem.
arXiv Detail & Related papers (2020-10-11T04:04:26Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
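The experience-replay idea can be sketched for streaming least squares: instead of updating on the freshly arrived (Markov-correlated) sample, update on a sample replayed uniformly from a buffer of past data, which decorrelates consecutive gradient steps. The chain parameters, learning rate, and toy data are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(5)

w_star = np.array([1.0, -2.0])   # true regression weights
x = np.zeros(2)                  # AR(1) state: slowly-mixing covariate stream
buffer = []                      # experience replay buffer of seen samples
w = np.zeros(2)                  # estimate to be learned
lr = 0.01

for t in range(5000):
    x = 0.9 * x + rng.normal(size=2)          # Markovian (correlated) covariates
    y = x @ w_star + 0.1 * rng.normal()
    buffer.append((x.copy(), y))
    # SGD step on a uniformly replayed PAST sample, not the fresh one:
    # replaying breaks the temporal correlation between consecutive updates.
    xr, yr = buffer[rng.integers(len(buffer))]
    w -= lr * (xr @ w - yr) * xr

err = np.linalg.norm(w - w_star)
```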
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.