Robust self-healing prediction model for high dimensional data
- URL: http://arxiv.org/abs/2210.01788v1
- Date: Tue, 4 Oct 2022 17:55:50 GMT
- Title: Robust self-healing prediction model for high dimensional data
- Authors: Anirudha Rayasam, Nagamma Patil
- Abstract summary: This work proposes a robust self-healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with some existing high-performing models and the results are analyzed.
- Score: 0.685316573653194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to the advantages of increased accuracy and the potential to
detect unseen patterns provided by data mining techniques, they have been
widely incorporated for standard classification problems. They have often been
used for high-precision disease prediction in the medical field, and several
hybrid prediction models capable of achieving high accuracies have been
proposed. Though this stands true, most previous models fail to efficiently
address the recurring issue of bad data quality that plagues most
high-dimensional data and proves especially troublesome for highly sensitive
medical data.
This work proposes a robust self-healing (RSH) hybrid prediction model which
uses the data in its entirety, removing errors and inconsistencies rather than
discarding any data. Initial processing involves data preparation followed by
cleansing or scrubbing through context-dependent attribute correction, which
ensures that there is no significant loss of relevant information before the
feature selection and prediction phases. An ensemble of heterogeneous
classifiers, subjected to local boosting, is used to build the prediction
model, and a genetic-algorithm-based wrapper feature selection technique,
wrapped around the respective classifiers, is employed to select the
corresponding optimal set of features, which warrants higher accuracy. The
proposed method is compared with some existing high-performing models and the
results are analyzed.
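The genetic-algorithm wrapper feature selection described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's code: chromosomes are bitmasks over feature indices, and fitness is the accuracy of a stand-in nearest-centroid classifier on the selected features (the paper's actual ensemble of locally boosted heterogeneous classifiers is not reproduced here).

```python
import random

def centroid_accuracy(X, y, mask):
    """Fitness: accuracy of a nearest-centroid classifier restricted to the
    features enabled by `mask` (a stand-in for the real ensemble members)."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    labels = sorted(set(y))
    cents = {c: [sum(x[i] for x, t in zip(X, y) if t == c) / y.count(c)
                 for i in cols] for c in labels}
    hits = 0
    for x, t in zip(X, y):
        pred = min(labels, key=lambda c: sum(
            (x[i] - m) ** 2 for i, m in zip(cols, cents[c])))
        hits += pred == t
    return hits / len(y)

def ga_select(X, y, pop=20, gens=15, seed=0):
    """Evolve feature bitmasks: keep the fitter half, refill by one-point
    crossover plus a single-bit mutation, and return the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    popn = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda m: centroid_accuracy(X, y, m), reverse=True)
        elite = popn[: pop // 2]
        children = []
        while len(elite) + len(children) < pop:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child[rng.randrange(n)] ^= 1  # point mutation
            children.append(child)
        popn = elite + children
    return max(popn, key=lambda m: centroid_accuracy(X, y, m))
```

On a toy dataset where only one feature separates the classes, the search reliably converges on masks that include that feature, since any such mask scores maximal fitness and is retained in the elite.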
Related papers
- A Federated Learning-based Industrial Health Prognostics for
Heterogeneous Edge Devices using Matched Feature Extraction [16.337207503536384]
We propose a pioneering FL-based health prognostic model with a feature similarity-matched parameter aggregation algorithm.
We show that the proposed method yields accuracy improvements as high as 44.5% and 39.3% for state-of-health estimation and remaining useful life estimation, respectively.
arXiv Detail & Related papers (2023-05-13T07:20:31Z) - Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models [0.0]
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models.
We present a sequential selection method that identifies critically important information within a dataset.
We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails.
arXiv Detail & Related papers (2022-08-27T19:43:53Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
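Round-robin iterative imputation, the core loop behind frameworks like the one summarized above, can be sketched briefly. This is an illustrative simplification: the real system auto-selects a model per column, whereas here each column is refit with a univariate least-squares regression on a fixed neighboring column, which is purely an assumption of this sketch.

```python
def ols_fit(xs, ys):
    """Closed-form univariate least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return b, my - b * mx

def iterative_impute(rows, n_iter=5):
    """rows: list of lists with None marking missing cells.
    Returns a filled copy; originals are untouched."""
    data = [row[:] for row in rows]
    ncols = len(data[0])
    # warm start: fill each missing cell with its column mean
    for j in range(ncols):
        obs = [row[j] for row in rows if row[j] is not None]
        m = sum(obs) / len(obs)
        for i in range(len(data)):
            if rows[i][j] is None:
                data[i][j] = m
    # round-robin refinement: refit a per-column model, re-predict missing
    for _ in range(n_iter):
        for j in range(ncols):
            p = (j + 1) % ncols  # assumed fixed predictor column
            train = [(row[p], row[j]) for i, row in enumerate(data)
                     if rows[i][j] is not None]
            b, a = ols_fit([t[0] for t in train], [t[1] for t in train])
            for i, row in enumerate(data):
                if rows[i][j] is None:
                    row[j] = a + b * row[p]
    return data
```

When the missing column is an exact linear function of its predictor, the refinement step recovers the true value regardless of the warm-start mean.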
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
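The ATC rule is simple enough to sketch. In this hypothetical illustration (an assumption of this sketch, not code from the paper), the threshold is chosen so that the fraction of labeled source examples falling below it matches the source error rate; target accuracy is then predicted as the fraction of unlabeled target confidences that clear it.

```python
def atc_threshold(source_conf, source_correct):
    """Pick t so that the fraction of source confidences below t roughly
    matches the source error rate (1 - source accuracy)."""
    err = 1.0 - sum(source_correct) / len(source_correct)
    for t in sorted(source_conf):
        below = sum(c < t for c in source_conf) / len(source_conf)
        if below >= err:
            return t
    return max(source_conf)

def atc_estimate(threshold, target_conf):
    """Predicted target accuracy: fraction of confidences >= threshold."""
    return sum(c >= threshold for c in target_conf) / len(target_conf)
```

With source confidences `[0.9, 0.8, 0.7, 0.4, 0.3]` and correctness flags `[1, 1, 1, 0, 0]`, the source error rate is 0.4, so the calibrated threshold is 0.7; a target batch `[0.95, 0.75, 0.5, 0.2]` then gets a predicted accuracy of 0.5.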
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Efficient remedies for outlier detection with variational autoencoders [8.80692072928023]
Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data.
We show that a theoretically-grounded correction readily ameliorates a key bias with VAE likelihood estimates.
We also show that the variance of the likelihoods computed over an ensemble of VAEs also enables robust outlier detection.
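The ensemble-variance criterion above boils down to a simple decision rule. In this sketch the per-model log-likelihoods are taken as given (the VAEs themselves are out of scope), and a sample is flagged when its across-model variance is unusually large relative to the batch; the z-score cutoff is an assumption of this illustration.

```python
import statistics

def ensemble_variance_outliers(loglik_per_model, z=2.0):
    """loglik_per_model: one row per sample, one log-likelihood per ensemble
    member. Returns indices of samples whose across-model variance exceeds
    mean + z * std of the per-sample variances."""
    variances = [statistics.pvariance(row) for row in loglik_per_model]
    mu = statistics.mean(variances)
    sd = statistics.pstdev(variances)
    return [i for i, v in enumerate(variances) if v > mu + z * sd]
```

The intuition from the abstract is that ensemble members agree on in-distribution samples and disagree on outliers, so the disagreement (variance) itself is the outlier score.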
arXiv Detail & Related papers (2021-08-19T16:00:58Z) - Evaluating State-of-the-Art Classification Models Against Bayes
Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - A Hamiltonian Monte Carlo Model for Imputation and Augmentation of
Healthcare Data [0.6719751155411076]
Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available.
Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values.
A Bayesian approach to imputing missing values and creating augmented samples in high-dimensional healthcare data is proposed in this work.
arXiv Detail & Related papers (2021-03-03T11:57:42Z) - Curse of Small Sample Size in Forecasting of the Active Cases in
COVID-19 Outbreak [0.0]
During the COVID-19 pandemic, a massive number of attempts on the predictions of the number of cases and the other future trends of this pandemic have been made.
However, they fail to predict, in a reliable way, the medium and long term evolution of fundamental features of COVID-19 outbreak within acceptable accuracy.
This paper gives an explanation for the failure of machine learning models in this particular forecasting problem.
arXiv Detail & Related papers (2020-11-06T23:13:34Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
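The core idea of prediction-time batch normalization is small enough to sketch: instead of normalizing test activations with the running statistics accumulated during training, recompute mean and variance on the test batch itself, so the normalized activations track the shifted input distribution. This one-dimensional version is a hypothetical simplification, not the paper's implementation.

```python
import math

def batch_norm(batch, mean, var, eps=1e-5):
    """Standard normalization with externally supplied statistics."""
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

def prediction_time_bn(batch):
    """Normalize with the statistics of the current (possibly shifted)
    batch rather than stale training-time running statistics."""
    m = sum(batch) / len(batch)
    v = sum((x - m) ** 2 for x in batch) / len(batch)
    return batch_norm(batch, m, v)
```

For a covariate-shifted batch such as `[5.0, 6.0, 7.0]`, normalizing with training statistics of mean 0 and variance 1 leaves the activations far off-center, while the prediction-time variant recenters them around zero.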
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.