Principal Component Analysis based frameworks for efficient missing data
imputation algorithms
- URL: http://arxiv.org/abs/2205.15150v3
- Date: Sun, 19 Mar 2023 18:20:39 GMT
- Title: Principal Component Analysis based frameworks for efficient missing data
imputation algorithms
- Authors: Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, Pål
Halvorsen, Hugo L. Hammer
- Abstract summary: We propose Principal Component Analysis Imputation (PCAI) to speed up the imputation process and alleviate memory issues of many available imputation techniques.
Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI for classification problems with some adjustments.
We validate our approach with experiments on various scenarios, which show that PCAI and PIC can work with various imputation algorithms.
- Score: 3.635056427544418
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Missing data is a commonly occurring problem in practice. Many imputation
methods have been developed to fill in the missing entries. However, not all of
them can scale to high-dimensional data, especially the multiple imputation
techniques. Meanwhile, data nowadays tend to be high-dimensional.
Therefore, in this work, we propose Principal Component Analysis Imputation
(PCAI), a simple but versatile framework based on Principal Component Analysis
(PCA) to speed up the imputation process and alleviate memory issues of many
available imputation techniques, without sacrificing the imputation quality in
terms of MSE. In addition, the frameworks can be used even when some or all of
the missing features are categorical, or when the number of missing features is
large. Next, we introduce PCA Imputation - Classification (PIC), an application
of PCAI for classification problems with some adjustments. We validate our
approach by experiments on various scenarios, which show that PCAI and PIC can
work with various imputation algorithms, including the state-of-the-art ones
and improve the imputation speed significantly, while achieving competitive
mean square error/classification accuracy compared to direct imputation (i.e.,
impute directly on the missing data).
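
The abstract does not spell out the steps of PCAI, so the following is only a minimal sketch of one plausible reading of the idea: split the columns into fully observed ones and ones with missing entries, compress the fully observed block with PCA, and run the imputer on the compressed block plus the incomplete columns instead of on the full data. The column split, the function names (`pca_reduce`, `regression_impute`, `pcai_sketch`), the least-squares stand-in imputer, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pca_reduce(F, n_components):
    """Project the fully observed features F onto their top principal components."""
    centered = F - F.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

def regression_impute(R, M):
    """Hypothetical stand-in imputer: fill each column of M by least squares on R.

    Any imputation algorithm could be substituted here; columns that are
    entirely missing are left untouched.
    """
    M_filled = M.copy()
    design = np.column_stack([np.ones(len(R)), R])  # intercept + reduced features
    for j in range(M.shape[1]):
        missing = np.isnan(M[:, j])
        if not missing.any() or missing.all():
            continue
        coef, *_ = np.linalg.lstsq(design[~missing], M[~missing, j], rcond=None)
        M_filled[missing, j] = design[missing] @ coef
    return M_filled

def pcai_sketch(X, n_components):
    """Assumed workflow: reduce the complete columns, then impute the rest."""
    has_nan = np.isnan(X).any(axis=0)
    F, M = X[:, ~has_nan], X[:, has_nan]
    R = pca_reduce(F, n_components)          # much narrower than F
    X_out = X.copy()
    X_out[:, has_nan] = regression_impute(R, M)
    return X_out

# Synthetic example: 20 fully observed features and 2 incomplete ones,
# all driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
F_true = latent @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(200, 20))
M_true = latent @ rng.normal(size=(3, 2))
X_true = np.column_stack([F_true, M_true])

X_missing = X_true.copy()
mask = rng.random((200, 2)) < 0.2            # about 20% missing in the last 2 columns
X_missing[:, 20:][mask] = np.nan

X_imputed = pcai_sketch(X_missing, n_components=3)
mse = np.mean((X_imputed[:, 20:][mask] - X_true[:, 20:][mask]) ** 2)
print("MSE on the imputed entries:", mse)
```

In this sketch the imputer sees 3 reduced columns instead of 20 original ones, which is where the speed and memory savings described in the abstract would come from. How many components to keep, and how much accuracy that costs on real data, depends on the dataset and the imputer.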
Related papers
- Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - In-Database Data Imputation [0.6157028677798809]
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making.
Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates, are computationally efficient but may introduce bias and disrupt variable relationships.
Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time.
This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method.
arXiv Detail & Related papers (2024-01-07T01:57:41Z) - Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z) - An online algorithm for contrastive Principal Component Analysis [9.090031210111919]
We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy efficient neuromorphic hardware.
We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
arXiv Detail & Related papers (2022-11-14T19:48:48Z) - Imputation of missing values in multi-view data [0.24739484546803336]
We introduce a new imputation method based on the existing stacked penalized logistic regression algorithm for multi-view learning.
We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application.
arXiv Detail & Related papers (2022-10-26T05:19:30Z) - Domain Adaptation Principal Component Analysis: base linear method for
learning with out-of-distribution data [55.41644538483948]
Domain adaptation is a popular paradigm in modern machine learning.
We present a method called Domain Adaptation Principal Component Analysis (DAPCA).
DAPCA finds a linear reduced data representation useful for solving the domain adaptation task.
arXiv Detail & Related papers (2022-08-28T21:10:56Z) - Learning to Detect Critical Nodes in Sparse Graphs via Feature Importance Awareness [53.351863569314794]
The critical node problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network.
This work proposes a feature importance-aware graph attention network for node representation.
It combines this representation with a dueling double deep Q-network to create an end-to-end algorithm that solves CNP for the first time.
arXiv Detail & Related papers (2021-12-03T14:23:05Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - TELESTO: A Graph Neural Network Model for Anomaly Classification in
Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied to IT system operation and maintenance.
One direction aims at the recognition of re-occurring anomaly types to enable remediation automation.
We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z) - Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degree polynomials for the task of detecting the presence of hidden structures.
For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any polynomial of a given degree.
As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z) - Establishing strong imputation performance of a denoising autoencoder in
a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation.
We benchmarked the results against state-of-the-art imputation methods.
The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.