Related papers: Principal Component Analysis based frameworks for efficient missing data imputation algorithms

Principal Component Analysis based frameworks for efficient missing data imputation algorithms

URL: http://arxiv.org/abs/2205.15150v3
Date: Sun, 19 Mar 2023 18:20:39 GMT
Title: Principal Component Analysis based frameworks for efficient missing data imputation algorithms
Authors: Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, P{\aa}l Halvorsen, Hugo L. Hammer
Abstract summary: We propose Principal Component Analysis Imputation (PCAI) to speed up the imputation process and alleviate memory issues of many available imputation techniques. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI for classification problems with some adjustments. We validate our approach by experiments on various scenarios, which shows that PCAI and PIC can work with various imputation algorithms.
Score: 3.635056427544418
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation techniques. Meanwhile, the data nowadays tends toward high-dimensional. Therefore, in this work, we propose Principal Component Analysis Imputation (PCAI), a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate memory issues of many available imputation techniques, without sacrificing the imputation quality in term of MSE. In addition, the frameworks can be used even when some or all of the missing features are categorical, or when the number of missing features is large. Next, we introduce PCA Imputation - Classification (PIC), an application of PCAI for classification problems with some adjustments. We validate our approach by experiments on various scenarios, which shows that PCAI and PIC can work with various imputation algorithms, including the state-of-the-art ones and improve the imputation speed significantly, while achieving competitive mean square error/classification accuracy compared to direct imputation (i.e., impute directly on the missing data).

Related papers

Fast Iterative and Task-Specific Imputation with Online Learning [2.7855886538423187]
We propose an imputation approach named F3I based on the iterative improvement of a K-nearest neighbor imputation. We provide a theoretical analysis of the imputation quality by F3I for several types of missing mechanisms. We also demonstrate the performance of F3I on both synthetic data sets and real-life drug repurposing and handwritten-digit recognition data.
arXiv Detail & Related papers (2025-01-23T16:04:18Z)
Precision Adaptive Imputation Network : An Unified Technique for Mixed Datasets [0.0]
This study introduces the Precision Adaptive Imputation Network (PAIN), a novel algorithm designed to enhance data reconstruction. PAIN employs a tri-step process that integrates statistical methods, random forests, and autoencoders, ensuring balanced accuracy and efficiency in imputation. The findings highlight PAIN's superior ability to preserve data distributions and maintain analytical integrity, particularly in complex scenarios where missingness is not completely at random.
arXiv Detail & Related papers (2025-01-18T06:22:27Z)
Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer. We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z)
In-Database Data Imputation [0.6157028677798809]
Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making. Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates, are computationally efficient but may introduce bias and disrupt variable relationships. Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time. This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method.
arXiv Detail & Related papers (2024-01-07T01:57:41Z)
Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA) PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z)
An online algorithm for contrastive Principal Component Analysis [9.090031210111919]
We derive an online algorithm for cPCA* and show that it maps onto a neural network with local learning rules, so it can potentially be implemented in energy efficient neuromorphic hardware. We evaluate the performance of our online algorithm on real datasets and highlight the differences and similarities with the original formulation.
arXiv Detail & Related papers (2022-11-14T19:48:48Z)
Imputation of missing values in multi-view data [0.24739484546803336]
We introduce a new imputation method based on the existing stacked penalized logistic regression algorithm for multi-view learning. We compare the performance of the new imputation method with several existing imputation algorithms in simulated data sets and a real data application.
arXiv Detail & Related papers (2022-10-26T05:19:30Z)
Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data [55.41644538483948]
Domain adaptation is a popular paradigm in modern machine learning. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) DAPCA finds a linear reduced data representation useful for solving the domain adaptation task.
arXiv Detail & Related papers (2022-08-28T21:10:56Z)
Learning to Detect Critical Nodes in Sparse Graphs via Feature Importance Awareness [53.351863569314794]
The critical node problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network. This work proposes a feature importance-aware graph attention network for node representation. It combines it with dueling double deep Q-network to create an end-to-end algorithm to solve CNP for the first time.
arXiv Detail & Related papers (2021-12-03T14:23:05Z)
MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data. MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism. We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
TELESTO: A Graph Neural Network Model for Anomaly Classification in Cloud Services [77.454688257702]
Machine learning (ML) and artificial intelligence (AI) are applied on IT system operation and maintenance. One direction aims at the recognition of re-occurring anomaly types to enable remediation automation. We propose a method that is invariant to dimensionality changes of given data.
arXiv Detail & Related papers (2021-02-25T14:24:49Z)
Computational Barriers to Estimation from Low-Degree Polynomials [81.67886161671379]
We study the power of low-degrees for the task of detecting the presence of hidden structures. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems.
arXiv Detail & Related papers (2020-08-05T17:52:10Z)
Establishing strong imputation performance of a denoising autoencoder in a wide range of missing data problems [0.0]
We develop a consistent framework for both training and imputation. We benchmarked the results against state-of-the-art imputation methods. The developed autoencoder obtained the smallest error for all ranges of initial data corruption.
arXiv Detail & Related papers (2020-04-06T12:00:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.