Gaussian Processes for Missing Value Imputation
- URL: http://arxiv.org/abs/2204.04648v1
- Date: Sun, 10 Apr 2022 10:46:26 GMT
- Title: Gaussian Processes for Missing Value Imputation
- Authors: Bahram Jafrasteh, Daniel Hernández-Lobato, Simón Pedro
Lubián-López, Isabel Benavente-Fernández
- Abstract summary: We present a hierarchical composition of sparse GPs that is used to predict missing values at each dimension using all the variables from the other dimensions.
The approach, called missing GP (MGP), can be trained to impute all observed missing values simultaneously.
We evaluate MGP on one private clinical dataset and four UCI datasets with different percentages of missing values.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing values are common in many real-life datasets. However, most
current machine learning methods cannot handle missing values, so these must be
imputed beforehand. Gaussian Processes (GPs) are non-parametric models with
accurate uncertainty estimates that, combined with sparse approximations and
stochastic variational inference, scale to large data sets.
Sparse GPs can be used to compute a predictive distribution for missing data.
Here, we present a hierarchical composition of sparse GPs that is used to
predict missing values at each dimension using all the variables from the other
dimensions. We call the approach missing GP (MGP). MGP can be trained to
impute all observed missing values simultaneously. Specifically, it outputs a
predictive distribution for each missing value that is then used in the
imputation of other missing values. We evaluate MGP on one private clinical
dataset and four UCI datasets with different percentages of missing values.
We compare the performance of MGP with other state-of-the-art methods for
imputing missing values, including variants based on sparse GPs and deep GPs.
The results show that MGP performs significantly better than these methods.
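The per-dimension scheme described in the abstract can be sketched as follows. This is a simplified illustration only, not the authors' sparse variational implementation: it uses exact GP regression from scikit-learn, the function name `gp_impute` and the mean-initialisation step are assumptions, and only the point predictions (not the full predictive distributions) are propagated between columns.

```python
# Illustrative sketch of MGP-style imputation: each column's missing entries
# are predicted by a GP from the remaining columns, and imputed values are
# reused when imputing the other columns on later sweeps.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_impute(X, n_sweeps=3):
    X = X.copy()
    missing = np.isnan(X)
    # Initialise missing entries with column means so every GP has full inputs.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_sweeps):
        for d in range(X.shape[1]):
            if not missing[:, d].any():
                continue
            obs = ~missing[:, d]                # rows where column d is observed
            others = np.delete(X, d, axis=1)    # all remaining columns as inputs
            gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
            gp.fit(others[obs], X[obs, d])
            # Overwrite the missing entries with the GP's predictive mean.
            X[missing[:, d], d] = gp.predict(others[missing[:, d]])
    return X

rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 1))
data = np.hstack([Z, 2 * Z + 0.05 * rng.normal(size=(60, 1))])
data[rng.random(data.shape) < 0.2] = np.nan     # drop ~20% of entries
imputed = gp_impute(data)
```

Because the two columns are strongly correlated, the GP on the other column recovers the missing entries far better than the mean initialisation alone would.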
Related papers
- Semiparametric conformal prediction [79.6147286161434]
Risk-sensitive applications require well-calibrated prediction sets over multiple, potentially correlated target variables.
We treat the scores as random vectors and aim to construct the prediction set accounting for their joint correlation structure.
We report desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z) - Leveraging Locality and Robustness to Achieve Massively Scalable
Gaussian Process Regression [1.3518297878940662]
We introduce a new perspective by exploring robustness properties and limiting behaviour of GP nearest-neighbour (GPnn) prediction.
As the data-size n increases, accuracy of estimated parameters and GP model assumptions become increasingly irrelevant to GPnn predictive accuracy.
We show that this source of inaccuracy can be corrected for, thereby achieving both well-calibrated uncertainty measures and accurate predictions at remarkably low computational cost.
arXiv Detail & Related papers (2023-06-26T14:32:46Z) - Scaling Gaussian Process Optimization by Evaluating a Few Unique
Candidates Multiple Times [119.41129787351092]
We show that sequential black-box optimization based on GPs can be made efficient by sticking to a candidate solution for multiple evaluation steps.
We modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to adapt rules from batched GP-Opt.
arXiv Detail & Related papers (2022-01-30T20:42:14Z) - Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z) - Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform scalable online prediction and model update, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local
Cross-Validation [1.2233362977312945]
We present MuyGPs, a novel efficient GP hyperparameter estimation method.
MuyGPs builds upon prior methods that take advantage of the nearest neighbors structure of the data.
We show that our method outperforms all known competitors both in terms of time-to-solution and the root mean squared error of the predictions.
arXiv Detail & Related papers (2021-04-29T18:10:21Z) - A Hamiltonian Monte Carlo Model for Imputation and Augmentation of
Healthcare Data [0.6719751155411076]
Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available.
Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values.
A Bayesian approach to imputing missing values and creating augmented samples in high-dimensional healthcare data is proposed in this work.
arXiv Detail & Related papers (2021-03-03T11:57:42Z) - Sparse Gaussian Process Variational Autoencoders [24.86751422740643]
Existing approaches for performing inference in GP-DGMs do not support sparse GP approximations based on inducing points.
We develop the sparse Gaussian process variational autoencoder (SGP-VAE), characterised by the use of partial inference networks for parameterising sparse GP approximations.
arXiv Detail & Related papers (2020-10-20T10:19:56Z) - MOGPTK: The Multi-Output Gaussian Process Toolkit [71.08576457371433]
We present MOGPTK, a Python package for multi-channel data modelling using Gaussian processes (GPs).
The aim of this toolkit is to make multi-output GP (MOGP) models accessible to researchers, data scientists, and practitioners alike.
arXiv Detail & Related papers (2020-02-09T23:34:49Z) - On the consistency of supervised learning with missing values [15.666860186278782]
In many application settings, the data have missing entries which make analysis challenging.
Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data.
We show that the widely-used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
arXiv Detail & Related papers (2019-02-19T07:27:19Z)
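The constant-imputation strategy discussed in the last entry is straightforward to reproduce. A minimal sketch using scikit-learn's `SimpleImputer`; the synthetic linear data, split sizes, and 15% MCAR missingness rate are illustrative assumptions, not from the paper:

```python
# Mean imputation before supervised learning: fill each missing entry with the
# column mean learned on the training split, then fit an ordinary regressor.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.15] = np.nan  # missing completely at random

# The pipeline learns the imputation means on the training data only,
# so the same means are reused at test time.
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_missing[:150], y[:150])
score = model.score(X_missing[150:], y[150:])    # R^2 on held-out rows
```

When the missingness is uninformative, as here, this simple two-step pipeline remains a strong baseline, which is consistent with the paper's consistency result.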
This list is automatically generated from the titles and abstracts of the papers in this site.