Asymptotically Exact and Fast Gaussian Copula Models for Imputation of
Mixed Data Types
- URL: http://arxiv.org/abs/2102.02642v1
- Date: Thu, 4 Feb 2021 14:42:29 GMT
- Title: Asymptotically Exact and Fast Gaussian Copula Models for Imputation of
Mixed Data Types
- Authors: Benjamin Christoffersen, Mark Clements, Keith Humphreys, Hedvig
Kjellström
- Abstract summary: Missing values with mixed data types are a common problem in a large number of machine learning applications.
Gaussian copula models have been suggested as a means of performing imputation of missing values using a probabilistic framework.
We address the first limitation using direct and arbitrarily precise approximations both for model estimation and imputation by using randomized quasi-Monte Carlo procedures.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Missing values with mixed data types are a common problem in a large number of
machine learning applications such as processing of surveys and in different
medical applications. Recently, Gaussian copula models have been suggested as a
means of performing imputation of missing values using a probabilistic
framework. While present Gaussian copula models have been shown to yield
state-of-the-art performance, they have two limitations: they are based on an
approximation that is fast but may be imprecise and they do not support
unordered multinomial variables. We address the first limitation using direct
and arbitrarily precise approximations both for model estimation and imputation
by using randomized quasi-Monte Carlo procedures. The method we provide has
lower errors for the estimated model parameters and the imputed values,
compared to previously proposed methods. We also extend the previous Gaussian
copula models to include unordered multinomial variables in addition to the
present support of ordinal, binary, and continuous variables.
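The computational core here is evaluating multivariate normal probabilities over the rectangles that ordinal thresholds induce in the latent space. The sketch below estimates one such rectangle probability with randomized quasi-Monte Carlo via SciPy; the correlation matrix and thresholds are invented for illustration, and the paper's own estimator is more refined and embedded in model fitting, so treat this as a minimal demonstration of the RQMC ingredient only.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical 3-d latent correlation matrix and the interval (a, b] implied
# by ordinal thresholds; these values are invented for illustration.
corr = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
a = np.array([-np.inf, -0.5, 0.0])
b = np.array([0.8, 1.2, np.inf])

def rqmc_rectangle_prob(corr, a, b, n=2**13, n_reps=10, seed=0):
    """Estimate P(a < Z <= b) for Z ~ N(0, corr) with randomized QMC.

    Averaging over independently randomized Sobol sequences yields both a
    point estimate and a standard error, so the approximation error can be
    driven arbitrarily low by increasing n -- the property the paper exploits.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_reps):
        engine = qmc.MultivariateNormalQMC(mean=np.zeros(len(a)), cov=corr,
                                           seed=rng.integers(2**32))
        z = engine.random(n)
        estimates.append(np.all((z > a) & (z <= b), axis=1).mean())
    estimates = np.asarray(estimates)
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_reps)

prob, se = rqmc_rectangle_prob(corr, a, b)
print(f"P(a < Z <= b) ~ {prob:.5f} +/- {se:.1e}")
```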
Related papers
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Since these models are approximations of reality, it is desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
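As a toy illustration of Monte Carlo fusion, the sketch below pools draws from three Gaussian predictive densities into a mixture and summarizes it; the means, standard deviations, and model weights are invented, and this is a generic pooling scheme rather than the fusion rule derived in the paper.

```python
import numpy as np

# Predictive means/stds from three hypothetical GP models at one test input;
# the numbers and the weights are invented for illustration.
means = np.array([1.10, 0.95, 1.30])
stds = np.array([0.20, 0.35, 0.25])
weights = np.array([0.5, 0.3, 0.2])   # assumed model weights, sum to 1

rng = np.random.default_rng(0)
n = 100_000

# Sample a model index per draw, then sample from that model's Gaussian:
# the pooled draws follow the mixture density sum_m w_m N(mu_m, s_m^2).
idx = rng.choice(len(means), size=n, p=weights)
fused = rng.normal(means[idx], stds[idx])

print(f"fused mean   {fused.mean():.3f}")
print(f"fused std    {fused.std(ddof=1):.3f}")
print(f"fused 95% CI ({np.quantile(fused, 0.025):.3f}, "
      f"{np.quantile(fused, 0.975):.3f})")
```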
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
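A minimal sketch of the inverse-free ingredient: a fixed number of conjugate-gradient steps stands in for applying K^{-1}, and Hutchinson-style probes estimate the trace term in the marginal-likelihood gradient. The kernel, data, and iteration counts are invented; the paper additionally backpropagates through the unrolled solver, which this NumPy sketch only gestures at.

```python
import numpy as np

def cg_unrolled(A, b, n_steps=15):
    """A fixed number of conjugate-gradient steps for A x = b.

    Using a finite iteration budget instead of an exact inverse is the core
    trick: the whole solve becomes a computation graph that one could, in an
    autodiff framework, backpropagate through.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(n_steps):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy gradient of a latent Gaussian log-likelihood wrt the noise variance:
# d/d(s2) [-0.5 y^T K^{-1} y - 0.5 log|K|] = 0.5 a^T a - 0.5 tr(K^{-1}),
# with a = K^{-1} y. Both pieces need only linear solves, never an inverse.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
K = np.exp(-0.5 * (X - X.T) ** 2) + 0.1 * np.eye(n)   # SE kernel + noise
y = rng.multivariate_normal(np.zeros(n), K)

alpha = cg_unrolled(K, y)                                     # K^{-1} y
probes = rng.choice([-1.0, 1.0], size=(8, n))                 # Rademacher
trace_est = np.mean([v @ cg_unrolled(K, v) for v in probes])  # tr(K^{-1})
print(f"d log-lik / d noise-var ~ {0.5 * (alpha @ alpha) - 0.5 * trace_est:.4f}")
```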
- Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
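The sketch below makes the ignored error visible rather than implementing the paper's method: it compares an exact GP posterior variance with a cheap subset-of-data approximation at one test point, the gap being the kind of computation-induced uncertainty the paper argues should be quantified. Kernel and data are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 30                      # data size, cheap subset size
X = np.sort(rng.uniform(-3, 3, n))
y = np.sin(X) + 0.1 * rng.normal(size=n)
x_star = np.array([0.5])
noise = 0.01

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

# Exact GP posterior variance at x_star (O(n^3) cost).
K = k(X, X) + noise * np.eye(n)
ks = k(X, x_star)
var_exact = (k(x_star, x_star) - ks.T @ np.linalg.solve(K, ks)).item()

# Subset-of-data approximation: condition on m points only (much cheaper).
idx = rng.choice(n, m, replace=False)
Km = k(X[idx], X[idx]) + noise * np.eye(m)
ksm = k(X[idx], x_star)
var_approx = (k(x_star, x_star) - ksm.T @ np.linalg.solve(Km, ksm)).item()

# The discrepancy is the effect of limited computation, which naive use of
# the approximate posterior silently ignores.
print(f"exact variance   {var_exact:.5f}")
print(f"approx variance  {var_approx:.5f}")
print(f"gap from limited computation: {var_approx - var_exact:.5f}")
```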
- Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data [5.00301731167245]
We derive a basis function approximation scheme for mixed-domain covariance functions.
We show that we can approximate the exact GP model accurately in a fraction of the runtime.
We also demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models.
arXiv Detail & Related papers (2021-11-03T04:47:37Z)
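A minimal sketch of the basis-function idea on a single continuous input, using the standard Laplacian-eigenfunction (Hilbert-space) approximation of a squared-exponential kernel; the paper's contribution is extending such schemes to mixed-domain covariances, which this sketch does not attempt. All parameter values are illustrative.

```python
import numpy as np

# Reduced-rank approximation of an SE kernel on [-L, L] via Laplacian
# eigenfunctions: k(x, x') ~ sum_j S(sqrt(lam_j)) phi_j(x) phi_j(x').
L, J = 5.0, 40                       # domain half-width, number of basis fns
ell, sf2 = 0.7, 1.0                  # lengthscale, signal variance

def phi(x, j):
    """Laplacian eigenfunctions on [-L, L] with Dirichlet boundaries."""
    return np.sqrt(1.0 / L) * np.sin(np.pi * j * (x + L) / (2 * L))

def spectral_density(w):
    """Spectral density of the 1-d squared-exponential kernel."""
    return sf2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * w) ** 2)

x = np.linspace(-2, 2, 200)
js = np.arange(1, J + 1)
lam_sqrt = np.pi * js / (2 * L)              # square roots of eigenvalues
Phi = phi(x[:, None], js[None, :])           # (200, J) basis matrix
K_approx = Phi @ np.diag(spectral_density(lam_sqrt)) @ Phi.T

K_exact = sf2 * np.exp(-0.5 * ((x[:, None] - x[None, :]) / ell) ** 2)
print(f"max |K_exact - K_approx| = {np.abs(K_exact - K_approx).max():.2e}")
```

Because the GP prior becomes linear in J fixed basis functions, fitting costs scale with J rather than with the number of observations.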
- Latent Gaussian Model Boosting [0.0]
Tree-boosting shows excellent predictive accuracy on many data sets.
We obtain increased predictive accuracy compared to existing approaches in both simulated and real-world data experiments.
arXiv Detail & Related papers (2021-05-19T07:36:30Z)
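A from-scratch sketch of the alternating idea, assuming a single grouped random effect: boosting steps fit the fixed-effect function while shrunken group means stand in for the latent Gaussian effect. This illustrates the structure only, not the author's actual algorithm or API; all data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, n_groups = 2000, 20
X = rng.uniform(-2, 2, size=(n, 2))
g = rng.integers(n_groups, size=n)                  # group labels
b_true = 0.8 * rng.normal(size=n_groups)            # latent group effects
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + b_true[g] + 0.3 * rng.normal(size=n)

F = np.zeros(n)                      # boosted fixed-effect predictions
b = np.zeros(n_groups)               # random-effect estimates
lr, shrink = 0.1, 0.9                # learning rate, ridge-style shrinkage
for _ in range(100):
    # Boosting step: fit a small tree to the current residuals.
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y - F - b[g])
    F += lr * tree.predict(X)
    # Latent-effect step: group-mean residuals, shrunk toward zero as a
    # stand-in for the Gaussian prior on b.
    resid = y - F
    for grp in range(n_groups):
        b[grp] = shrink * resid[g == grp].mean()

print(f"corr(b_hat, b_true) = {np.corrcoef(b, b_true)[0, 1]:.3f}")
```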
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
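A toy two-dimensional instance of the conditional-score idea: each conditional score is a small network trained with the one-dimensional score-matching objective E[ds/dx_d + 0.5 s^2], which indeed requires no sampling or adversarial training. The architectures and synthetic data are invented for illustration.

```python
import torch

torch.manual_seed(0)
n = 4096
x1 = torch.randn(n, 1)
x2 = 0.5 * x1 + 0.8 * torch.randn(n, 1)      # toy joint p(x1) p(x2 | x1)
data = torch.cat([x1, x2], dim=1)

# One small score network per conditional: s1(x1) and s2(x1, x2).
score1 = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
score2 = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam([*score1.parameters(), *score2.parameters()], lr=1e-3)

def sm_loss(score_net, inputs, d):
    """1-d score matching E[ds/dx_d + 0.5 s^2] for coordinate d."""
    inputs = inputs.clone().requires_grad_(True)
    s = score_net(inputs)
    ds = torch.autograd.grad(s.sum(), inputs, create_graph=True)[0][:, d:d+1]
    return (ds + 0.5 * s ** 2).mean()

for _ in range(500):
    opt.zero_grad()
    loss = sm_loss(score1, data[:, :1], 0) + sm_loss(score2, data, 1)
    loss.backward()
    opt.step()
print(f"final combined score-matching loss: {loss.item():.3f}")
```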
- Inference in Stochastic Epidemic Models via Multinomial Approximations [2.28438857884398]
We introduce a new method for inference in epidemic models.
The method is applicable to a class of discrete-time, finite-population compartmental models.
We show how the method can be embedded within a Sequential Monte Carlo approach to estimating the time-varying reproduction number of COVID-19 in Wuhan, China.
arXiv Detail & Related papers (2020-06-24T13:08:56Z)
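A simplified relative of the setting: a discrete-time, finite-population SIR model with binomial (a special case of multinomial) transitions, and a bootstrap particle filter for the likelihood of a transmission parameter. The Poisson observation model and all parameter values are assumptions made for illustration; the paper's multinomial approximation targets this kind of likelihood far more efficiently.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 10_000, 60
beta_true, gamma = 0.4, 0.15

def simulate(beta):
    S, I, cases = N - 10, 10, []
    for _ in range(T):
        new_inf = rng.binomial(S, 1 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, 1 - np.exp(-gamma))
        S, I = S - new_inf, I + new_inf - new_rec
        cases.append(new_inf)
    return np.array(cases)

obs = simulate(beta_true)

def bootstrap_pf_loglik(beta, n_particles=500):
    """Particle-filter log-likelihood with an assumed Poisson observation
    model around the latent new-infection counts."""
    S = np.full(n_particles, N - 10)
    I = np.full(n_particles, 10)
    ll = 0.0
    for t in range(T):
        new_inf = rng.binomial(S, 1 - np.exp(-beta * I / N))
        new_rec = rng.binomial(I, 1 - np.exp(-gamma))
        S, I = S - new_inf, I + new_inf - new_rec
        lam = np.maximum(new_inf, 1e-8)
        logw = obs[t] * np.log(lam) - lam   # Poisson log-pmf up to a const
        w = np.exp(logw - logw.max())
        ll += np.log(w.mean()) + logw.max()
        idx = rng.choice(n_particles, n_particles, p=w / w.sum())
        S, I = S[idx], I[idx]               # resample particles
    return ll

for beta in (0.3, 0.4, 0.5):
    print(f"beta={beta:.1f}  log-lik ~ {bootstrap_pf_loglik(beta):.1f}")
```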
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
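An illustrative experiment in the same spirit (not the paper's methodology, which computes the error distribution precisely): in an overparameterized linear model, sample many interpolating classifiers and look at how their test errors distribute around the typical value.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 200, 50, 2000   # overparameterized: d > n_train

w_star = rng.normal(size=d) / np.sqrt(d)
Xtr = rng.normal(size=(n_train, d))
ytr = np.sign(Xtr @ w_star)
Xte = rng.normal(size=(n_test, d))
yte = np.sign(Xte @ w_star)

# Any w with Xtr @ w = ytr interpolates the training labels. Sample many:
# the min-norm solution plus random null-space directions.
w0, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)    # min-norm interpolator
_, _, Vt = np.linalg.svd(Xtr, full_matrices=True)
null_basis = Vt[n_train:]                         # (d - n_train, d)

errors = []
for _ in range(2000):
    w = w0 + null_basis.T @ rng.normal(size=d - n_train) * 0.05
    errors.append(np.mean(np.sign(Xte @ w) != yte))
errors = np.array(errors)
print(f"min-norm test error: {np.mean(np.sign(Xte @ w0) != yte):.3f}")
print(f"sampled interpolators: mean {errors.mean():.3f}, "
      f"std {errors.std():.3f}, worst {errors.max():.3f}")
```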
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
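For flavor, the sketch below compares a Monte Carlo estimate of the Gaussian-softmax integral against a classic probit-style mean-field shrinkage of the logits; the paper derives its own, more careful approximation formula, so this stands in only for the general shape of the idea. The means and variances are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Logits z ~ N(mu, diag(sigma2)); E[softmax(z)] has no closed form.
mu = np.array([2.0, 0.5, -1.0])
sigma2 = np.array([1.5, 0.5, 2.0])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Monte Carlo ground truth.
z = mu + np.sqrt(sigma2) * rng.normal(size=(1_000_000, 3))
mc = softmax(z).mean(axis=0)

# Probit-style mean-field approximation: shrink each logit by its own
# variance before the softmax (shown for flavor, not the paper's formula).
approx = softmax(mu / np.sqrt(1.0 + np.pi * sigma2 / 8.0))

print("Monte Carlo :", np.round(mc, 4))
print("mean-field  :", np.round(approx, 4))
```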
- Efficient Ensemble Model Generation for Uncertainty Estimation with Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated using a layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
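A generic sketch of the consumption side: given per-pixel class probabilities from several ensemble members (randomly generated stand-ins here), average them and use pixel-wise predictive entropy as the uncertainty map. The paper's layer-selection generation scheme and its pixel-wise uncertainty loss are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for M ensemble members' per-pixel class probabilities on an
# H x W image with C classes (real segmentation models would produce these).
M, H, W, C = 5, 8, 8, 3
logits = rng.normal(size=(M, H, W, C)) + np.array([1.0, 0.0, -1.0])
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

mean_probs = probs.mean(axis=0)               # ensemble-averaged prediction
pred = mean_probs.argmax(axis=-1)             # per-pixel segmentation

# Pixel-wise predictive entropy as the uncertainty map: high where members
# disagree or are individually unsure.
entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)
print("most uncertain pixel:", np.unravel_index(entropy.argmax(), (H, W)))
print(f"mean entropy {entropy.mean():.3f}, max {entropy.max():.3f}")
```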
- Gaussian Process Boosting [13.162429430481982]
We introduce a novel way to combine boosting with Gaussian process and mixed effects models.
We obtain increased prediction accuracy compared to existing approaches on simulated and real-world data sets.
arXiv Detail & Related papers (2020-04-06T13:19:54Z)
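Complementing the grouped-random-effect sketch above, this variant alternates boosting steps with a GP posterior-mean update for a spatially correlated residual process; again a from-scratch illustration on invented data, not the author's algorithm or the gpboost library's API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-2, 2, size=(n, 2))     # features for the tree ensemble
s = rng.uniform(0, 10, size=n)          # 1-d "location" for the GP part

k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)
f_gp = rng.multivariate_normal(np.zeros(n), k(s, s) + 1e-8 * np.eye(n))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + f_gp + 0.2 * rng.normal(size=n)

F = np.zeros(n)                          # boosted mean predictions
b = np.zeros(n)                          # GP-part estimate
K = k(s, s)
noise = 0.2 ** 2
for _ in range(100):
    # Boosting step on the de-correlated residuals.
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y - F - b)
    F += 0.1 * tree.predict(X)
    # GP posterior mean of the residual process given current residuals.
    b = K @ np.linalg.solve(K + noise * np.eye(n), y - F)

resid = y - F - b
print(f"residual std after boosting + GP: {resid.std():.3f} (noise sd 0.2)")
```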
This list is automatically generated from the titles and abstracts of the papers on this site.