kNN Algorithm for Conditional Mean and Variance Estimation with
Automated Uncertainty Quantification and Variable Selection
- URL: http://arxiv.org/abs/2402.01635v1
- Date: Fri, 2 Feb 2024 18:54:18 GMT
- Title: kNN Algorithm for Conditional Mean and Variance Estimation with
Automated Uncertainty Quantification and Variable Selection
- Authors: Marcos Matabuena, Juan C. Vidal, Oscar Hernan Madrid Padilla,
Jukka-Pekka Onnela
- Abstract summary: We introduce a kNN-based regression method that synergizes the scalability and adaptability of traditional non-parametric kNN models.
This method focuses on accurately estimating the conditional mean and variance of random response variables.
It is particularly notable in biomedical applications as demonstrated in two case studies.
- Score: 8.429136647141487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a kNN-based regression method that synergizes the
scalability and adaptability of traditional non-parametric kNN models with a
novel variable selection technique. This method focuses on accurately
estimating the conditional mean and variance of random response variables,
thereby effectively characterizing conditional distributions across diverse
scenarios. Our approach incorporates a robust uncertainty quantification
mechanism, leveraging our prior estimation work on conditional mean and
variance. The employment of kNN ensures scalable computational efficiency in
predicting intervals and statistical accuracy in line with optimal
non-parametric rates. Additionally, we introduce a new kNN semi-parametric
algorithm for estimating ROC curves, accounting for covariates. For selecting
the smoothing parameter k, we propose an algorithm with theoretical
guarantees. Incorporation of variable selection enhances the performance of the
method significantly over conventional kNN techniques in various modeling
tasks. We validate the approach through simulations in low, moderate, and
high-dimensional covariate spaces. The algorithm's effectiveness is
particularly notable in biomedical applications as demonstrated in two case
studies. Concluding with a theoretical analysis, we highlight the consistency
and convergence rate of our method over traditional kNN models, particularly
when the underlying regression model takes values in a low-dimensional space.
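As a rough, minimal sketch of the kind of estimator described in the abstract (not the authors' implementation: the variable-selection step, the theoretically guided choice of k, and the covariate-adjusted ROC estimator are omitted, and the normal-approximation interval and cross-validated k below are illustrative assumptions):

```python
# Minimal sketch of kNN conditional mean/variance estimation with a
# normal-approximation prediction interval. This is NOT the paper's
# algorithm: variable selection, the theoretically guided choice of k,
# and the covariate-adjusted ROC estimator are omitted; function names,
# the Gaussian interval, and the CV-based k selection are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors
from sklearn.model_selection import KFold

def knn_mean_variance(X_train, y_train, X_query, k=25):
    """Estimate E[Y|X=x] and Var[Y|X=x] by averaging over the k nearest neighbours."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)             # (n_query, k) neighbour indices
    neigh_y = y_train[idx]                      # responses of those neighbours
    cond_mean = neigh_y.mean(axis=1)            # m(x): local average
    cond_var = neigh_y.var(axis=1, ddof=1)      # sigma^2(x): local sample variance
    return cond_mean, cond_var

def knn_prediction_interval(cond_mean, cond_var, alpha=0.05):
    """Interval m(x) +/- z_{1-alpha/2} * sigma(x), assuming approximate normality."""
    half = norm.ppf(1 - alpha / 2) * np.sqrt(cond_var)
    return cond_mean - half, cond_mean + half

def select_k_by_cv(X, y, k_grid=(5, 10, 25, 50, 100), n_splits=5, seed=0):
    """Choose k by cross-validated squared error on the conditional mean
    (a generic stand-in for the paper's theoretically guided selection rule)."""
    best_k, best_err = None, np.inf
    for k in k_grid:
        errs = []
        for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            m, _ = knn_mean_variance(X[tr], y[tr], X[te], k=min(k, len(tr)))
            errs.append(np.mean((y[te] - m) ** 2))
        if np.mean(errs) < best_err:
            best_k, best_err = k, np.mean(errs)
    return best_k
```

In this sketch, `select_k_by_cv` would be run first and the selected k passed to `knn_mean_variance`; the paper's own selection rule comes with theoretical guarantees that plain cross-validation does not provide.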
Related papers
- Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
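A toy Monte Carlo version of that weighted mean-plus-standard-deviation cost (the model, weights, and input distribution below are invented; the paper's variational Bayesian surrogate would replace the direct model evaluations):

```python
# Toy robust design objective J(d) = w_mean * E[f(d, xi)] + w_std * Std[f(d, xi)],
# estimated by Monte Carlo over an uncertain input xi. The cost function,
# weights, and input distribution are placeholders for illustration only.
import numpy as np

def model(design, xi):
    # Hypothetical expensive model: quadratic cost perturbed by an uncertain input.
    return (design - 2.0) ** 2 + 0.5 * np.sin(3.0 * design) * xi

def robust_objective(design, w_mean=1.0, w_std=1.0, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, 1.0, size=n_samples)   # samples of the uncertain input
    outputs = model(design, xi)
    return w_mean * outputs.mean() + w_std * outputs.std(ddof=1)

# Crude grid search standing in for the optimisation loop.
designs = np.linspace(0.0, 4.0, 81)
best_design = min(designs, key=robust_objective)
```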
arXiv Detail & Related papers (2024-04-23T09:22:35Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
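The fixed-point formulation can be illustrated generically (the update map below is a placeholder contraction, not the paper's learnable-regularizer solver):

```python
# Generic fixed-point iteration of the kind used in Deep Equilibrium models:
# the solver's update step is treated as a map z -> f(z, x) and run to
# convergence rather than unrolled for a fixed number of iterations.
import numpy as np

def solve_fixed_point(f, x, z0, tol=1e-6, max_iter=500):
    """Iterate z <- f(z, x) until the update is approximately a fixed point."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol * (1.0 + np.linalg.norm(z)):
            return z_next
        z = z_next
    return z  # fall back to the last iterate if convergence was not reached

# Example: a contractive affine update standing in for a learned operator.
A = 0.4 * np.eye(3)
update = lambda z, x: A @ z + x
z_star = solve_fixed_point(update, x=np.ones(3), z0=np.zeros(3))
```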
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- A Priori Denoising Strategies for Sparse Identification of Nonlinear Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
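A generic contrast between a local and a global smoother, in the spirit of that comparison (neither is claimed to be one of the specific methods benchmarked in the paper; the signal and noise level are arbitrary):

```python
# Local smoother: each point uses only a small neighbourhood of measurements.
# Global smoother: a single spline is fit to the entire series at once.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
y = np.sin(t) + 0.2 * rng.normal(size=t.size)    # noisy state measurements

# Local: 15-sample moving average.
window = 15
y_local = np.convolve(y, np.ones(window) / window, mode="same")

# Global: smoothing spline using all measurements (s set from the noise level).
spline = UnivariateSpline(t, y, s=t.size * 0.2 ** 2)
y_global = spline(t)
```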
arXiv Detail & Related papers (2022-01-29T23:31:25Z)
- High-Dimensional Differentially-Private EM Algorithm: Methods and Near-Optimal Statistical Guarantees [8.089708900273804]
We develop a general framework to design differentially private expectation-maximization (EM) algorithms in high-dimensional latent variable models.
In each model, we establish the near-optimal rate of convergence with differential privacy constraints.
We propose a near rate-optimal EM algorithm with differential privacy guarantees in this setting.
arXiv Detail & Related papers (2021-04-01T04:08:34Z)
- The Variational Method of Moments [65.91730154730905]
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables.
Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem.
We provide algorithms for valid statistical inference based on the same kind of variational reformulations.
arXiv Detail & Related papers (2020-12-17T07:21:06Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the logistical optima.
In particular, we establish local variation of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which a global convergence is obtained.
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Efficient Uncertainty Quantification for Dynamic Subsurface Flow with Surrogate by Theory-guided Neural Network [0.0]
We propose a methodology for efficient uncertainty quantification for dynamic subsurface flow with a surrogate constructed by the Theory-guided Neural Network (TgNN).
Parameters, time, and location comprise the input of the neural network, while the quantity of interest is the output.
The trained neural network can predict solutions of subsurface flow problems with new parameters.
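A bare-bones surrogate with that input/output structure might look as follows (a plain MLP regressor without the theory-guided loss terms that define TgNN; all sizes and the synthetic target are invented):

```python
# Minimal surrogate with the input/output structure described above:
# uncertain parameters, time, and spatial location go in; the quantity of
# interest comes out. This is a plain MLP, NOT a theory-guided network --
# the physics-based loss terms of TgNN are omitted.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 5000
params = rng.uniform(0.5, 1.5, size=(n, 2))      # e.g. conductivity-related features
time = rng.uniform(0.0, 1.0, size=(n, 1))
location = rng.uniform(0.0, 1.0, size=(n, 2))    # (x, y) coordinates
X = np.hstack([params, time, location])
# Synthetic stand-in for simulator output at (params, time, location).
y = np.sin(2 * np.pi * location[:, 0]) * np.exp(-time[:, 0]) * params[:, 0]

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
y_new = surrogate.predict(X[:10])                # fast predictions for new inputs
```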
arXiv Detail & Related papers (2020-04-25T12:41:57Z)