kNN Algorithm for Conditional Mean and Variance Estimation with
Automated Uncertainty Quantification and Variable Selection
- URL: http://arxiv.org/abs/2402.01635v1
- Date: Fri, 2 Feb 2024 18:54:18 GMT
- Title: kNN Algorithm for Conditional Mean and Variance Estimation with
Automated Uncertainty Quantification and Variable Selection
- Authors: Marcos Matabuena, Juan C. Vidal, Oscar Hernan Madrid Padilla,
Jukka-Pekka Onnela
- Abstract summary: We introduce a kNN-based regression method that synergizes the scalability and adaptability of traditional non-parametric kNN models.
This method focuses on accurately estimating the conditional mean and variance of random response variables.
It is particularly notable in biomedical applications as demonstrated in two case studies.
- Score: 8.429136647141487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a kNN-based regression method that synergizes the
scalability and adaptability of traditional non-parametric kNN models with a
novel variable selection technique. This method focuses on accurately
estimating the conditional mean and variance of random response variables,
thereby effectively characterizing conditional distributions across diverse
scenarios. Our approach incorporates a robust uncertainty quantification
mechanism, leveraging our prior estimation work on conditional mean and
variance. The employment of kNN ensures scalable computational efficiency in
predicting intervals and statistical accuracy in line with optimal
non-parametric rates. Additionally, we introduce a new kNN semi-parametric
algorithm for estimating ROC curves, accounting for covariates. For selecting
the smoothing parameter k, we propose an algorithm with theoretical
guarantees. Incorporation of variable selection enhances the performance of the
method significantly over conventional kNN techniques in various modeling
tasks. We validate the approach through simulations in low, moderate, and
high-dimensional covariate spaces. The algorithm's effectiveness is
particularly notable in biomedical applications as demonstrated in two case
studies. Concluding with a theoretical analysis, we highlight the consistency
and convergence rate of our method over traditional kNN models, particularly
when the underlying regression model takes values in a low-dimensional space.
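As a rough, minimal sketch of the kind of estimator described in the abstract (not the authors' implementation: the variable-selection step, the theoretically guided choice of k, and the covariate-adjusted ROC estimator are omitted, and the normal-approximation interval and cross-validated k below are illustrative assumptions):

```python
# Minimal sketch of kNN conditional mean/variance estimation with a
# normal-approximation prediction interval. This is NOT the paper's
# algorithm: variable selection, the theoretically guided choice of k,
# and the covariate-adjusted ROC estimator are omitted; function names,
# the Gaussian interval, and the CV-based k selection are illustrative.
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import NearestNeighbors
from sklearn.model_selection import KFold

def knn_mean_variance(X_train, y_train, X_query, k=25):
    """Estimate E[Y|X=x] and Var[Y|X=x] by averaging over the k nearest neighbours."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_query)             # (n_query, k) neighbour indices
    neigh_y = y_train[idx]                      # responses of those neighbours
    cond_mean = neigh_y.mean(axis=1)            # m(x): local average
    cond_var = neigh_y.var(axis=1, ddof=1)      # sigma^2(x): local sample variance
    return cond_mean, cond_var

def knn_prediction_interval(cond_mean, cond_var, alpha=0.05):
    """Interval m(x) +/- z_{1-alpha/2} * sigma(x), assuming approximate normality."""
    half = norm.ppf(1 - alpha / 2) * np.sqrt(cond_var)
    return cond_mean - half, cond_mean + half

def select_k_by_cv(X, y, k_grid=(5, 10, 25, 50, 100), n_splits=5, seed=0):
    """Choose k by cross-validated squared error on the conditional mean
    (a generic stand-in for the paper's theoretically guided selection rule)."""
    best_k, best_err = None, np.inf
    for k in k_grid:
        errs = []
        for tr, te in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            m, _ = knn_mean_variance(X[tr], y[tr], X[te], k=min(k, len(tr)))
            errs.append(np.mean((y[te] - m) ** 2))
        if np.mean(errs) < best_err:
            best_k, best_err = k, np.mean(errs)
    return best_k
```

In this sketch, `select_k_by_cv` would be run first and the selected k passed to `knn_mean_variance`; the paper's own selection rule comes with theoretical guarantees that plain cross-validation does not provide.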
Related papers
- Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
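A toy Monte Carlo version of that weighted mean-plus-standard-deviation cost (the model, weights, and input distribution below are invented; the paper's variational Bayesian surrogate would replace the direct model evaluations):

```python
# Toy robust design objective J(d) = w_mean * E[f(d, xi)] + w_std * Std[f(d, xi)],
# estimated by Monte Carlo over an uncertain input xi. The cost function,
# weights, and input distribution are placeholders for illustration only.
import numpy as np

def model(design, xi):
    # Hypothetical expensive model: quadratic cost perturbed by an uncertain input.
    return (design - 2.0) ** 2 + 0.5 * np.sin(3.0 * design) * xi

def robust_objective(design, w_mean=1.0, w_std=1.0, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    xi = rng.normal(0.0, 1.0, size=n_samples)   # samples of the uncertain input
    outputs = model(design, xi)
    return w_mean * outputs.mean() + w_std * outputs.std(ddof=1)

# Crude grid search standing in for the optimisation loop.
designs = np.linspace(0.0, 4.0, 81)
best_design = min(designs, key=robust_objective)
```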
arXiv Detail & Related papers (2024-04-23T09:22:35Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
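The fixed-point formulation can be illustrated generically (the update map below is a placeholder contraction, not the paper's learnable-regularizer solver):

```python
# Generic fixed-point iteration of the kind used in Deep Equilibrium models:
# the solver's update step is treated as a map z -> f(z, x) and run to
# convergence rather than unrolled for a fixed number of iterations.
import numpy as np

def solve_fixed_point(f, x, z0, tol=1e-6, max_iter=500):
    """Iterate z <- f(z, x) until the update is approximately a fixed point."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol * (1.0 + np.linalg.norm(z)):
            return z_next
        z = z_next
    return z  # fall back to the last iterate if convergence was not reached

# Example: a contractive affine update standing in for a learned operator.
A = 0.4 * np.eye(3)
update = lambda z, x: A @ z + x
z_star = solve_fixed_point(update, x=np.ones(3), z0=np.zeros(3))
```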
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Variational Linearized Laplace Approximation for Bayesian Deep Learning [11.22428369342346]
We propose a new method for approximating the Linearized Laplace Approximation (LLA) using a variational sparse Gaussian Process (GP).
Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN.
It allows for efficient optimization, which results in sub-linear training time in the size of the training dataset.
arXiv Detail & Related papers (2023-02-24T10:32:30Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- A Priori Denoising Strategies for Sparse Identification of Nonlinear Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
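A generic contrast between a local and a global smoother, in the spirit of that comparison (neither is claimed to be one of the specific methods benchmarked in the paper; the signal and noise level are arbitrary):

```python
# Local smoother: each point uses only a small neighbourhood of measurements.
# Global smoother: a single spline is fit to the entire series at once.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
y = np.sin(t) + 0.2 * rng.normal(size=t.size)    # noisy state measurements

# Local: 15-sample moving average.
window = 15
y_local = np.convolve(y, np.ones(window) / window, mode="same")

# Global: smoothing spline using all measurements (s set from the noise level).
spline = UnivariateSpline(t, y, s=t.size * 0.2 ** 2)
y_global = spline(t)
```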
arXiv Detail & Related papers (2022-01-29T23:31:25Z)
- High-Dimensional Differentially-Private EM Algorithm: Methods and Near-Optimal Statistical Guarantees [8.089708900273804]
We develop a general framework to design differentially private expectation-maximization (EM) algorithms in high-dimensional latent variable models.
In each model, we establish the near-optimal rate of convergence with differential privacy constraints.
We propose a near rate-optimal EM algorithm with differential privacy guarantees in this setting.
arXiv Detail & Related papers (2021-04-01T04:08:34Z)
- The Variational Method of Moments [65.91730154730905]
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables.
Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem.
We provide algorithms for valid statistical inference based on the same kind of variational reformulations.
arXiv Detail & Related papers (2020-12-17T07:21:06Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the logistical optima.
In particular, we establish local variation of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which a global convergence is obtained.
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Efficient Uncertainty Quantification for Dynamic Subsurface Flow with Surrogate by Theory-guided Neural Network [0.0]
We propose a methodology for efficient uncertainty quantification for dynamic subsurface flow with a surrogate constructed by the Theory-guided Neural Network (TgNN).
Parameters, time, and location comprise the input of the neural network, while the quantity of interest is the output.
The trained neural network can predict solutions of subsurface flow problems with new parameters.
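A bare-bones surrogate with that input/output structure might look as follows (a plain MLP regressor without the theory-guided loss terms that define TgNN; all sizes and the synthetic target are invented):

```python
# Minimal surrogate with the input/output structure described above:
# uncertain parameters, time, and spatial location go in; the quantity of
# interest comes out. This is a plain MLP, NOT a theory-guided network --
# the physics-based loss terms of TgNN are omitted.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 5000
params = rng.uniform(0.5, 1.5, size=(n, 2))      # e.g. conductivity-related features
time = rng.uniform(0.0, 1.0, size=(n, 1))
location = rng.uniform(0.0, 1.0, size=(n, 2))    # (x, y) coordinates
X = np.hstack([params, time, location])
# Synthetic stand-in for simulator output at (params, time, location).
y = np.sin(2 * np.pi * location[:, 0]) * np.exp(-time[:, 0]) * params[:, 0]

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, y)
y_new = surrogate.predict(X[:10])                # fast predictions for new inputs
```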
arXiv Detail & Related papers (2020-04-25T12:41:57Z)