Related papers: A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

URL: http://arxiv.org/abs/2212.14163v2
Date: Fri, 18 Oct 2024 03:06:45 GMT
Title: A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators
Authors: Neil K. Chada, Quanjun Lang, Fei Lu, Xiong Wang
Abstract summary: We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors.
Score: 4.09465251504657
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Kernels are efficient in representing nonlocal dependence and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induces a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The data-adaptive prior's covariance is the inversion operator with a hyper-parameter selected adaptive to data by the L-curve method. Furthermore, we provide a detailed analysis on the computational practice of the data-adaptive prior, and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of the four types of errors: discretization error, model error, partial observation and wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
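Illustration: the divergence described in the abstract can be seen in a toy example. The sketch below is not the authors' code. It assumes a diagonal inversion operator A written in its own eigenbasis, a fixed prior N(0, I), and a data-adaptive prior N(0, A / lam) whose hyper-parameter lam is set by hand instead of by the L-curve method. Under these assumptions the posterior mean is (A + sigma^2 Q^{-1})^{-1} b for prior covariance Q. All names (eigs, b, perturbation, lam) are hypothetical.

```python
# Toy model in the eigenbasis of a hypothetical inversion operator A.
# Assumptions (not from the paper's code): A is diagonal, the fixed prior
# is N(0, I), and the adaptive prior is N(0, A / lam) with lam chosen by hand.

def posterior_mean_fixed(eigs, b, sigma):
    """Posterior mean (A + sigma^2 I)^{-1} b under the fixed prior N(0, I)."""
    return [bi / (ei + sigma**2) for ei, bi in zip(eigs, b)]


def posterior_mean_adaptive(eigs, b, sigma, lam=1.0):
    """Posterior mean under the prior N(0, A / lam), supported on range(A)."""
    mean = []
    for ei, bi in zip(eigs, b):
        if ei == 0.0:
            # The prior puts no mass on the null space of A.
            mean.append(0.0)
        else:
            # b_i / (e_i + sigma^2 * lam / e_i), rewritten to avoid nesting.
            mean.append(ei * bi / (ei**2 + lam * sigma**2))
    return mean


eigs = [2.0, 0.5, 0.0]          # last eigenvalue is zero (unidentifiable direction)
phi_true = [1.0, -1.0, 0.0]     # true coefficients in the eigenbasis
perturbation = 1e-3             # error leaking into the null space of A

b = [e * p for e, p in zip(eigs, phi_true)]
b[2] += perturbation

for sigma in (1e-1, 1e-2, 1e-3):
    fixed = posterior_mean_fixed(eigs, b, sigma)
    adaptive = posterior_mean_adaptive(eigs, b, sigma)
    print(f"sigma={sigma:g}  fixed null component={fixed[2]:g}  adaptive={adaptive}")
```

In this toy setting the null-space component of the fixed-prior mean equals perturbation / sigma^2, so it grows without bound as sigma shrinks, while the adaptive-prior mean stays at zero in that direction and approaches phi_true on the range of A.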

Related papers

Optimal Implicit Bias in Linear Regression [20.710343135282116]
We find the optimal implicit bias that leads to the best generalization performance. In particular, we obtain a tight lower bound on the best generalization error possible among this class of interpolators.
arXiv Detail & Related papers (2025-06-20T17:41:39Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems. We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z)
Efficient Prior Calibration From Indirect Data [5.588334720483076]
This paper is concerned with learning the prior model from data, in particular, learning the prior from multiple realizations of indirect data obtained through the noisy observation process. An efficient residual-based neural operator approximation of the forward model is proposed and it is shown that this may be learned concurrently with the pushforward map.
arXiv Detail & Related papers (2024-05-28T08:34:41Z)
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis [10.79615566320291]
We explore transfer learning with the goal of optimizing downstream performance. We introduce a simple linear model that takes as input an arbitrary pretrained feature. We identify the optimal pretrained representation by minimizing the downstream risk averaged over an ensemble of downstream tasks.
arXiv Detail & Related papers (2024-04-18T19:33:55Z)
Optimal Algorithms for the Inhomogeneous Spiked Wigner Model [89.1371983413931]
We derive an approximate message-passing algorithm (AMP) for the inhomogeneous problem. We identify in particular the existence of a statistical-to-computational gap where known algorithms require a signal-to-noise ratio bigger than the information-theoretic threshold to perform better than random.
arXiv Detail & Related papers (2023-02-13T19:57:17Z)
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration [64.8770356696056]
We propose GibbsDDRM, an extension of Denoising Diffusion Restoration Models (DDRM) to a blind setting in which the linear measurement operator is unknown. The proposed method is problem-agnostic, meaning that a pre-trained diffusion model can be applied to various inverse problems without fine-tuning.
arXiv Detail & Related papers (2023-01-30T06:27:48Z)
On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions [16.704246627541103]
We show that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy.
arXiv Detail & Related papers (2022-05-26T17:43:20Z)
Data adaptive RKHS Tikhonov regularization for learning kernels in operators [1.5039745292757671]
We present DARTR: a Data Adaptive RKHS Tikhonov Regularization method for the linear inverse problem of nonparametric learning of function parameters in operators. A key ingredient is a system intrinsic data-adaptive (SIDA) RKHS, whose norm restricts the learning to take place in the function space of identifiability.
arXiv Detail & Related papers (2022-03-08T01:08:35Z)
Convergence Rates for Learning Linear Operators from Noisy Data [6.4423565043274795]
We study the inverse problem of learning a linear operator on a Hilbert space from its noisy pointwise evaluations on random input data. We establish posterior contraction rates with respect to a family of Bochner norms as the number of data points tends to infinity, along with lower bounds on the estimation error. These convergence rates highlight and quantify the difficulty of learning unbounded linear operators in comparison with the learning of bounded or compact ones.
arXiv Detail & Related papers (2021-08-27T22:09:53Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models that perform inference directly from inputs containing missing values, without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically. This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression. We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.