Probabilistic Iterative Methods for Linear Systems
- URL: http://arxiv.org/abs/2012.12615v2
- Date: Mon, 11 Jan 2021 14:30:53 GMT
- Title: Probabilistic Iterative Methods for Linear Systems
- Authors: Jon Cockayne and Ilse C.F. Ipsen and Chris J. Oates and Tim W. Reid
- Abstract summary: In this approach, a standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of probability distributions $\mathcal{P}(\mathbb{R}^d)$.
The output of the iterative methods proposed in this paper is, instead, a sequence of probability distributions $\mu_m \in \mathcal{P}(\mathbb{R}^d)$.
- Score: 5.457842083043013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a probabilistic perspective on iterative methods for
approximating the solution $\mathbf{x}_* \in \mathbb{R}^d$ of a nonsingular
linear system $\mathbf{A} \mathbf{x}_* = \mathbf{b}$. In this approach, a
standard iterative method on $\mathbb{R}^d$ is lifted to act on the space of
probability distributions $\mathcal{P}(\mathbb{R}^d)$. Classically, an
iterative method produces a sequence $\mathbf{x}_m$ of approximations that
converge to $\mathbf{x}_*$. The output of the iterative methods proposed in
this paper is, instead, a sequence of probability distributions $\mu_m \in
\mathcal{P}(\mathbb{R}^d)$. The distributional output both provides a "best
guess" for $\mathbf{x}_*$, for example as the mean of $\mu_m$, and also
probabilistic uncertainty quantification for the value of $\mathbf{x}_*$ when
it has not been exactly determined. Theoretical analysis is provided in the
prototypical case of a stationary linear iterative method. In this setting we
characterise both the rate of contraction of $\mu_m$ to an atomic measure on
$\mathbf{x}_*$ and the nature of the uncertainty quantification being provided.
We conclude with an empirical illustration that highlights the insight into
solution uncertainty that can be provided by probabilistic iterative methods.
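As a concrete illustration of the lifting, the following is a minimal sketch for the prototypical stationary case: under an affine iteration $\mathbf{x}_{m+1} = G\mathbf{x}_m + \mathbf{f}$, a Gaussian $\mu_0$ stays Gaussian, with the mean following the classical iterates and the covariance contracting to zero. The Richardson splitting, step size, and function names below are illustrative assumptions, not the paper's prescribed choices.

```python
import numpy as np

def probabilistic_richardson(A, b, mean0, cov0, num_iters):
    """Lift a stationary iteration x_{m+1} = G x_m + f to Gaussian measures.

    If mu_0 = N(mean0, cov0), the pushforward under the affine map
    x -> G x + f is again Gaussian, so mu_m = N(mean_m, G^m cov0 (G^m)^T),
    where mean_m follows the classical iterates. The Richardson splitting
    G = I - w A, f = w b used here is one illustrative choice.
    """
    d = A.shape[0]
    w = 1.0 / np.linalg.norm(A, 2)   # contraction when A is symmetric positive definite
    G = np.eye(d) - w * A
    f = w * b
    mean, cov = np.asarray(mean0, float).copy(), np.asarray(cov0, float).copy()
    for _ in range(num_iters):
        mean = G @ mean + f          # classical iterate acting on the mean
        cov = G @ cov @ G.T          # covariance contracts geometrically
    return mean, cov

# The mean converges to x_* and trace(cov) -> 0, i.e. mu_m contracts
# to an atomic measure on x_*, mirroring the theoretical analysis.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # a well-conditioned SPD test system
x_star = rng.standard_normal(4)
mean_m, cov_m = probabilistic_richardson(A, A @ x_star, np.zeros(4), np.eye(4), 500)
print(np.linalg.norm(mean_m - x_star), np.trace(cov_m))
```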
Related papers
- Which exceptional low-dimensional projections of a Gaussian point cloud can be found in polynomial time? [8.74634652691576]
We study the subset $\mathscr{F}_{m,\alpha}$ of distributions that can be realized by a class of iterative algorithms.
Non-rigorous methods from statistical physics yield an indirect characterization of $\mathscr{F}_{m,\alpha}$ in terms of a generalized Parisi formula.
arXiv Detail & Related papers (2024-06-05T05:54:56Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture, setting it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
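For readers unfamiliar with the object being learned in the entry above, here is a minimal forward pass of a multi-head attention layer in the standard textbook parameterisation; the paper's exact formulation of the layer and its learning model may differ, so treat the shapes and names below as assumptions.

```python
import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(axis=-1, keepdims=True))
    return E / E.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    """Forward pass of a multi-head attention layer, textbook form.

    X: (seq_len, d); Wq, Wk, Wv: per-head lists of (d, d_head) matrices;
    Wo: (num_heads * d_head, d) output projection.
    """
    heads = []
    for Q_w, K_w, V_w in zip(Wq, Wk, Wv):
        Q, K, V = X @ Q_w, X @ K_w, X @ V_w
        scores = softmax(Q @ K.T / np.sqrt(Q.shape[1]))  # (seq, seq) attention weights
        heads.append(scores @ V)                         # (seq, d_head) per head
    return np.concatenate(heads, axis=1) @ Wo            # (seq, d)

rng = np.random.default_rng(0)
seq, d, h, dh = 5, 8, 2, 4
X = rng.standard_normal((seq, d))
Wq, Wk, Wv = ([rng.standard_normal((d, dh)) for _ in range(h)] for _ in range(3))
Wo = rng.standard_normal((h * dh, d))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)     # (5, 8)
```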
- Optimal Estimator for Linear Regression with Shuffled Labels [17.99906229036223]
This paper considers the task of linear regression with shuffled labels.
The measurements, permutation, signals, and noise are denoted $\mathbf{Y} \in \mathbb{R}^{n\times m}$, $\mathbf{\Pi} \in \mathbb{R}^{n\times n}$, $\mathbf{B} \in \mathbb{R}^{p\times m}$, and $\mathbf{W} \in \mathbb{R}^{n\times m}$, respectively.
arXiv Detail & Related papers (2023-10-02T16:44:47Z)
- Misspecified Phase Retrieval with Generative Priors [15.134280834597865]
We estimate an $n$-dimensional signal $\mathbf{x}$ from $m$ i.i.d. realizations of the single index model $y$.
We show that both steps enjoy a statistical rate of order $\sqrt{(k\log L)\cdot (\log m)/m}$ under suitable conditions.
arXiv Detail & Related papers (2022-10-11T16:04:11Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\cdot\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Feature (TRF).
Computing the proposed random features requires no multiplications, and storage takes a factor of $b$ fewer bits compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
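A minimal sketch of the ternary idea described in the entry above: with weights restricted to $\{-1, 0, +1\}$, the random projection reduces to signed sums and each weight needs only 2 bits of storage. The sparsity level, the sign nonlinearity, and the kernel estimate are assumptions for illustration, not the exact TRF construction.

```python
import numpy as np

def ternary_random_features(X, num_features, sparsity=0.5, rng=None):
    """Random features with weights in {-1, 0, +1}.

    With ternary weights, X @ W reduces to signed sums of input entries
    (no true multiplications), and each weight fits in 2 bits, versus
    32/64 bits for dense Gaussian weights.
    """
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    W = rng.choice([-1, 0, 1], size=(d, num_features),
                   p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])
    return np.sign(X @ W)            # sign nonlinearity, an illustrative choice

X = np.random.default_rng(2).standard_normal((8, 16))
Phi = ternary_random_features(X, 64)
K = Phi @ Phi.T / Phi.shape[1]       # random-feature estimate of a kernel matrix
```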
- Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\mathrm{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we bound the deviation $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$, where $\tilde{R}(z)$ is a deterministic equivalent of $R(z)$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
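The functional in the entry above can be probed numerically. This sketch draws i.i.d. Gaussian columns (a simplifying assumption; the paper concerns independent, non-identically distributed columns), forms the resolvent $R(z)$, and checks that $\mathrm{tr}(AR(z))$ concentrates.

```python
import numpy as np

# Numerically probe the concentration of tr(A R(z)) for the resolvent
# R(z) = (X X^T / n - z I_p)^{-1}.  The i.i.d. Gaussian columns, the
# choice z = -1, and A = I_p are illustrative assumptions.
rng = np.random.default_rng(1)
p, n, z = 100, 400, -1.0
A = np.eye(p)
vals = []
for _ in range(200):
    X = rng.standard_normal((p, n))
    R = np.linalg.inv(X @ X.T / n - z * np.eye(p))
    vals.append(np.trace(A @ R))
# The standard deviation is O(||A||_* / sqrt(n)): small next to the mean.
print(np.mean(vals), np.std(vals))
```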
- Universal Regular Conditional Distributions via Probability Measure-Valued Deep Neural Models [3.8073142980733]
We find that any model built using the proposed framework is dense in the space $C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$.
The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models.
arXiv Detail & Related papers (2021-05-17T11:34:09Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
We design an estimator which attains the smallest possible confidence interval as a function of $n$, $d$, and $\delta$.
arXiv Detail & Related papers (2020-11-24T22:39:21Z)
- Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals [49.60752558064027]
We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals.
Our lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.
arXiv Detail & Related papers (2020-06-29T17:10:10Z)
- Moving Target Monte Carlo [0.0]
We introduce a new non-Markovian sampling algorithm called Moving Target Monte Carlo (MTMC).
The acceptance rate at the $n$-th iteration is constructed using an iteratively updated approximation of the posterior distribution, $a_n(\mathbf{x})$.
The true value of the posterior $p(\mathbf{x}|\mathbf{d})$ is only calculated if the candidate $\mathbf{x}$ is accepted.
arXiv Detail & Related papers (2020-03-10T17:38:36Z)
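The mechanism in the entry above admits a compact sketch: the Metropolis-style acceptance ratio is computed from a cheap surrogate $a_n$, and the expensive posterior $p(\mathbf{x}|\mathbf{d})$ is evaluated only for accepted candidates, which in turn refines the surrogate. The random-walk proposal and the nearest-neighbour surrogate below are illustrative assumptions; the paper's actual construction of $a_n$ is not specified here.

```python
import numpy as np

def mtmc(log_post, x0, num_iters, step=0.5, seed=0):
    """Sketch of a Moving Target Monte Carlo (MTMC)-style sampler.

    The acceptance ratio is computed from a cheap surrogate a_n built from
    previously accepted points; the expensive true posterior p(x|d) is
    evaluated only when a candidate is accepted, and that exact value is
    added to the cache that defines a_n.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    cache_x, cache_lp = [x.copy()], [log_post(x)]    # exact evaluations so far

    def a_n(y):
        # surrogate: log-posterior of the nearest cached exact evaluation
        i = int(np.argmin([np.linalg.norm(y - xc) for xc in cache_x]))
        return cache_lp[i]

    samples = []
    for _ in range(num_iters):
        cand = x + step * rng.standard_normal(x.shape)   # random-walk proposal
        if np.log(rng.uniform()) < a_n(cand) - a_n(x):   # accept using a_n only
            cache_x.append(cand.copy())
            cache_lp.append(log_post(cand))              # exact posterior on acceptance
            x = cand
        samples.append(x.copy())
    return np.array(samples)

# e.g. a standard Gaussian target: mtmc(lambda x: -0.5 * float(x @ x), np.zeros(2), 2000)
```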