Related papers: Universal Regular Conditional Distributions via Probability Measure-Valued Deep Neural Models

Universal Regular Conditional Distributions via Probability Measure-Valued Deep Neural Models

URL: http://arxiv.org/abs/2105.07743v1
Date: Mon, 17 May 2021 11:34:09 GMT
Title: Universal Regular Conditional Distributions via Probability Measure-Valued Deep Neural Models
Authors: Anastasis Kratsios
Abstract summary: We find that any model built using the proposed framework is dense in the space $C(mathcalX,mathcalP_1(mathcalY))$. The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models.
Score: 3.8073142980733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a general framework for explicitly constructing universal deep neural models with inputs from a complete, separable, and locally-compact metric space $\mathcal{X}$ and outputs in the Wasserstein-1 $\mathcal{P}_1(\mathcal{Y})$ space over a complete and separable metric space $\mathcal{Y}$. We find that any model built using the proposed framework is dense in the space $C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$ of continuous functions from $\mathcal{X}$ to $\mathcal{P}_1(\mathcal{Y})$ in the corresponding uniform convergence on compacts topology, quantitatively. We identify two methods in which the curse of dimensionality can be broken. The first approach constructs subsets of $C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$ consisting of functions that can be efficiently approximated. In the second approach, given any fixed $f \in C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$, we build non-trivial subsets of $\mathcal{X}$ on which $f$ can be efficiently approximated. The results are applied to three open problems lying at the interface of applied probability and computational learning theory. We find that the proposed models can approximate any regular conditional distribution of a $\mathcal{Y}$-valued random element $Y$ depending on an $\mathcal{X}$-valued random element $X$, with arbitrarily high probability. The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models. The proposed framework is used to derive an affirmative answer to the open conjecture of Bishop (1994); namely: mixture density networks are generic regular conditional distributions. Numerical experiments are performed in the contexts of extreme learning machines, randomized DNNs, and heteroscedastic regression.

Related papers

Bridging the Gap Between Approximation and Learning via Optimal Approximation by ReLU MLPs of Maximal Regularity [8.28720658988688]
We show that any $(L,alpha)$-H"older function can be approximated to a uniform $mathcalO(dnd/alpha)$, of width $mathcalO(dnd/alpha)$, depth $mathcalO(log(d))$, with $mathcalO(dnd/alpha)$ and $mathcalO(dnd/alpha)$, with $mathcal
arXiv Detail & Related papers (2024-09-18T22:05:07Z)
A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $mathcalH$. For any given pair of $mathcalH$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $mathcalH$ with arbitrary accuracy. We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z)
A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
An Approximation Theory for Metric Space-Valued Functions With A View Towards Deep Learning [25.25903127886586]
We build universal functions approximators of continuous maps between arbitrary Polish metric spaces $mathcalX$ and $mathcalY$. In particular, we show that the required number of Dirac measures is determined by the structure of $mathcalX$ and $mathcalY$.
arXiv Detail & Related papers (2023-04-24T16:18:22Z)
On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1,..., X_n$, we wish to approximate any point $z in [-1,1]$ as the sum of a subset $X_i_1(z),..., X_i_s(z)$ of them, up to error $varepsilon cdot. We prove that, in $d$ dimensions, $n = O(d3log frac 1varepsilon cdot
arXiv Detail & Related papers (2022-07-28T08:10:43Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Metric Hypertransformers are Universal Adapted Maps [4.83420384410068]
metric hypertransformers (MHTs) are capable of approxing any adapted map $F:mathscrXmathbbZrightarrow mathscrYmathbbZ$ with approximable complexity. Our results provide the first (quantimat) universal approximation theorem compatible with any such $mathscrX$ and $mathscrY$.
arXiv Detail & Related papers (2022-01-31T10:03:46Z)
Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$. We propose a novel random technique, called Ternary Random Feature (TRF) The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem. In the model selection framework, instead of $mathcalP*$, we are given $M$ nested families of transition kernels. We show that textttARL-GEN obtains a regret of $TildemathcalO(d_mathcalE*H2+sqrtd_mathcalE* mathbbM* H2 T)$
arXiv Detail & Related papers (2021-07-13T05:00:38Z)
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $epsilon$-cover for $S$ of cardinality $M = (k/epsilon)O_d(k1/d)$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$. For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.