Universal Regular Conditional Distributions via Probability
Measure-Valued Deep Neural Models
- URL: http://arxiv.org/abs/2105.07743v1
- Date: Mon, 17 May 2021 11:34:09 GMT
- Title: Universal Regular Conditional Distributions via Probability
Measure-Valued Deep Neural Models
- Authors: Anastasis Kratsios
- Abstract summary: We find that any model built using the proposed framework is dense in the space $C(mathcalX,mathcalP_1(mathcalY))$.
The proposed models are also shown to be capable of generically expressing the aleatoric uncertainty present in most randomized machine learning models.
- Score: 3.8073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a general framework for explicitly constructing
universal deep neural models with inputs from a complete, separable, and
locally-compact metric space $\mathcal{X}$ and outputs in the Wasserstein-1
$\mathcal{P}_1(\mathcal{Y})$ space over a complete and separable metric space
$\mathcal{Y}$. We find that any model built using the proposed framework is
dense in the space $C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$ of continuous
functions from $\mathcal{X}$ to $\mathcal{P}_1(\mathcal{Y})$ in the
corresponding uniform convergence on compacts topology, quantitatively. We
identify two methods in which the curse of dimensionality can be broken. The
first approach constructs subsets of
$C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$ consisting of functions that can be
efficiently approximated. In the second approach, given any fixed $f \in
C(\mathcal{X},\mathcal{P}_1(\mathcal{Y}))$, we build non-trivial subsets of
$\mathcal{X}$ on which $f$ can be efficiently approximated. The results are
applied to three open problems lying at the interface of applied probability
and computational learning theory. We find that the proposed models can
approximate any regular conditional distribution of a $\mathcal{Y}$-valued
random element $Y$ depending on an $\mathcal{X}$-valued random element $X$,
with arbitrarily high probability. The proposed models are also shown to be
capable of generically expressing the aleatoric uncertainty present in most
randomized machine learning models. The proposed framework is used to derive an
affirmative answer to the open conjecture of Bishop (1994); namely: mixture
density networks are generic regular conditional distributions. Numerical
experiments are performed in the contexts of extreme learning machines,
randomized DNNs, and heteroscedastic regression.
Related papers
- A Theory of Interpretable Approximations [61.90216959710842]
We study the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $mathcalH$.
For any given pair of $mathcalH$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $mathcalH$ with arbitrary accuracy.
We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-
arXiv Detail & Related papers (2024-06-15T06:43:45Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - An Approximation Theory for Metric Space-Valued Functions With A View
Towards Deep Learning [25.25903127886586]
We build universal functions approximators of continuous maps between arbitrary Polish metric spaces $mathcalX$ and $mathcalY$.
In particular, we show that the required number of Dirac measures is determined by the structure of $mathcalX$ and $mathcalY$.
arXiv Detail & Related papers (2023-04-24T16:18:22Z) - On the Multidimensional Random Subset Sum Problem [0.9007371440329465]
In the Random Subset Sum Problem, given $n$ i.i.d. random variables $X_1,..., X_n$, we wish to approximate any point $z in [-1,1]$ as the sum of a subset $X_i_1(z),..., X_i_s(z)$ of them, up to error $varepsilon cdot.
We prove that, in $d$ dimensions, $n = O(d3log frac 1varepsilon cdot
arXiv Detail & Related papers (2022-07-28T08:10:43Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Metric Hypertransformers are Universal Adapted Maps [4.83420384410068]
metric hypertransformers (MHTs) are capable of approxing any adapted map $F:mathscrXmathbbZrightarrow mathscrYmathbbZ$ with approximable complexity.
Our results provide the first (quantimat) universal approximation theorem compatible with any such $mathscrX$ and $mathscrY$.
arXiv Detail & Related papers (2022-01-31T10:03:46Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $bf K$ is independent of the distribution of the i.i.d. entries of $bf w$.
We propose a novel random technique, called Ternary Random Feature (TRF)
The computation of the proposed random features requires no multiplication and a factor of $b$ less bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Model Selection with Near Optimal Rates for Reinforcement Learning with
General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem.
In the model selection framework, instead of $mathcalP*$, we are given $M$ nested families of transition kernels.
We show that textttARL-GEN obtains a regret of $TildemathcalO(d_mathcalE*H2+sqrtd_mathcalE* mathbbM* H2 T)$
arXiv Detail & Related papers (2021-07-13T05:00:38Z) - Convergence of Sparse Variational Inference in Gaussian Processes
Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
For the ReLU activation, our population risk guarantee is $O(mathsfOPT1/2)+epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.