Capacity Bounds for Hyperbolic Neural Network Representations of Latent
Tree Structures
- URL: http://arxiv.org/abs/2308.09250v1
- Date: Fri, 18 Aug 2023 02:24:32 GMT
- Title: Capacity Bounds for Hyperbolic Neural Network Representations of Latent
Tree Structures
- Authors: Anastasis Kratsios, Ruiyang Hong, Haitz Sáez de Ocáriz Borde
- Abstract summary: We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function.
We establish the first proof that HNNs can embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$.
We find that the network complexity of the HNN implementing the graph representation is independent of the representation fidelity/distortion.
- Score: 8.28720658988688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the representation capacity of deep hyperbolic neural networks
(HNNs) with a ReLU activation function. We establish the first proof that HNNs
can $\varepsilon$-isometrically embed any finite weighted tree into a
hyperbolic space of dimension $d$ at least equal to $2$ with prescribed
sectional curvature $\kappa<0$, for any $\varepsilon > 1$ (with $\varepsilon=1$
being optimal). We establish rigorous upper bounds on the network complexity of
an HNN implementing the embedding. We find that the network complexity of the
HNN implementing the graph representation is independent of the representation
fidelity/distortion. We contrast this result against our lower bounds on the
distortion which any ReLU multi-layer perceptron (MLP) must incur when
embedding a tree with $L>2^d$ leaves into a $d$-dimensional Euclidean space,
which we show is at least $\Omega(L^{1/d})$, independently of the depth, width,
and (possibly discontinuous) activation function defining the MLP.
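To make the distortion and curvature quantities above concrete, here is a small NumPy sketch (not from the paper) that places a toy weighted tree in the Poincaré disk and measures the multiplicative distortion between tree and hyperbolic distances; the curvature-$\kappa$ distance formula is the standard one for the Poincaré ball, while the point placement is hand-picked for illustration rather than produced by an HNN.

```python
import numpy as np

def poincare_dist(x, y, kappa=-1.0):
    """Geodesic distance in the Poincare ball of constant sectional curvature kappa < 0."""
    c = -kappa
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    num = 2.0 * c * np.sum((x - y) ** 2)
    den = (1.0 - c * np.sum(x ** 2)) * (1.0 - c * np.sum(y ** 2))
    return np.arccosh(1.0 + num / den) / np.sqrt(c)

# Toy weighted path tree a - b - c (edge weights 1 and 2) and its shortest-path metric.
tree_dist = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 3.0}

# Hand-picked points in the 2D Poincare disk; an HNN would produce such coordinates.
emb = {"a": np.array([0.00, 0.0]),
       "b": np.array([0.46, 0.0]),
       "c": np.array([0.90, 0.0])}

# Scale-invariant multiplicative distortion: worst expansion over worst contraction.
ratios = [poincare_dist(emb[u], emb[v]) / d for (u, v), d in tree_dist.items()]
print("distortion:", max(ratios) / min(ratios))  # ~1.02 here; 1 would be isometric
```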
Related papers
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^d,\mathbb{R}^d)$ on compacta.
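As a rough sketch of the kind of network this entry refers to, the following NumPy snippet implements a forward pass of a small autoencoder with leaky ReLU activations; the sizes, initialization, and linear decoder are arbitrary choices, and the universal-approximation claim concerns what such networks can represent, not this particular parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def autoencoder(x, params, alpha=0.01):
    """One-hidden-layer autoencoder: leaky-ReLU encoder, linear decoder (illustrative)."""
    W_enc, b_enc, W_dec, b_dec = params
    code = leaky_relu(x @ W_enc + b_enc, alpha)  # encoder: R^d -> R^k
    return code @ W_dec + b_dec                  # decoder: R^k -> R^d

d, k = 8, 3  # input dimension and bottleneck width (arbitrary illustrative sizes)
params = (rng.normal(size=(d, k)), np.zeros(k),
          rng.normal(size=(k, d)), np.zeros(d))
x = rng.normal(size=(5, d))          # a batch of 5 inputs
print(autoencoder(x, params).shape)  # (5, 8)
```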
arXiv Detail & Related papers (2024-11-13T16:17:16Z) - Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets ($\tilde{L}\to\infty$) depending on an effective-depth parameter $\tilde{L}$.
In the infinite depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z) - Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN)
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE).
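A minimal sketch of what such set-element embedding layers might look like is given below; the exact parameterization in the paper may differ, but the assumed shape is a per-element linear map followed by a power or exponential activation and sum-pooling, which keeps the representation permutation-invariant in the DeepSets style.

```python
import numpy as np

rng = np.random.default_rng(0)

def lp_embedding(X, W, p=2):
    """LP layer (assumed form): per-element linear map, elementwise power, sum-pooling."""
    return np.sum((X @ W) ** p, axis=0)

def le_embedding(X, W):
    """LE layer (assumed form): per-element linear map, elementwise exp, sum-pooling."""
    return np.sum(np.exp(X @ W), axis=0)

# A set of n elements with d-dimensional features, pooled into a permutation-invariant vector.
n, d, width = 6, 4, 10  # illustrative sizes; the paper's point is how large `width` must be
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, width))
print(lp_embedding(X, W).shape, le_embedding(X, W).shape)  # (10,) (10,)
```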
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Extending the Design Space of Graph Neural Networks by Rethinking
Folklore Weisfeiler-Lehman [66.23316415757456]
Message passing neural networks (MPNNs) have emerged as the most popular framework of graph neural networks (GNNs) in recent years.
However, their expressive power is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test.
We propose an extension, $(k,t)$-FWL, which considers any equivariant set as neighbors instead of all nodes.
N$^2$-GNN achieves record-breaking results on ZINC-Subset (0.059), outperforming previous SOTA results by 10.6%.
arXiv Detail & Related papers (2023-06-05T21:35:32Z) - Neural Networks Efficiently Learn Low-Dimensional Representations with
SGD [22.703825902761405]
We show that SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle)+\epsilon$ by recovering the principal direction.
We also provide compression guarantees for NNs using the approximate low-rank structure produced by SGD.
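To illustrate the single-index setting (though not the paper's SGD analysis), the sketch below generates data of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle)+\epsilon$ and recovers the hidden direction with a simple moment estimator; the link function, dimensions, and estimator are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-index data: y = f(<u, x>) + noise, with x ~ N(0, I_d).
d, n = 20, 50_000
u = rng.normal(size=d)
u /= np.linalg.norm(u)                 # hidden "principal direction"
f = np.tanh                            # illustrative link function
X = rng.normal(size=(n, d))
y = f(X @ u) + 0.1 * rng.normal(size=n)

# Moment-based recovery of the direction (Stein's lemma: E[x*y] is proportional to u
# for Gaussian x). This is a stand-in for the SGD dynamics analysed in the paper.
u_hat = X.T @ y / n
u_hat /= np.linalg.norm(u_hat)
print("alignment |<u_hat, u>|:", abs(u_hat @ u))  # close to 1
```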
arXiv Detail & Related papers (2022-09-29T15:29:10Z) - Understanding Deep Neural Function Approximation in Reinforcement
Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-iid setting.
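For readers unfamiliar with the exploration rule mentioned above, here is a minimal $\epsilon$-greedy action-selection sketch; the Q-values are made up, and the paper's contribution lies in the function-approximation analysis rather than in this rule itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else the argmax-Q action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

q = np.array([0.1, 0.7, 0.3])  # made-up Q-values for a 3-action state
actions = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
print("share of greedy action:", np.mean(np.array(actions) == 1))  # about 0.93
```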
arXiv Detail & Related papers (2022-09-15T15:42:47Z) - Approximating smooth functions by deep neural networks with sigmoid
activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
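As a toy illustration of the composition structure $f(x) = g(\phi(x))$ with $\phi$ a projection onto a low-dimensional subspace, the snippet below (with invented $g$ and $\phi$, not taken from the paper) shows a function of a 50-dimensional input that depends on only 3 intrinsic coordinates, which is the structure such approximation results exploit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Composition f(x) = g(phi(x)) with phi the projection onto a k-dimensional subspace of R^D.
D, k = 50, 3                                   # ambient vs. intrinsic dimension (illustrative)
P = np.linalg.qr(rng.normal(size=(D, k)))[0]   # orthonormal basis of the subspace
phi = lambda x: x @ P                          # projection coordinates, R^D -> R^k
g = lambda z: np.sin(z).sum(axis=-1)           # a smooth "outer" function on R^k (invented)
f = lambda x: g(phi(x))

X = rng.normal(size=(4, D))
print(f(X))  # values depend on the 50-dimensional input only through 3 coordinates
```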