Polynomial Width is Sufficient for Set Representation with
High-dimensional Features
- URL: http://arxiv.org/abs/2307.04001v3
- Date: Thu, 7 Mar 2024 05:56:19 GMT
- Title: Polynomial Width is Sufficient for Set Representation with
High-dimensional Features
- Authors: Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li
- Abstract summary: DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
- Score: 69.65698500919869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Set representation has become ubiquitous in deep learning for modeling the
inductive bias of neural networks that are insensitive to the input order.
DeepSets is the most widely used neural network architecture for set
representation. It involves embedding each set element into a latent space with
dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and
finally mapping the whole-set embedding to the output. In this work, we
investigate the impact of the dimension $L$ on the expressive power of
DeepSets. Previous analyses either oversimplified high-dimensional features to
be one-dimensional features or were limited to analytic activations, thereby
diverging from practical use or resulting in $L$ that grows exponentially with
the set size $N$ and feature dimension $D$. To investigate the minimal value of
$L$ that achieves sufficient expressive power, we present two set-element
embedding layers: (a) linear + power activation (LP) and (b) linear +
exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is
sufficient for set representation using both embedding layers. We also provide
a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our
results to permutation-equivariant set functions and the complex field.
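The abstract describes the DeepSets pipeline (per-element embedding of dimension $L$, sum pooling, then a map to the output) and names two set-element embedding layers, LP and LE. The minimal PyTorch sketch below is not taken from the paper; it only illustrates that pipeline. The LPEmbedding and LEEmbedding modules are assumed forms (a linear map followed by an element-wise power or exponential activation), and all module names and hyper-parameters are hypothetical.

```python
# Minimal DeepSets-style sketch (illustrative only, not the authors' code).
# Pipeline from the abstract: per-element embedding of dimension L,
# sum pooling over the set, then a map from the whole-set embedding to the output.
import torch
import torch.nn as nn


class LPEmbedding(nn.Module):
    """Linear + power activation: psi(x) = (Wx + b) ** p  (assumed form)."""

    def __init__(self, in_dim: int, latent_dim: int, power: int = 2):
        super().__init__()
        self.linear = nn.Linear(in_dim, latent_dim)
        self.power = power

    def forward(self, x):
        return self.linear(x) ** self.power


class LEEmbedding(nn.Module):
    """Linear + exponential activation: psi(x) = exp(Wx + b)  (assumed form)."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        return torch.exp(self.linear(x))


class DeepSets(nn.Module):
    """Embed each element, sum-pool over the set, then map to the output."""

    def __init__(self, embed: nn.Module, latent_dim: int, out_dim: int):
        super().__init__()
        self.embed = embed            # per-element embedding of dimension L
        self.rho = nn.Sequential(     # map from whole-set embedding to output
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, out_dim),
        )

    def forward(self, x):             # x: (batch, N, D), N = set size
        z = self.embed(x)             # (batch, N, L)
        pooled = z.sum(dim=1)         # sum pooling -> order invariance
        return self.rho(pooled)


if __name__ == "__main__":
    N, D, L = 8, 4, 32                # set size, feature dim, latent dim (hypothetical)
    x = torch.randn(2, N, D)
    model = DeepSets(LPEmbedding(D, L), latent_dim=L, out_dim=1)
    # Permuting the set elements should not change the output (up to fp error).
    perm = torch.randperm(N)
    print(torch.allclose(model(x), model(x[:, perm]), atol=1e-5))
```

Because the pooling is a sum over elements, the whole-set embedding, and hence the output, is invariant to the order of the $N$ elements, which the final check verifies numerically.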
Related papers
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets ($\tilde{L} \to 0$) and Fully-Connected nets ($\tilde{L} \to \infty$).
In the infinite-depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Fourier Sliced-Wasserstein Embedding for Multisets and Measures [3.396731589928944]
We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We show that our method yields superior representations of input multisets and offers practical advantages for learning on multiset data.
arXiv Detail & Related papers (2024-05-26T11:04:41Z)
- Universal Representation of Permutation-Invariant Functions on Vectors and Tensors [11.345796608258434]
A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes.
Deep Sets, proposed by Zaheer et al. (2017), provides a universal representation for continuous multiset functions on scalars via a sum-decomposable model.
We prove that universal representation is guaranteed for continuous and discontinuous multiset functions through a latent space dimension of $O(N^D)$.
arXiv Detail & Related papers (2023-10-20T22:00:59Z)
- Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures [8.28720658988688]
We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function.
We establish the first proof that HNNs can embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$.
We find that the network complexity of HNN implementing the graph representation is independent of the representation fidelity/distortion.
arXiv Detail & Related papers (2023-08-18T02:24:32Z)
- Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
We formalize a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps.
For large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_\ell$ have $R^{(0)}(f)$ singular values close to 1.
Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
arXiv Detail & Related papers (2023-05-30T13:06:26Z)
- Exponential Separations in Symmetric Neural Networks [48.80300074254758]
We consider the symmetric Network architecture of Santoro et al. (2017) as a natural generalization of the DeepSets architecture of Zaheer et al. (2017).
Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of size $N$ with elements in dimension $D$.
arXiv Detail & Related papers (2022-06-02T19:45:10Z)
- The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
- Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.