Polynomial Width is Sufficient for Set Representation with
High-dimensional Features
- URL: http://arxiv.org/abs/2307.04001v3
- Date: Thu, 7 Mar 2024 05:56:19 GMT
- Title: Polynomial Width is Sufficient for Set Representation with
High-dimensional Features
- Authors: Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li
- Abstract summary: DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
- Score: 69.65698500919869
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Set representation has become ubiquitous in deep learning for modeling the
inductive bias of neural networks that are insensitive to the input order.
DeepSets is the most widely used neural network architecture for set
representation. It involves embedding each set element into a latent space with
dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and
finally mapping the whole-set embedding to the output. In this work, we
investigate the impact of the dimension $L$ on the expressive power of
DeepSets. Previous analyses either oversimplified high-dimensional features to
be one-dimensional features or were limited to analytic activations, thereby
diverging from practical use or resulting in $L$ that grows exponentially with
the set size $N$ and feature dimension $D$. To investigate the minimal value of
$L$ that achieves sufficient expressive power, we present two set-element
embedding layers: (a) linear + power activation (LP) and (b) linear +
exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is
sufficient for set representation using both embedding layers. We also provide
a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our
results to permutation-equivariant set functions and the complex field.
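The abstract describes the DeepSets pipeline (per-element embedding of dimension $L$, sum pooling, then a map to the output) and names two set-element embedding layers, LP and LE. The minimal PyTorch sketch below is not taken from the paper; it only illustrates that pipeline. The LPEmbedding and LEEmbedding modules are assumed forms (a linear map followed by an element-wise power or exponential activation), and all module names and hyper-parameters are hypothetical.

```python
# Minimal DeepSets-style sketch (illustrative only, not the authors' code).
# Pipeline from the abstract: per-element embedding of dimension L,
# sum pooling over the set, then a map from the whole-set embedding to the output.
import torch
import torch.nn as nn


class LPEmbedding(nn.Module):
    """Linear + power activation: psi(x) = (Wx + b) ** p  (assumed form)."""

    def __init__(self, in_dim: int, latent_dim: int, power: int = 2):
        super().__init__()
        self.linear = nn.Linear(in_dim, latent_dim)
        self.power = power

    def forward(self, x):
        return self.linear(x) ** self.power


class LEEmbedding(nn.Module):
    """Linear + exponential activation: psi(x) = exp(Wx + b)  (assumed form)."""

    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        return torch.exp(self.linear(x))


class DeepSets(nn.Module):
    """Embed each element, sum-pool over the set, then map to the output."""

    def __init__(self, embed: nn.Module, latent_dim: int, out_dim: int):
        super().__init__()
        self.embed = embed            # per-element embedding of dimension L
        self.rho = nn.Sequential(     # map from whole-set embedding to output
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, out_dim),
        )

    def forward(self, x):             # x: (batch, N, D), N = set size
        z = self.embed(x)             # (batch, N, L)
        pooled = z.sum(dim=1)         # sum pooling -> order invariance
        return self.rho(pooled)


if __name__ == "__main__":
    N, D, L = 8, 4, 32                # set size, feature dim, latent dim (hypothetical)
    x = torch.randn(2, N, D)
    model = DeepSets(LPEmbedding(D, L), latent_dim=L, out_dim=1)
    # Permuting the set elements should not change the output (up to fp error).
    perm = torch.randperm(N)
    print(torch.allclose(model(x), model(x[:, perm]), atol=1e-5))
```

Because the pooling is a sum over elements, the whole-set embedding, and hence the output, is invariant to the order of the $N$ elements, which the final check verifies numerically.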
Related papers
- Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets [58.460298576330835]
We study Leaky ResNets, which interpolate between ResNets ($\tilde{L} \to 0$) and Fully-Connected nets ($\tilde{L} \to \infty$).
In the infinite-depth limit, we study 'representation geodesics' $A_p$: continuous paths in representation space (similar to NeuralODEs).
We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work.
arXiv Detail & Related papers (2024-05-27T18:15:05Z)
- Fourier Sliced-Wasserstein Embedding for Multisets and Measures [3.396731589928944]
We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We show that our method yields superior representations of input multisets and offers practical advantages for learning on multiset data.
arXiv Detail & Related papers (2024-05-26T11:04:41Z)
- Universal Representation of Permutation-Invariant Functions on Vectors and Tensors [11.345796608258434]
A main object of our study is multiset functions -- that is, permutation-invariant functions over inputs of varying sizes.
Deep Sets, proposed by Zaheer et al. (2017), provides a universal representation for continuous multiset functions on scalars via a sum-decomposable model.
We prove that universal representation is guaranteed for continuous and discontinuous multiset functions through a latent space dimension of $O(N^D)$.
arXiv Detail & Related papers (2023-10-20T22:00:59Z)
- Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures [8.28720658988688]
We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function.
We establish the first proof that HNNs can embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$.
We find that the network complexity of HNN implementing the graph representation is independent of the representation fidelity/distortion.
arXiv Detail & Related papers (2023-08-18T02:24:32Z)
- Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff [12.351756386062291]
We formalize a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps.
For large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_\ell$ have $R^{(0)}(f)$ singular values close to 1.
Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite-depth convergence of the representations of almost all layers.
arXiv Detail & Related papers (2023-05-30T13:06:26Z)
- Exponential Separations in Symmetric Neural Networks [48.80300074254758]
We consider the symmetric Network architecture of Santoro et al. (2017) as a natural generalization of the DeepSets architecture of Zaheer et al. (2017).
Under the restriction to analytic activation functions, we construct a symmetric function acting on sets of size $N$ with elements in dimension $D$.
arXiv Detail & Related papers (2022-06-02T19:45:10Z)
- The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal/state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
- Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks.
We show a complexity-theoretic barrier to proving such results beyond size $O(d)$.
We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
- A deep network construction that adapts to intrinsic dimensionality beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.