A Feedforward Unitary Equivariant Neural Network
- URL: http://arxiv.org/abs/2208.12146v1
- Date: Thu, 25 Aug 2022 15:05:02 GMT
- Title: A Feedforward Unitary Equivariant Neural Network
- Authors: Pui-Wai Ma and T.-H. Hubert Chan
- Abstract summary: We devise a new type of feedforward neural network.
It is equivariant with respect to the unitary group $U(n)$.
The input and output can be vectors in $\mathbb{C}^n$ with arbitrary dimension $n$.
- Score: 3.6220250022337335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We devise a new type of feedforward neural network. It is equivariant with
respect to the unitary group $U(n)$. The input and output can be vectors in
$\mathbb{C}^n$ with arbitrary dimension $n$. No convolution layer is required
in our implementation. We avoid errors due to truncated higher-order terms in
Fourier-like transformations. The implementation of each layer can be done
efficiently using simple calculations. As a proof of concept, we have given
empirical results on the prediction of the dynamics of atomic motion to
demonstrate the practicality of our approach.
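For intuition, here is a minimal sketch of what $U(n)$-equivariance means in practice. This is not the paper's architecture; it only illustrates the defining property $f(Ux) = U f(x)$ with a toy layer that rescales its input by a function of the norm, which is invariant under any unitary $U$.

```python
# Minimal sketch (not the paper's layer): a toy U(n)-equivariant map.
# Because ||Ux|| = ||x|| for any unitary U, a layer of the form
# f(x) = g(||x||) * x satisfies f(Ux) = U f(x).
import numpy as np

rng = np.random.default_rng(0)
n = 5  # arbitrary dimension, as in the abstract

def layer(x, a=0.7, b=0.3):
    # g is any scalar function of the unitary-invariant norm; a, b play the role of weights.
    r = np.linalg.norm(x)
    return (a * np.tanh(r) + b) * x

# Random complex input and a random unitary U (QR factor of a complex Gaussian matrix).
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(A)

# Equivariance check: f(Ux) and U f(x) agree up to floating-point error.
print(np.max(np.abs(layer(U @ x) - U @ layer(x))))  # ~1e-16
```

The check only makes the symmetry constraint concrete; the paper's actual layer construction is described in the full text.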
Related papers
- Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform in-context learning (ICL) on $n$-gram Markov chain data.
We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
arXiv Detail & Related papers (2024-09-09T18:10:26Z) - Scale Equivariant Graph Metanetworks [20.445135424921908]
This paper pertains to an emerging machine learning paradigm: learning functions whose inputs are functions themselves.
We propose $\textit{Scale Equivariant Graph MetaNetworks}$ (ScaleGMNs), a framework that adapts the Graph Metanetwork (message-passing) paradigm by incorporating scaling symmetries; a small illustrative sketch of such a symmetry appears after this list.
arXiv Detail & Related papers (2024-06-15T16:41:04Z) - How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
We train a Transformer model on a simple next token prediction task.
We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping.
arXiv Detail & Related papers (2024-02-08T16:24:44Z) - On dimensionality of feature vectors in MPNNs [49.32130498861987]
We revisit the classical result of Morris et al. (AAAI'19) that message-passing graph neural networks (MPNNs) are equal in their distinguishing power to the Weisfeiler--Leman (WL) isomorphism test.
arXiv Detail & Related papers (2024-02-06T12:56:55Z) - Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods [2.8645507575980074]
We simplify convolutions by viewing them as tensor networks (TNs).
TNs allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum.
Our TN implementation accelerates a KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient dropouts for approximate backpropagation.
arXiv Detail & Related papers (2023-07-05T13:19:41Z) - Invariant Layers for Graphs with Nodes of Different Types [27.530546740444077]
We show that implementing linear layers invariant to input permutations allows learning important node interactions more effectively than existing techniques.
Our findings suggest that function approximation on a graph with $n$ nodes can be done with tensors of sizes $\leq n$, which is tighter than the best-known bound $\leq n(n-1)/2$.
arXiv Detail & Related papers (2023-02-27T07:10:33Z) - Implicit Convolutional Kernels for Steerable CNNs [5.141137421503899]
Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and transformations of an origin-preserving group $G$.
We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels.
We demonstrate the effectiveness of our method on multiple tasks, including N-body simulations, point cloud classification, and molecular property prediction.
arXiv Detail & Related papers (2022-12-12T18:10:33Z) - Equivalence Between SE(3) Equivariant Networks via Steerable Kernels and Group Convolution [90.67482899242093]
A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input.
We provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks.
We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.
arXiv Detail & Related papers (2022-11-29T03:42:11Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that the proposed TinvNN strictly guarantees transformation invariance and is general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? [33.51250867983687]
We exhibit a natural task on which a provable sample complexity gap can be shown for standard training algorithms.
We demonstrate a single target function, learning which on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap.
Similar results are achieved for $\ell$ regression and adaptive training algorithms, e.g. Adam and AdaGrad.
arXiv Detail & Related papers (2020-10-16T17:15:39Z)
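As referenced above under Scale Equivariant Graph Metanetworks, here is a minimal sketch (not code from that paper), assuming a standard two-layer ReLU MLP, of the kind of weight-space scaling symmetry a scale-equivariant metanetwork is meant to respect: rescaling a hidden layer's incoming weights by $\alpha > 0$ and its outgoing weights by $1/\alpha$ leaves the network function unchanged.

```python
# Illustration only (not ScaleGMN): the ReLU rescaling symmetry in weight space.
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

def mlp(x, W1, b1, W2, b2):
    # A plain two-layer ReLU network.
    return W2 @ relu(W1 @ x + b1) + b2

d_in, d_hid, d_out = 4, 8, 3
W1 = rng.standard_normal((d_hid, d_in)); b1 = rng.standard_normal(d_hid)
W2 = rng.standard_normal((d_out, d_hid)); b2 = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

alpha = 2.5  # any positive scale
# ReLU is positively homogeneous, so both parameterizations compute the same function.
out_a = mlp(x, W1, b1, W2, b2)
out_b = mlp(x, alpha * W1, alpha * b1, W2 / alpha, b2)
print(np.max(np.abs(out_a - out_b)))  # ~1e-15
```

A metanetwork that takes (W1, b1, W2, b2) as input should therefore map both parameterizations to the same (or equivariantly related) output.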
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.