A Feedforward Unitary Equivariant Neural Network
- URL: http://arxiv.org/abs/2208.12146v1
- Date: Thu, 25 Aug 2022 15:05:02 GMT
- Title: A Feedforward Unitary Equivariant Neural Network
- Authors: Pui-Wai Ma and T.-H. Hubert Chan
- Abstract summary: We devise a new type of feedforward neural network.
It is equivariant with respect to the unitary group $U(n)$.
The input and output can be vectors in $\mathbb{C}^n$ with arbitrary dimension $n$.
- Score: 3.6220250022337335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We devise a new type of feedforward neural network. It is equivariant with
respect to the unitary group $U(n)$. The input and output can be vectors in
$\mathbb{C}^n$ with arbitrary dimension $n$. No convolution layer is required
in our implementation. We avoid errors due to truncated higher-order terms in
Fourier-like transformations. The implementation of each layer can be done
efficiently using simple calculations. As a proof of concept, we have given
empirical results on the prediction of the dynamics of atomic motion to
demonstrate the practicality of our approach.
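For intuition, here is a minimal sketch of what $U(n)$-equivariance means in practice. This is not the paper's architecture; it only illustrates the defining property $f(Ux) = U f(x)$ with a toy layer that rescales its input by a function of the norm, which is invariant under any unitary $U$.

```python
# Minimal sketch (not the paper's layer): a toy U(n)-equivariant map.
# Because ||Ux|| = ||x|| for any unitary U, a layer of the form
# f(x) = g(||x||) * x satisfies f(Ux) = U f(x).
import numpy as np

rng = np.random.default_rng(0)
n = 5  # arbitrary dimension, as in the abstract

def layer(x, a=0.7, b=0.3):
    # g is any scalar function of the unitary-invariant norm; a, b play the role of weights.
    r = np.linalg.norm(x)
    return (a * np.tanh(r) + b) * x

# Random complex input and a random unitary U (QR factor of a complex Gaussian matrix).
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(A)

# Equivariance check: f(Ux) and U f(x) agree up to floating-point error.
print(np.max(np.abs(layer(U @ x) - U @ layer(x))))  # ~1e-16
```

The check only makes the symmetry constraint concrete; the paper's actual layer construction is described in the full text.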
Related papers
- Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers [54.20763128054692]
We study how a two-attention-layer transformer is trained to perform in-context learning (ICL) on $n$-gram Markov chain data.
We prove that the gradient flow with respect to a cross-entropy ICL loss converges to a limiting model.
arXiv Detail & Related papers (2024-09-09T18:10:26Z) - Scale Equivariant Graph Metanetworks [20.445135424921908]
This paper pertains to an emerging machine learning paradigm: learning functions whose inputs are functions themselves.
We propose $\textit{Scale Equivariant Graph MetaNetworks}$ (ScaleGMNs), a framework that adapts the Graph Metanetwork (message-passing) paradigm by incorporating scaling symmetries; a small illustrative sketch of such a symmetry appears after this list.
arXiv Detail & Related papers (2024-06-15T16:41:04Z) - How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
We train a Transformer model on a simple next token prediction task.
We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping.
arXiv Detail & Related papers (2024-02-08T16:24:44Z) - On dimensionality of feature vectors in MPNNs [49.32130498861987]
We revisit the classical result of Morris et al. (AAAI'19) that message-passing graph neural networks (MPNNs) are equal in their distinguishing power to the Weisfeiler--Leman (WL) isomorphism test.
arXiv Detail & Related papers (2024-02-06T12:56:55Z) - Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods [2.8645507575980074]
We simplify convolutions by viewing them as tensor networks (TNs).
TNs allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum.
Our TN implementation accelerates a KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient dropouts for approximate backpropagation.
arXiv Detail & Related papers (2023-07-05T13:19:41Z) - Invariant Layers for Graphs with Nodes of Different Types [27.530546740444077]
We show that implementing linear layers invariant to input permutations allows learning important node interactions more effectively than existing techniques.
Our findings suggest that function approximation on a graph with $n$ nodes can be done with tensors of sizes $\leq n$, which is tighter than the best-known bound $\leq n(n-1)/2$.
arXiv Detail & Related papers (2023-02-27T07:10:33Z) - Implicit Convolutional Kernels for Steerable CNNs [5.141137421503899]
Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and transformations of an origin-preserving group $G$.
We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels.
We demonstrate the effectiveness of our method on multiple tasks, including N-body simulations, point cloud classification, and molecular property prediction.
arXiv Detail & Related papers (2022-12-12T18:10:33Z) - Equivalence Between SE(3) Equivariant Networks via Steerable Kernels and Group Convolution [90.67482899242093]
A wide range of techniques have been proposed in recent years for designing neural networks for 3D data that are equivariant under rotation and translation of the input.
We provide an in-depth analysis of both methods and their equivalence and relate the two constructions to multiview convolutional networks.
We also derive new TFN non-linearities from our equivalence principle and test them on practical benchmark datasets.
arXiv Detail & Related papers (2022-11-29T03:42:11Z) - Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that the proposed TinvNN strictly guarantees transformation invariance and is general and flexible enough to be combined with existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z) - Beyond Lazy Training for Over-parameterized Tensor Decomposition [69.4699995828506]
We show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data.
arXiv Detail & Related papers (2020-10-22T00:32:12Z) - Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets? [33.51250867983687]
We exhibit a natural task on which a provable sample complexity gap can be shown for standard training algorithms.
We demonstrate a single target function, learning which on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap.
Similar results are achieved for $\ell$ regression and adaptive training algorithms, e.g. Adam and AdaGrad.
arXiv Detail & Related papers (2020-10-16T17:15:39Z)
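As referenced above under Scale Equivariant Graph Metanetworks, here is a minimal sketch (not code from that paper), assuming a standard two-layer ReLU MLP, of the kind of weight-space scaling symmetry a scale-equivariant metanetwork is meant to respect: rescaling a hidden layer's incoming weights by $\alpha > 0$ and its outgoing weights by $1/\alpha$ leaves the network function unchanged.

```python
# Illustration only (not ScaleGMN): the ReLU rescaling symmetry in weight space.
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

def mlp(x, W1, b1, W2, b2):
    # A plain two-layer ReLU network.
    return W2 @ relu(W1 @ x + b1) + b2

d_in, d_hid, d_out = 4, 8, 3
W1 = rng.standard_normal((d_hid, d_in)); b1 = rng.standard_normal(d_hid)
W2 = rng.standard_normal((d_out, d_hid)); b2 = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)

alpha = 2.5  # any positive scale
# ReLU is positively homogeneous, so both parameterizations compute the same function.
out_a = mlp(x, W1, b1, W2, b2)
out_b = mlp(x, alpha * W1, alpha * b1, W2 / alpha, b2)
print(np.max(np.abs(out_a - out_b)))  # ~1e-15
```

A metanetwork that takes (W1, b1, W2, b2) as input should therefore map both parameterizations to the same (or equivariantly related) output.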
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.