SE(3)-Hyena Operator for Scalable Equivariant Learning
- URL: http://arxiv.org/abs/2407.01049v2
- Date: Tue, 13 Aug 2024 15:06:41 GMT
- Title: SE(3)-Hyena Operator for Scalable Equivariant Learning
- Authors: Artem Moskalev, Mangal Prakash, Rui Liao, Tommaso Mansi
- Abstract summary: We introduce SE(3)-Hyena, an equivariant long-convolutional model based on the Hyena operator.
Our model processes the geometric context of 20k tokens 3.5x faster than the equivariant transformer.
- Score: 5.354533854744212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling global geometric context while maintaining equivariance is crucial for accurate predictions in many fields such as biology, chemistry, or vision. Yet, this is challenging due to the computational demands of processing high-dimensional data at scale. Existing approaches, such as equivariant self-attention or distance-based message passing, suffer from quadratic complexity with respect to sequence length, while localized methods sacrifice global information. Inspired by the recent success of state-space and long-convolutional models, in this work, we introduce the SE(3)-Hyena operator, an equivariant long-convolutional model based on the Hyena operator. The SE(3)-Hyena captures global geometric context at sub-quadratic complexity while maintaining equivariance to rotations and translations. Evaluated on equivariant associative recall and n-body modeling, SE(3)-Hyena matches or outperforms equivariant self-attention while requiring significantly less memory and computational resources for long sequences. Our model processes the geometric context of 20k tokens 3.5x faster than the equivariant transformer and allows a 175x longer context within the same memory budget.
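The abstract only names the ingredients; as a hedged illustration of how an FFT-based long convolution can supply global context at sub-quadratic cost while keeping SE(3) equivariance, the PyTorch sketch below combines an O(N log N) circular convolution over rotation-invariant features with gating of centered coordinates by invariant scalars. Function names, shapes, and the gating scheme are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch, not the authors' implementation: an FFT long convolution over
# rotation-invariant features plus gating of centered coordinates by invariant
# scalars. Shapes, names, and the gating scheme are illustrative assumptions.
import torch


def fft_long_conv(u, k):
    """Circular convolution of u and k (both (N, C)) along the token axis via FFT, O(N log N)."""
    n = u.shape[0]
    return torch.fft.irfft(torch.fft.rfft(u, n=n, dim=0) * torch.fft.rfft(k, n=n, dim=0), n=n, dim=0)


def se3_equivariant_long_conv(f, x, k):
    """f: (N, C) invariant token features, x: (N, 3) coordinates, k: (N, C + 1) long-conv kernel.

    Returns updated invariant features (N, C) and SE(3)-equivariantly updated coordinates (N, 3).
    """
    centroid = x.mean(dim=0, keepdim=True)
    x_c = x - centroid                                    # subtracting the centroid removes translation
    radial = x_c.norm(dim=-1, keepdim=True)               # rotation-invariant scalar per token
    h = fft_long_conv(torch.cat([f, radial], dim=-1), k)  # global context in O(N log N)
    gate = torch.sigmoid(h[:, :1])                        # invariant gate per token
    x_out = centroid + gate * x_c                         # scaling vectors by invariants preserves equivariance
    return h[:, 1:], x_out
```

Subtracting the centroid removes translations, and scaling the centered vectors by rotation-invariant gates commutes with any rotation of the input, which is what keeps the block equivariant while the FFT convolution mixes information across all tokens.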
Related papers
- Does equivariance matter at scale? [15.247352029530523]
We study how equivariant and non-equivariant networks scale with compute and training samples.
First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs.
Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget.
arXiv Detail & Related papers (2024-10-30T16:36:59Z)
- Relaxed Equivariance via Multitask Learning [7.905957228045955]
We introduce REMUL, a training procedure for approximating equivariance with multitask learning.
We show that unconstrained models can learn approximate symmetries by minimizing an additional simple equivariance loss.
Our method achieves competitive performance compared to equivariant baselines while being $10\times$ faster at inference and $2.5\times$ at training.
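The summary refers to an "additional simple equivariance loss" without stating it; one common way to write such a penalty (a hedged illustration, not necessarily REMUL's exact objective) is

```latex
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{task}}(\theta)
  \;+\; \lambda\, \mathbb{E}_{x,\,g}\,
  \big\| f_\theta\!\big(\rho_{\mathrm{in}}(g)\,x\big) - \rho_{\mathrm{out}}(g)\, f_\theta(x) \big\|^2,
```

where $\lambda$ controls how strongly the approximate symmetry is enforced and $\rho_{\mathrm{in}}$, $\rho_{\mathrm{out}}$ denote the group actions on inputs and outputs.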
arXiv Detail & Related papers (2024-10-23T13:50:27Z)
- Approximately Equivariant Neural Processes [47.14384085714576]
We consider the use of approximately equivariant architectures in neural processes.
We demonstrate the effectiveness of our approach on a number of synthetic and real-world regression experiments.
arXiv Detail & Related papers (2024-06-19T12:17:14Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The self-attention mechanism's computational cost limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to compensate for the lack of long-range dependencies.
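The summary does not detail LongVQ's construction; as a hedged illustration of the generic idea of compressing a representation into a length-fixed codebook, a standard vector-quantization lookup (names and shapes are assumptions) looks like this:

```python
# Hedged sketch of generic vector quantization, not LongVQ's exact method:
# each token embedding is replaced by its nearest entry in a fixed-size codebook.
import torch


def vector_quantize(z, codebook):
    """z: (N, D) token embeddings, codebook: (K, D) learned codes.

    Returns quantized embeddings (N, D) and the chosen code indices (N,).
    """
    # Squared Euclidean distance from every token to every code: (N, K)
    d = z.pow(2).sum(-1, keepdim=True) - 2.0 * z @ codebook.t() + codebook.pow(2).sum(-1)
    idx = d.argmin(dim=-1)              # nearest code per token
    z_q = codebook[idx]                 # sequence summarized by at most K distinct vectors
    return z + (z_q - z).detach(), idx  # straight-through estimator keeps encoder gradients
```

Each token is snapped to its nearest of the K codes, so the global pattern is summarized by a fixed-size vocabulary regardless of sequence length.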
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- Hyena Hierarchy: Towards Larger Convolutional Language Models [115.82857881546089]
Hyena is a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating.
In recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-spaces and other implicit and explicit methods.
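A hedged, minimal sketch of the mechanism this summary describes, interleaving FFT-based long convolutions with data-controlled elementwise gating (an order-2 block); the implicit filter parametrization and normalization details of the real operator are omitted:

```python
# Hedged, minimal order-2 Hyena-style block: long convolutions (via FFT) interleaved
# with data-controlled elementwise gating. Real implementations generate the filters
# k1, k2 implicitly from positional features rather than storing them explicitly.
import torch


def fft_conv(u, k):
    """Circular convolution along the sequence axis via FFT, O(N log N)."""
    n = u.shape[0]
    return torch.fft.irfft(torch.fft.rfft(u, n=n, dim=0) * torch.fft.rfft(k, n=n, dim=0), n=n, dim=0)


def hyena_block(u, w_proj, k1, k2):
    """u: (N, D) tokens, w_proj: (D, 3 * D) input projection, k1/k2: (N, D) long-conv filters."""
    v, g1, g2 = (u @ w_proj).chunk(3, dim=-1)  # value branch and two data-controlled gates
    z = g1 * fft_conv(v, k1)                   # long convolution followed by elementwise gating
    return g2 * fft_conv(z, k2)                # interleave once more (order-2 recurrence)
```

Because the convolutions run through the FFT, the cost scales as O(N log N) in sequence length rather than the O(N^2) of attention.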
arXiv Detail & Related papers (2023-02-21T18:29:25Z)
- The Lie Derivative for Measuring Learned Equivariance [84.29366874540217]
We study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures.
We find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities.
For example, transformers can be more equivariant than convolutional neural networks after training.
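The summary does not state the measure itself; one standard way to define the Lie derivative used for this purpose (a hedged paraphrase rather than the paper's exact notation) is, for a one-parameter subgroup $g_t = \exp(tX)$ acting on inputs and outputs,

```latex
(\mathcal{L}_X f)(x) \;=\; \left.\frac{\mathrm{d}}{\mathrm{d}t}\right|_{t=0}
  \rho_{\mathrm{out}}(g_t)^{-1}\, f\big(\rho_{\mathrm{in}}(g_t)\, x\big),
  \qquad g_t = \exp(tX),
```

which vanishes for every generator $X$ exactly when $f$ is equivariant, so its magnitude quantifies the local equivariance error.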
arXiv Detail & Related papers (2022-10-06T15:20:55Z)
- Design equivariant neural networks for 3D point cloud [0.0]
This work seeks to improve the generalization and robustness of existing neural networks for 3D point clouds.
The main challenge when designing equivariant models for point clouds is how to trade off model performance against complexity.
The proposed procedure is general and forms a fundamental approach to group equivariant neural networks.
arXiv Detail & Related papers (2022-05-02T02:57:13Z)
- Equivariant vector field network for many-body system modeling [65.22203086172019]
The Equivariant Vector Field Network (EVFN) is built on a novel equivariant basis and the associated scalarization and vectorization layers.
We evaluate our method on predicting trajectories of simulated Newton mechanics systems with both full and partially observed data.
arXiv Detail & Related papers (2021-10-26T14:26:25Z)
- Frame Averaging for Invariant and Equivariant Network Design [50.87023773850824]
We introduce Frame Averaging (FA), a framework for adapting known (backbone) architectures to become invariant or equivariant to new symmetry types.
We show that FA-based models have maximal expressive power in a broad setting.
We propose a new class of universal Graph Neural Networks (GNNs), universal Euclidean motion invariant point cloud networks, and Euclidean motion invariant Message Passing (MP) GNNs.
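The FA construction can be stated compactly (a hedged restatement; the paper gives the precise conditions on the frame): a backbone $\phi$ is symmetrized over a small input-dependent frame $F(x) \subset G$ as

```latex
\langle \phi \rangle_{F}(x) \;=\; \frac{1}{|F(x)|} \sum_{g \in F(x)}
  \rho_{\mathrm{out}}(g)\; \phi\!\big(\rho_{\mathrm{in}}(g)^{-1} x\big),
```

which is equivariant whenever the frame itself is equivariant, $F(\rho_{\mathrm{in}}(g)\,x) = g\,F(x)$, and is far cheaper than averaging over the whole group when $|F(x)|$ is small.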
arXiv Detail & Related papers (2021-10-07T11:05:23Z)
- The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders [16.27499951949733]
We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex.
Importantly, we do not require the generative model to be layerwise invertible.
We provide theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.
arXiv Detail & Related papers (2021-07-09T19:53:29Z)