EquiformerV2: Improved Equivariant Transformer for Scaling to
Higher-Degree Representations
- URL: http://arxiv.org/abs/2306.12059v3
- Date: Wed, 6 Mar 2024 21:45:16 GMT
- Title: EquiformerV2: Improved Equivariant Transformer for Scaling to
Higher-Degree Representations
- Authors: Yi-Lun Liao, Brandon Wood, Abhishek Das, Tess Smidt
- Abstract summary: We propose EquiformerV2, which outperforms previous state-of-the-art methods on the large-scale OC20 dataset by up to $9\%$ on forces.
We also compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees.
- Score: 9.718771797861908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Equivariant Transformers such as Equiformer have demonstrated the efficacy of
applying Transformers to the domain of 3D atomistic systems. However, they are
limited to small degrees of equivariant representations due to their
computational complexity. In this paper, we investigate whether these
architectures can scale well to higher degrees. Starting from Equiformer, we
first replace $SO(3)$ convolutions with eSCN convolutions to efficiently
incorporate higher-degree tensors. Then, to better leverage the power of higher
degrees, we propose three architectural improvements -- attention
re-normalization, separable $S^2$ activation and separable layer normalization.
Putting this all together, we propose EquiformerV2, which outperforms previous
state-of-the-art methods on the large-scale OC20 dataset by up to $9\%$ on forces
and $4\%$ on energies, offers better speed-accuracy trade-offs, and offers a
$2\times$ reduction in the DFT calculations needed for computing adsorption
energies. Additionally, EquiformerV2 trained on only the OC22 dataset outperforms
GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency.
Finally, we compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M
datasets to better understand the performance gain brought by higher degrees.
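To make one of these changes concrete, below is a minimal, hedged PyTorch sketch of the separable layer normalization idea: degree-0 (scalar) channels get a standard layer normalization, while all higher-degree components share a single mean-free, RMS-style rescaling so that rotational equivariance is preserved. The tensor layout, class name, and normalization details are illustrative assumptions, not EquiformerV2's actual implementation.

```python
import torch
import torch.nn as nn

class SeparableLayerNormSketch(nn.Module):
    """Illustrative sketch of 'separable layer normalization'.

    Assumed layout: x has shape (N, (lmax + 1)**2, C), i.e. node features
    stacked over spherical-harmonic components (degree l, order m) with C
    channels. The layout and details are assumptions for illustration only.
    """

    def __init__(self, lmax: int, channels: int, eps: float = 1e-6):
        super().__init__()
        self.lmax = lmax
        self.eps = eps
        # Standard LayerNorm for the single degree-0 (scalar) component.
        self.scalar_norm = nn.LayerNorm(channels)
        # One learnable gain per channel, shared by all higher-degree components.
        self.vector_gain = nn.Parameter(torch.ones(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        assert x.shape[1] == (self.lmax + 1) ** 2
        # Degree-0 part: index 0 along the (l, m) axis gets an ordinary LayerNorm.
        scalars = self.scalar_norm(x[:, :1, :])                      # (N, 1, C)

        # Degrees l >= 1: rescale by an RMS computed jointly over all
        # higher-degree components and channels. No mean subtraction,
        # which keeps the operation rotation-equivariant.
        vectors = x[:, 1:, :]
        rms = vectors.pow(2).mean(dim=(1, 2), keepdim=True).clamp_min(self.eps).sqrt()
        vectors = vectors / rms * self.vector_gain

        return torch.cat([scalars, vectors], dim=1)


if __name__ == "__main__":
    # lmax = 2 -> (2 + 1)**2 = 9 spherical-harmonic components per node.
    layer = SeparableLayerNormSketch(lmax=2, channels=16)
    print(layer(torch.randn(4, 9, 16)).shape)  # torch.Size([4, 9, 16])
```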
Related papers
- Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) base function, (C2) parameter and computation inefficiency, and (C3) weight initialization.
With these designs, KAT outperforms traditional MLP-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z)
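As a rough illustration of the KAN-layer idea mentioned in the entry above, the sketch below replaces a fixed-activation linear layer with per-(input, output) learnable univariate functions built from a small Gaussian radial-basis expansion. This is a generic, simplified KAN-style layer; KAT itself uses different (rational) base functions, and all names and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn

class KANLayerSketch(nn.Module):
    """Generic KAN-style layer: y_j = sum_i phi_ij(x_i), where each phi_ij is a
    learnable univariate function. Here phi_ij is parameterized by a shared set
    of Gaussian radial bases with per-(i, j) coefficients, plus a SiLU residual
    path. Illustrative only; KAT uses different (rational) base functions."""

    def __init__(self, in_dim: int, out_dim: int, num_bases: int = 8,
                 grid_range: float = 2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-grid_range, grid_range, num_bases))
        self.width = 2.0 * grid_range / (num_bases - 1)
        # Basis coefficients for every (input, output) pair.
        self.coeff = nn.Parameter(torch.randn(in_dim, out_dim, num_bases) * 0.1)
        # Residual "base" path with a fixed activation, as in common KAN variants.
        self.base_linear = nn.Linear(in_dim, out_dim)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_dim). Evaluate the Gaussian bases of every input coordinate.
        z = (x.unsqueeze(-1) - self.centers) / self.width      # (..., in_dim, B)
        bases = torch.exp(-z.pow(2))
        # Sum the phi_ij(x_i) contributions for each output j.
        spline_out = torch.einsum("...ib,iob->...o", bases, self.coeff)
        return spline_out + self.base_linear(self.act(x))


if __name__ == "__main__":
    layer = KANLayerSketch(in_dim=32, out_dim=64)
    print(layer(torch.randn(10, 32)).shape)  # torch.Size([10, 64])
```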
- MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network [2.7268855969580166]
We present a novel Attention-GCNFormer block that splits the channels across two parallel transformer and GCNFormer streams.
Our proposed GCNFormer module exploits the local relationship between adjacent joints, outputting a new representation that is complementary to the transformer output.
We evaluate our model on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP.
arXiv Detail & Related papers (2023-10-25T01:46:35Z)
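The entry above describes splitting channels across parallel Transformer and GCNFormer streams; the sketch below shows that two-stream pattern in minimal form. The placeholder joint adjacency, fusion by concatenation, and layer sizes are illustrative assumptions, not the paper's actual AGFormer block.

```python
import torch
import torch.nn as nn

class TwoStreamBlockSketch(nn.Module):
    """Split channels into a self-attention stream (global relations) and a
    GCN stream over a joint adjacency (local relations), then concatenate.
    Illustrative sketch only, not the actual AGFormer/GCNFormer design."""

    def __init__(self, dim: int, num_joints: int, num_heads: int = 4):
        super().__init__()
        assert dim % 2 == 0
        half = dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        self.gcn_linear = nn.Linear(half, half)
        # Placeholder adjacency over joints (identity); a real model would use
        # the kinematic skeleton graph here.
        self.register_buffer("adj", torch.eye(num_joints))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim)
        half = x.shape[-1] // 2
        xa, xg = x[..., :half], x[..., half:]

        # Transformer stream: global attention across all joints.
        attn_out, _ = self.attn(xa, xa, xa)

        # GCN stream: aggregate features from adjacent joints.
        adj = self.adj / self.adj.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        gcn_out = torch.relu(self.gcn_linear(adj @ xg))

        return self.norm(torch.cat([attn_out, gcn_out], dim=-1) + x)


if __name__ == "__main__":
    block = TwoStreamBlockSketch(dim=64, num_joints=17)
    print(block(torch.randn(2, 17, 64)).shape)  # torch.Size([2, 17, 64])
```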
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- CageViT: Convolutional Activation Guided Efficient Vision Transformer [90.69578999760206]
This paper presents an efficient vision Transformer, called CageViT, that is guided by convolutional activation to reduce computation.
Our CageViT, unlike current Transformers, utilizes a new encoder to handle the rearranged tokens.
Experimental results demonstrate that the proposed CageViT outperforms the most recent state-of-the-art backbones by a large margin in terms of efficiency.
arXiv Detail & Related papers (2023-05-17T03:19:18Z)
- AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers [6.0093441900032465]
Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing.
Previous works directly operate on large matrices involved in the attention operation, which limits hardware utilization.
We propose a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead.
arXiv Detail & Related papers (2023-02-28T16:17:23Z)
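DynaTran, mentioned in the entry above, prunes activations at runtime so that a sparsity-aware accelerator can skip the corresponding computations. One simple way to picture this is magnitude thresholding around an attention block, as in the hedged sketch below; the threshold value and the exact places where pruning is applied are assumptions, not the paper's scheme.

```python
import torch

def prune_small_activations(t: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out activations whose magnitude is below `threshold` (illustrative)."""
    return t * (t.abs() >= threshold)

def sparse_attention_sketch(q, k, v, threshold: float = 0.05):
    """Scaled dot-product attention with runtime activation pruning applied
    before both matrix multiplications. Illustrative sketch only."""
    d = q.shape[-1]
    q = prune_small_activations(q, threshold)
    k = prune_small_activations(k, threshold)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (..., Lq, Lk)
    probs = torch.softmax(scores, dim=-1)
    # Pruning the attention probabilities induces further sparsity that
    # sparsity-aware hardware could exploit.
    probs = prune_small_activations(probs, threshold)
    return probs @ prune_small_activations(v, threshold)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 8, 128, 64)   # (batch, heads, length, dim)
    print(sparse_attention_sketch(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```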
- Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs [3.1618838742094457]
Equivariant convolutions increase significantly in computational complexity as higher-order tensors are used.
We propose a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
arXiv Detail & Related papers (2023-02-07T18:16:13Z)
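This eSCN idea is the one EquiformerV2 builds on: rotate each edge into a canonical frame so that the SO(3) tensor-product convolution only couples spherical-harmonic components of the same order m, reducing it to SO(2)-style per-m linear maps. The sketch below illustrates just that per-m mixing step, assuming the per-edge Wigner rotation matrices come from an external library (e.g. e3nn) and omitting the cross-degree mixing eSCN also performs; it is not the eSCN or EquiformerV2 code.

```python
import torch
import torch.nn as nn

def sh_index(l: int, m: int) -> int:
    """Flat index of the (l, m) spherical-harmonic component, m in [-l, l]."""
    return l * l + l + m

class SO2LinearSketch(nn.Module):
    """Hedged sketch of an SO(2)-style linear layer: once edge features are
    rotated into a frame aligned with the edge direction, components only mix
    within the same order |m|. For brevity this sketch mixes channels per m
    and omits the cross-degree (l) mixing."""

    def __init__(self, lmax: int, channels: int):
        super().__init__()
        self.lmax = lmax
        self.w0 = nn.Linear(channels, channels, bias=False)               # m = 0
        self.w1 = nn.ModuleList([nn.Linear(channels, channels, bias=False)
                                 for _ in range(lmax)])                   # m > 0
        self.w2 = nn.ModuleList([nn.Linear(channels, channels, bias=False)
                                 for _ in range(lmax)])

    def forward(self, x: torch.Tensor, wigner: torch.Tensor) -> torch.Tensor:
        # x: (E, (lmax+1)^2, C) edge features; wigner: per-edge rotation matrices
        # of shape (E, (lmax+1)^2, (lmax+1)^2), assumed to come from a library.
        x = torch.bmm(wigner, x)                     # rotate into the edge frame
        y = torch.zeros_like(x)

        m0 = [sh_index(l, 0) for l in range(self.lmax + 1)]
        y[:, m0] = self.w0(x[:, m0])

        for m in range(1, self.lmax + 1):
            pos = [sh_index(l, m) for l in range(m, self.lmax + 1)]
            neg = [sh_index(l, -m) for l in range(m, self.lmax + 1)]
            w1, w2 = self.w1[m - 1], self.w2[m - 1]
            # Rotation-like 2x2 mixing of the (+m, -m) pair.
            y[:, pos] = w1(x[:, pos]) - w2(x[:, neg])
            y[:, neg] = w2(x[:, pos]) + w1(x[:, neg])

        return torch.bmm(wigner.transpose(1, 2), y)  # rotate back

if __name__ == "__main__":
    E, lmax, C = 5, 2, 16
    dim = (lmax + 1) ** 2
    layer = SO2LinearSketch(lmax, C)
    # Identity "rotations" just for a shape check; real Wigner matrices would
    # be built from the edge directions.
    eye = torch.eye(dim).repeat(E, 1, 1)
    print(layer(torch.randn(E, dim, C), eye).shape)  # torch.Size([5, 9, 16])
```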
- Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs [3.0603554929274908]
3D-related inductive biases are indispensable to graph neural networks operating on 3D atomistic graphs such as molecules.
Inspired by the success of Transformers in various domains, we study how to incorporate these inductive biases into Transformers.
We present Equiformer, a graph neural network leveraging the strength of Transformer architectures.
arXiv Detail & Related papers (2022-06-23T21:40:37Z)
- Residual Mixture of Experts [75.5489156421442]
Residual Mixture of Experts (RMoE) is an efficient training pipeline for MoE vision transformers on downstream tasks.
RMoE achieves comparable results with the upper-bound MoE training, while only introducing minor additional training cost.
arXiv Detail & Related papers (2022-04-20T17:29:48Z)
- Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems [99.13795374152997]
We propose a neural network designed to distill an ensemble of large transformers into a single smaller model.
An MHS model consists of two components: a stack of transformer layers that is used to encode inputs, and a set of ranking heads.
Unlike traditional distillation techniques, our approach leverages individual models in ensemble as teachers in a way that preserves the diversity of the ensemble members.
arXiv Detail & Related papers (2022-01-15T06:21:01Z)
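The entry above describes a shared transformer encoder with a set of ranking heads, where each head is distilled from a different ensemble member so that diversity is preserved. The hedged sketch below illustrates that pattern; the mean pooling, MSE distillation loss, and averaging of heads at inference are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class MultiHeadStudentSketch(nn.Module):
    """Shared transformer encoder plus one scalar ranking head per teacher in
    the ensemble. Illustrative sketch of the idea described above, not the
    paper's actual model."""

    def __init__(self, vocab_size: int, dim: int = 256, num_heads: int = 4,
                 num_layers: int = 4, num_teachers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.heads = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_teachers)])

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))              # (B, L, dim)
        pooled = h.mean(dim=1)                               # simple mean pooling
        return torch.cat([head(pooled) for head in self.heads], dim=-1)

def distillation_loss(student_scores, teacher_scores):
    """Each head regresses the score of its own teacher, preserving diversity."""
    return torch.mean((student_scores - teacher_scores) ** 2)

if __name__ == "__main__":
    model = MultiHeadStudentSketch(vocab_size=1000)
    ids = torch.randint(0, 1000, (8, 32))
    student = model(ids)                                     # (8, 3)
    teachers = torch.randn(8, 3)                             # one score per teacher
    loss = distillation_loss(student, teachers)
    final_score = student.mean(dim=-1)                       # average heads at inference
    print(loss.item(), final_score.shape)
```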
- Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention [5.495006023171481]
Transformer models have revolutionized Natural Language Processing (NLP) and also show promising performance on Computer Vision (CV) tasks.
We propose Energon, an algorithm-architecture co-design approach that accelerates various transformers using dynamic sparse attention.
We demonstrate that Energon achieves $161\times$ and $8.4\times$ geo-mean speedup and up to $10^4\times$ and $10^3\times$ energy reduction.
arXiv Detail & Related papers (2021-10-18T13:42:43Z)
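Energon's dynamic sparse attention selects, per query, a small set of important keys on the fly (in hardware, via low-precision filtering) and computes full attention only over that subset. The sketch below is a simple software analogue using a single top-k selection pass; the selection criterion and the value of k are assumptions.

```python
import torch

def dynamic_sparse_attention_sketch(q, k, v, top_k: int = 32):
    """For each query, keep only the top-k highest-scoring keys and run softmax
    attention over that subset. A software illustration of dynamic sparse
    attention, not Energon's hardware filtering pipeline."""
    b, h, l_q, d = q.shape
    l_k = k.shape[-2]
    top_k = min(top_k, l_k)

    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (b, h, l_q, l_k)
    topk_scores, topk_idx = scores.topk(top_k, dim=-1)       # (b, h, l_q, top_k)
    probs = torch.softmax(topk_scores, dim=-1)

    # Gather the selected value vectors: (b, h, l_q, top_k, d).
    v_expanded = v.unsqueeze(2).expand(b, h, l_q, l_k, d)
    idx = topk_idx.unsqueeze(-1).expand(b, h, l_q, top_k, d)
    v_sel = torch.gather(v_expanded, dim=3, index=idx)

    # Weighted sum over the selected keys only.
    return torch.einsum("bhqk,bhqkd->bhqd", probs, v_sel)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 8, 128, 64)
    print(dynamic_sparse_attention_sketch(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```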
- SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks [71.55002934935473]
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations.
We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input.
arXiv Detail & Related papers (2020-06-18T13:23:01Z)