EquiformerV2: Improved Equivariant Transformer for Scaling to
Higher-Degree Representations
- URL: http://arxiv.org/abs/2306.12059v3
- Date: Wed, 6 Mar 2024 21:45:16 GMT
- Title: EquiformerV2: Improved Equivariant Transformer for Scaling to
Higher-Degree Representations
- Authors: Yi-Lun Liao, Brandon Wood, Abhishek Das, Tess Smidt
- Abstract summary: We propose EquiformerV2, which outperforms previous state-of-the-art methods on the large-scale OC20 dataset by up to $9\%$ on forces.
We also compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees.
- Score: 9.718771797861908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Equivariant Transformers such as Equiformer have demonstrated the efficacy of
applying Transformers to the domain of 3D atomistic systems. However, they are
limited to small degrees of equivariant representations due to their
computational complexity. In this paper, we investigate whether these
architectures can scale well to higher degrees. Starting from Equiformer, we
first replace $SO(3)$ convolutions with eSCN convolutions to efficiently
incorporate higher-degree tensors. Then, to better leverage the power of higher
degrees, we propose three architectural improvements -- attention
re-normalization, separable $S^2$ activation and separable layer normalization.
Putting this all together, we propose EquiformerV2, which outperforms previous
state-of-the-art methods on the large-scale OC20 dataset by up to $9\%$ on forces
and $4\%$ on energies, offers better speed-accuracy trade-offs, and offers a
$2\times$ reduction in the DFT calculations needed for computing adsorption
energies. Additionally, EquiformerV2 trained on only the OC22 dataset outperforms
GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency.
Finally, we compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M
datasets to better understand the performance gain brought by higher degrees.
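To make one of these changes concrete, below is a minimal, hedged PyTorch sketch of the separable layer normalization idea: degree-0 (scalar) channels get a standard layer normalization, while all higher-degree components share a single mean-free, RMS-style rescaling so that rotational equivariance is preserved. The tensor layout, class name, and normalization details are illustrative assumptions, not EquiformerV2's actual implementation.

```python
import torch
import torch.nn as nn

class SeparableLayerNormSketch(nn.Module):
    """Illustrative sketch of 'separable layer normalization'.

    Assumed layout: x has shape (N, (lmax + 1)**2, C), i.e. node features
    stacked over spherical-harmonic components (degree l, order m) with C
    channels. The layout and details are assumptions for illustration only.
    """

    def __init__(self, lmax: int, channels: int, eps: float = 1e-6):
        super().__init__()
        self.lmax = lmax
        self.eps = eps
        # Standard LayerNorm for the single degree-0 (scalar) component.
        self.scalar_norm = nn.LayerNorm(channels)
        # One learnable gain per channel, shared by all higher-degree components.
        self.vector_gain = nn.Parameter(torch.ones(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        assert x.shape[1] == (self.lmax + 1) ** 2
        # Degree-0 part: index 0 along the (l, m) axis gets an ordinary LayerNorm.
        scalars = self.scalar_norm(x[:, :1, :])                      # (N, 1, C)

        # Degrees l >= 1: rescale by an RMS computed jointly over all
        # higher-degree components and channels. No mean subtraction,
        # which keeps the operation rotation-equivariant.
        vectors = x[:, 1:, :]
        rms = vectors.pow(2).mean(dim=(1, 2), keepdim=True).clamp_min(self.eps).sqrt()
        vectors = vectors / rms * self.vector_gain

        return torch.cat([scalars, vectors], dim=1)


if __name__ == "__main__":
    # lmax = 2 -> (2 + 1)**2 = 9 spherical-harmonic components per node.
    layer = SeparableLayerNormSketch(lmax=2, channels=16)
    print(layer(torch.randn(4, 9, 16)).shape)  # torch.Size([4, 9, 16])
```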
Related papers
- Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers.
We identify three key challenges: (C1) base function, (C2) parameter and computation inefficiency, and (C3) weight initialization.
With these designs, KAT outperforms traditional MLP-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z)
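As a rough illustration of the KAN-layer idea mentioned in the entry above, the sketch below replaces a fixed-activation linear layer with per-(input, output) learnable univariate functions built from a small Gaussian radial-basis expansion. This is a generic, simplified KAN-style layer; KAT itself uses different (rational) base functions, and all names and hyperparameters here are assumptions.

```python
import torch
import torch.nn as nn

class KANLayerSketch(nn.Module):
    """Generic KAN-style layer: y_j = sum_i phi_ij(x_i), where each phi_ij is a
    learnable univariate function. Here phi_ij is parameterized by a shared set
    of Gaussian radial bases with per-(i, j) coefficients, plus a SiLU residual
    path. Illustrative only; KAT uses different (rational) base functions."""

    def __init__(self, in_dim: int, out_dim: int, num_bases: int = 8,
                 grid_range: float = 2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-grid_range, grid_range, num_bases))
        self.width = 2.0 * grid_range / (num_bases - 1)
        # Basis coefficients for every (input, output) pair.
        self.coeff = nn.Parameter(torch.randn(in_dim, out_dim, num_bases) * 0.1)
        # Residual "base" path with a fixed activation, as in common KAN variants.
        self.base_linear = nn.Linear(in_dim, out_dim)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., in_dim). Evaluate the Gaussian bases of every input coordinate.
        z = (x.unsqueeze(-1) - self.centers) / self.width      # (..., in_dim, B)
        bases = torch.exp(-z.pow(2))
        # Sum the phi_ij(x_i) contributions for each output j.
        spline_out = torch.einsum("...ib,iob->...o", bases, self.coeff)
        return spline_out + self.base_linear(self.act(x))


if __name__ == "__main__":
    layer = KANLayerSketch(in_dim=32, out_dim=64)
    print(layer(torch.randn(10, 32)).shape)  # torch.Size([10, 64])
```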
- MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network [2.7268855969580166]
We present a novel Attention-GCNFormer block that splits the channels across two parallel transformer and GCNFormer streams.
Our proposed GCNFormer module exploits the local relationship between adjacent joints, outputting a new representation that is complementary to the transformer output.
We evaluate our model on two popular benchmark datasets: Human3.6M and MPI-INF-3DHP.
arXiv Detail & Related papers (2023-10-25T01:46:35Z)
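The entry above describes splitting channels across parallel Transformer and GCNFormer streams; the sketch below shows that two-stream pattern in minimal form. The placeholder joint adjacency, fusion by concatenation, and layer sizes are illustrative assumptions, not the paper's actual AGFormer block.

```python
import torch
import torch.nn as nn

class TwoStreamBlockSketch(nn.Module):
    """Split channels into a self-attention stream (global relations) and a
    GCN stream over a joint adjacency (local relations), then concatenate.
    Illustrative sketch only, not the actual AGFormer/GCNFormer design."""

    def __init__(self, dim: int, num_joints: int, num_heads: int = 4):
        super().__init__()
        assert dim % 2 == 0
        half = dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads, batch_first=True)
        self.gcn_linear = nn.Linear(half, half)
        # Placeholder adjacency over joints (identity); a real model would use
        # the kinematic skeleton graph here.
        self.register_buffer("adj", torch.eye(num_joints))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, dim)
        half = x.shape[-1] // 2
        xa, xg = x[..., :half], x[..., half:]

        # Transformer stream: global attention across all joints.
        attn_out, _ = self.attn(xa, xa, xa)

        # GCN stream: aggregate features from adjacent joints.
        adj = self.adj / self.adj.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        gcn_out = torch.relu(self.gcn_linear(adj @ xg))

        return self.norm(torch.cat([attn_out, gcn_out], dim=-1) + x)


if __name__ == "__main__":
    block = TwoStreamBlockSketch(dim=64, num_joints=17)
    print(block(torch.randn(2, 17, 64)).shape)  # torch.Size([2, 17, 64])
```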
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- CageViT: Convolutional Activation Guided Efficient Vision Transformer [90.69578999760206]
This paper presents an efficient vision Transformer, called CageViT, that is guided by convolutional activation to reduce computation.
Our CageViT, unlike current Transformers, utilizes a new encoder to handle the rearranged tokens.
Experimental results demonstrate that the proposed CageViT outperforms the most recent state-of-the-art backbones by a large margin in terms of efficiency.
arXiv Detail & Related papers (2023-05-17T03:19:18Z)
- AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers [6.0093441900032465]
Self-attention-based transformer models have achieved tremendous success in the domain of natural language processing.
Previous works directly operate on large matrices involved in the attention operation, which limits hardware utilization.
We propose a novel dynamic inference scheme, DynaTran, which prunes activations at runtime with low overhead.
arXiv Detail & Related papers (2023-02-28T16:17:23Z)
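DynaTran, mentioned in the entry above, prunes activations at runtime so that a sparsity-aware accelerator can skip the corresponding computations. One simple way to picture this is magnitude thresholding around an attention block, as in the hedged sketch below; the threshold value and the exact places where pruning is applied are assumptions, not the paper's scheme.

```python
import torch

def prune_small_activations(t: torch.Tensor, threshold: float) -> torch.Tensor:
    """Zero out activations whose magnitude is below `threshold` (illustrative)."""
    return t * (t.abs() >= threshold)

def sparse_attention_sketch(q, k, v, threshold: float = 0.05):
    """Scaled dot-product attention with runtime activation pruning applied
    before both matrix multiplications. Illustrative sketch only."""
    d = q.shape[-1]
    q = prune_small_activations(q, threshold)
    k = prune_small_activations(k, threshold)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (..., Lq, Lk)
    probs = torch.softmax(scores, dim=-1)
    # Pruning the attention probabilities induces further sparsity that
    # sparsity-aware hardware could exploit.
    probs = prune_small_activations(probs, threshold)
    return probs @ prune_small_activations(v, threshold)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 8, 128, 64)   # (batch, heads, length, dim)
    print(sparse_attention_sketch(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```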
- Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs [3.1618838742094457]
Equivariant convolutions increase significantly in computational complexity as higher-order tensors are used.
We propose a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
arXiv Detail & Related papers (2023-02-07T18:16:13Z)
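This eSCN idea is the one EquiformerV2 builds on: rotate each edge into a canonical frame so that the SO(3) tensor-product convolution only couples spherical-harmonic components of the same order m, reducing it to SO(2)-style per-m linear maps. The sketch below illustrates just that per-m mixing step, assuming the per-edge Wigner rotation matrices come from an external library (e.g. e3nn) and omitting the cross-degree mixing eSCN also performs; it is not the eSCN or EquiformerV2 code.

```python
import torch
import torch.nn as nn

def sh_index(l: int, m: int) -> int:
    """Flat index of the (l, m) spherical-harmonic component, m in [-l, l]."""
    return l * l + l + m

class SO2LinearSketch(nn.Module):
    """Hedged sketch of an SO(2)-style linear layer: once edge features are
    rotated into a frame aligned with the edge direction, components only mix
    within the same order |m|. For brevity this sketch mixes channels per m
    and omits the cross-degree (l) mixing."""

    def __init__(self, lmax: int, channels: int):
        super().__init__()
        self.lmax = lmax
        self.w0 = nn.Linear(channels, channels, bias=False)               # m = 0
        self.w1 = nn.ModuleList([nn.Linear(channels, channels, bias=False)
                                 for _ in range(lmax)])                   # m > 0
        self.w2 = nn.ModuleList([nn.Linear(channels, channels, bias=False)
                                 for _ in range(lmax)])

    def forward(self, x: torch.Tensor, wigner: torch.Tensor) -> torch.Tensor:
        # x: (E, (lmax+1)^2, C) edge features; wigner: per-edge rotation matrices
        # of shape (E, (lmax+1)^2, (lmax+1)^2), assumed to come from a library.
        x = torch.bmm(wigner, x)                     # rotate into the edge frame
        y = torch.zeros_like(x)

        m0 = [sh_index(l, 0) for l in range(self.lmax + 1)]
        y[:, m0] = self.w0(x[:, m0])

        for m in range(1, self.lmax + 1):
            pos = [sh_index(l, m) for l in range(m, self.lmax + 1)]
            neg = [sh_index(l, -m) for l in range(m, self.lmax + 1)]
            w1, w2 = self.w1[m - 1], self.w2[m - 1]
            # Rotation-like 2x2 mixing of the (+m, -m) pair.
            y[:, pos] = w1(x[:, pos]) - w2(x[:, neg])
            y[:, neg] = w2(x[:, pos]) + w1(x[:, neg])

        return torch.bmm(wigner.transpose(1, 2), y)  # rotate back

if __name__ == "__main__":
    E, lmax, C = 5, 2, 16
    dim = (lmax + 1) ** 2
    layer = SO2LinearSketch(lmax, C)
    # Identity "rotations" just for a shape check; real Wigner matrices would
    # be built from the edge directions.
    eye = torch.eye(dim).repeat(E, 1, 1)
    print(layer(torch.randn(E, dim, C), eye).shape)  # torch.Size([5, 9, 16])
```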
- Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs [3.0603554929274908]
3D-related inductive biases are indispensable to graph neural networks operating on 3D atomistic graphs such as molecules.
Inspired by the success of Transformers in various domains, we study how to incorporate these inductive biases into Transformers.
We present Equiformer, a graph neural network leveraging the strength of Transformer architectures.
arXiv Detail & Related papers (2022-06-23T21:40:37Z)
- Residual Mixture of Experts [75.5489156421442]
Residual Mixture of Experts (RMoE) is an efficient training pipeline for MoE vision transformers on downstream tasks.
RMoE achieves comparable results with the upper-bound MoE training, while only introducing minor additional training cost.
arXiv Detail & Related papers (2022-04-20T17:29:48Z)
- Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems [99.13795374152997]
We propose a neural network designed to distill an ensemble of large transformers into a single smaller model.
An MHS model consists of two components: a stack of transformer layers that is used to encode inputs, and a set of ranking heads.
Unlike traditional distillation techniques, our approach leverages individual models in ensemble as teachers in a way that preserves the diversity of the ensemble members.
arXiv Detail & Related papers (2022-01-15T06:21:01Z)
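The entry above describes a shared transformer encoder with a set of ranking heads, where each head is distilled from a different ensemble member so that diversity is preserved. The hedged sketch below illustrates that pattern; the mean pooling, MSE distillation loss, and averaging of heads at inference are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class MultiHeadStudentSketch(nn.Module):
    """Shared transformer encoder plus one scalar ranking head per teacher in
    the ensemble. Illustrative sketch of the idea described above, not the
    paper's actual model."""

    def __init__(self, vocab_size: int, dim: int = 256, num_heads: int = 4,
                 num_layers: int = 4, num_teachers: int = 3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.heads = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_teachers)])

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))              # (B, L, dim)
        pooled = h.mean(dim=1)                               # simple mean pooling
        return torch.cat([head(pooled) for head in self.heads], dim=-1)

def distillation_loss(student_scores, teacher_scores):
    """Each head regresses the score of its own teacher, preserving diversity."""
    return torch.mean((student_scores - teacher_scores) ** 2)

if __name__ == "__main__":
    model = MultiHeadStudentSketch(vocab_size=1000)
    ids = torch.randint(0, 1000, (8, 32))
    student = model(ids)                                     # (8, 3)
    teachers = torch.randn(8, 3)                             # one score per teacher
    loss = distillation_loss(student, teachers)
    final_score = student.mean(dim=-1)                       # average heads at inference
    print(loss.item(), final_score.shape)
```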
- Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention [5.495006023171481]
Transformer models have revolutionized Natural Language Processing (NLP) and also show promising performance on Computer Vision (CV) tasks.
We propose Energon, an algorithm-architecture co-design approach that accelerates various transformers using dynamic sparse attention.
We demonstrate that Energon achieves $161\times$ and $8.4\times$ geo-mean speedup and up to $10^4\times$ and $10^3\times$ energy reduction.
arXiv Detail & Related papers (2021-10-18T13:42:43Z)
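Energon's dynamic sparse attention selects, per query, a small set of important keys on the fly (in hardware, via low-precision filtering) and computes full attention only over that subset. The sketch below is a simple software analogue using a single top-k selection pass; the selection criterion and the value of k are assumptions.

```python
import torch

def dynamic_sparse_attention_sketch(q, k, v, top_k: int = 32):
    """For each query, keep only the top-k highest-scoring keys and run softmax
    attention over that subset. A software illustration of dynamic sparse
    attention, not Energon's hardware filtering pipeline."""
    b, h, l_q, d = q.shape
    l_k = k.shape[-2]
    top_k = min(top_k, l_k)

    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (b, h, l_q, l_k)
    topk_scores, topk_idx = scores.topk(top_k, dim=-1)       # (b, h, l_q, top_k)
    probs = torch.softmax(topk_scores, dim=-1)

    # Gather the selected value vectors: (b, h, l_q, top_k, d).
    v_expanded = v.unsqueeze(2).expand(b, h, l_q, l_k, d)
    idx = topk_idx.unsqueeze(-1).expand(b, h, l_q, top_k, d)
    v_sel = torch.gather(v_expanded, dim=3, index=idx)

    # Weighted sum over the selected keys only.
    return torch.einsum("bhqk,bhqkd->bhqd", probs, v_sel)

if __name__ == "__main__":
    q = k = v = torch.randn(2, 8, 128, 64)
    print(dynamic_sparse_attention_sketch(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```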
- SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks [71.55002934935473]
We introduce the SE(3)-Transformer, a variant of the self-attention module for 3D point clouds and graphs, which is equivariant under continuous 3D roto-translations.
We evaluate our model on a toy N-body particle simulation dataset, showcasing the robustness of the predictions under rotations of the input.
arXiv Detail & Related papers (2020-06-18T13:23:01Z)