E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
- URL: http://arxiv.org/abs/2601.16622v1
- Date: Fri, 23 Jan 2026 10:20:08 GMT
- Title: E2Former-V2: On-the-Fly Equivariant Attention with Linear Activation Memory
- Authors: Lin Huang, Chengxiang Huang, Ziang Wang, Yiyue Du, Chu Wang, Haocheng Lu, Yunyang Li, Xiaoli Liu, Arthur Jiang, Jia Zhang,
- Abstract summary: Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. We introduce E2Former-V2, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. E2Former-V2 maintains comparable predictive performance while notably accelerating inference.
- Score: 13.451231889715542
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Equivariant Graph Neural Networks (EGNNs) have become a widely used approach for modeling 3D atomistic systems. However, mainstream architectures face critical scalability bottlenecks due to the explicit construction of geometric features or dense tensor products on \textit{every} edge. To overcome this, we introduce \textbf{E2Former-V2}, a scalable architecture that integrates algebraic sparsity with hardware-aware execution. We first propose \textbf{E}quivariant \textbf{A}xis-\textbf{A}ligned \textbf{S}parsification (EAAS). EAAS builds on Wigner-$6j$ convolution by exploiting an $\mathrm{SO}(3) \rightarrow \mathrm{SO}(2)$ change of basis to transform computationally expensive dense tensor contractions into efficient, sparse parity re-indexing operations. Building on this representation, we introduce \textbf{On-the-Fly Equivariant Attention}, a fully node-centric mechanism implemented via a custom fused Triton kernel. By eliminating materialized edge tensors and maximizing SRAM utilization, our kernel achieves a \textbf{20$\times$ improvement in TFLOPS} compared to standard implementations. Extensive experiments on the SPICE and OMol25 datasets demonstrate that E2Former-V2 maintains comparable predictive performance while notably accelerating inference. This work demonstrates that large equivariant transformers can be trained efficiently using widely accessible GPU platforms. The code is available at https://github.com/IQuestLab/UBio-MolFM/tree/e2formerv2.
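The abstract's memory claim, replacing materialized edge tensors with node-centric on-the-fly aggregation, can be made concrete with a minimal NumPy sketch. This is not the paper's fused Triton kernel; the edge list, message function, and shapes are hypothetical, and the point is only the O(|E|) vs. O(|V|) activation footprint:

```python
import numpy as np

# Toy illustration (hypothetical shapes and message function):
# aggregating edge messages without materializing a per-edge tensor.
rng = np.random.default_rng(0)
N, D = 6, 4                      # nodes, feature dim
x = rng.standard_normal((N, D))  # node features
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)]  # (src, dst)

# Edge-materialized: build an |E| x D message tensor, then scatter-add.
msgs = np.stack([x[s] * 0.5 for s, _ in edges])  # O(|E|) activation memory
out_edge = np.zeros((N, D))
for (s, d), m in zip(edges, msgs):
    out_edge[d] += m

# On-the-fly: fuse message computation into the reduction, keeping only
# O(|V|) activations (roughly what a fused kernel keeps in SRAM).
out_fused = np.zeros((N, D))
for s, d in edges:
    out_fused[d] += x[s] * 0.5   # message is never stored

assert np.allclose(out_edge, out_fused)
```

Both paths produce identical outputs; only the peak activation memory differs, which is what the custom kernel exploits.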
Related papers
- VecFormer: Towards Efficient and Generalizable Graph Transformer with Graph Token Attention [61.96837866507746]
VecFormer is an efficient and highly generalizable model for node classification. VecFormer outperforms existing Graph Transformers in both performance and speed.
arXiv Detail & Related papers (2026-02-23T09:10:39Z)
- OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion [89.98812408058336]
We introduce OpenInsGaussian, an Open-vocabulary Instance Gaussian segmentation framework with Context-aware Cross-view Fusion. OpenInsGaussian achieves state-of-the-art results in open-vocabulary 3D Gaussian segmentation, outperforming existing baselines by a large margin.
arXiv Detail & Related papers (2025-10-21T03:24:12Z)
- PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
- Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames [49.1851978742043]
We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose QHNetV2.
arXiv Detail & Related papers (2025-06-11T05:04:29Z)
- E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products [30.856584261032207]
We introduce E2Former, an equivariant and efficient transformer architecture that incorporates the Wigner $6j$ convolution (Wigner $6j$ Conv). By shifting the computational burden from edges to nodes, the Wigner $6j$ Conv reduces the complexity from $O(|\mathcal{E}|)$ to $O(|\mathcal{V}|)$ while preserving both the model's expressive power and rotational equivariance. This development could suggest a promising direction for scalable and efficient molecular modeling.
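A back-of-the-envelope count (with made-up numbers, not figures from the paper) shows why moving a fixed-cost tensor product from edges to nodes pays off: in a radius graph each node has roughly $k$ neighbors, so $|\mathcal{E}| \approx k \cdot |\mathcal{V}|$, and the per-edge formulation costs about a factor of $k$ more:

```python
# Hypothetical numbers: the cost ratio of an edge-wise vs. node-wise
# tensor-product formulation in a radius graph.
V = 10_000          # atoms (nodes)
k = 30              # average neighbors within the cutoff radius
E = k * V           # directed edges, |E| ~= k * |V|

cost_tp = 5_000     # FLOPs of one dense tensor product (arbitrary unit)
edge_cost = E * cost_tp   # pay once per edge
node_cost = V * cost_tp   # pay once per node

# The ratio is just the average degree k.
print(edge_cost // node_cost)
```

The saving is exactly the average degree, which grows with the cutoff radius and system density, so the node-centric formulation helps most on large, dense systems.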
arXiv Detail & Related papers (2025-01-31T15:22:58Z)
- An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks [0.5737287537823071]
Rotation-equivariant graph neural networks yield state-of-the-art performance on spatial deep learning tasks. Key to these models is the Clebsch-Gordan (CG) tensor product, a kernel that contracts two dense feature vectors with a highly structured sparse tensor to produce a dense output vector. We introduce a GPU sparse kernel generator for the CG tensor product that provides significant speedups over the best existing open- and closed-source implementations.
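As a self-contained illustration of the CG sparsity mentioned above (a standard identity, not code from the cited kernel generator): the $l{=}1 \otimes l{=}1 \rightarrow L{=}1$ coupling of two vectors is, up to normalization, the cross product, and its coefficient tensor (the Levi-Civita symbol) has only 6 nonzero entries out of 27:

```python
import numpy as np

# Levi-Civita tensor: the (sparse) coupling coefficients for l=1 x l=1 -> L=1.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0   # even permutations
    eps[i, k, j] = -1.0  # odd permutations

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 2.0])

# Contracting two dense vectors with the sparse tensor gives a dense output,
# and for this coupling the result is exactly the cross product.
coupled = np.einsum('ijk,j,k->i', eps, a, b)
assert np.allclose(coupled, np.cross(a, b))

# Equivariance check: rotating both inputs rotates the output the same way
# (holds for any proper rotation; here a rotation about the z-axis).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
assert np.allclose(R @ np.cross(a, b), np.cross(R @ a, R @ b))
print("nonzeros:", np.count_nonzero(eps), "of", eps.size)
```

Exploiting this kind of structured sparsity, rather than running the contraction as a dense einsum, is the opportunity such kernel generators target.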
arXiv Detail & Related papers (2025-01-23T08:20:47Z)
- Geometric Algebra Planes: Convex Implicit Neural Volumes [70.12234371845445]
We show that GA-Planes is equivalent to a sparse low-rank factor plus low-resolution matrix.
We also show that GA-Planes can be adapted for many existing representations.
arXiv Detail & Related papers (2024-11-20T18:21:58Z)
- Rethinking SO(3)-equivariance with Bilinear Tensor Networks [0.0]
We show that by judicious symmetry breaking, we can efficiently increase the expressiveness of a network operating only on vector and order-2 tensor representations of SO(2).
We demonstrate the method on an important problem from High Energy Physics known as b-tagging, where particle jets originating from b-meson decays must be discriminated from an overwhelming QCD background.
arXiv Detail & Related papers (2023-03-20T17:23:15Z)
- Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs [3.1618838742094457]
Equivariant convolutions increase significantly in computational complexity as higher-order tensors are used.
We propose a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.
arXiv Detail & Related papers (2023-02-07T18:16:13Z)
- 2D+3D facial expression recognition via embedded tensor manifold regularization [16.98176664818354]
A novel approach via embedded tensor manifold regularization for 2D+3D facial expression recognition (FERETMR) is proposed.
We establish the first-order optimality condition in terms of stationary points, and then design a block coordinate descent (BCD) algorithm with convergence analysis.
Numerical results on BU-3DFE database and Bosphorus databases demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2022-01-29T06:11:00Z)
- VersaGNN: a Versatile accelerator for Graph neural networks [81.1667080640009]
We propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator.
VersaGNN achieves on average a 3712$\times$ speedup with 1301.25$\times$ energy reduction over CPU, and a 35.4$\times$ speedup with 17.66$\times$ energy reduction over GPU.
arXiv Detail & Related papers (2021-05-04T04:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.