Transformers Discover Molecular Structure Without Graph Priors
- URL: http://arxiv.org/abs/2510.02259v1
- Date: Thu, 02 Oct 2025 17:42:10 GMT
- Title: Transformers Discover Molecular Structure Without Graph Priors
- Authors: Tobias Kreiman, Yutong Bai, Fadi Atieh, Elizabeth Weaver, Eric Qu, Aditi S. Krishnapriyan,
- Abstract summary: Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning. In this work, we show how to train a Transformer to approximate molecular energies and forces. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers.
- Score: 17.28814700264642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs, often induced by a fixed radius cutoff or k-nearest-neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slow down inference with sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinates, without predefined graphs or physical priors, can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patterns, such as attention weights that decay inversely with interatomic distance, and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.
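The abstract's core recipe can be illustrated with a minimal sketch: embed raw atomic numbers and Cartesian coordinates as tokens, run a plain Transformer encoder with no graph or cutoff, and recover forces as the negative gradient of the predicted energy with respect to the positions. This is an assumption-laden toy model (layer sizes, embedding scheme, and readout are all illustrative), not the paper's actual architecture.

```python
# Hedged sketch of a "pure Transformer" MLIP: no graph construction,
# no equivariance; forces come from autograd as F = -dE/d(pos).
import torch
import torch.nn as nn

class PlainMolecularTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, max_z=100):
        super().__init__()
        self.embed_z = nn.Embedding(max_z, d_model)   # atom-type embedding
        self.embed_xyz = nn.Linear(3, d_model)        # raw Cartesian input
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.readout = nn.Linear(d_model, 1)          # per-atom energy head

    def forward(self, z, pos):
        h = self.embed_z(z) + self.embed_xyz(pos)     # (B, N, d_model)
        h = self.encoder(h)
        return self.readout(h).sum(dim=(1, 2))        # total energy, shape (B,)

z = torch.tensor([[8, 1, 1]])                          # atomic numbers (water-like)
pos = torch.randn(1, 3, 3, requires_grad=True)         # Cartesian coordinates
model = PlainMolecularTransformer()
energy = model(z, pos)
forces = -torch.autograd.grad(energy.sum(), pos)[0]    # F = -dE/d(pos)
print(energy.shape, forces.shape)                      # torch.Size([1]) torch.Size([1, 3, 3])
```

Because the attention is unmasked and dense, every atom attends to every other atom; any distance-dependent locality (such as the inverse-distance attention decay the paper reports) would have to be learned rather than imposed.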
Related papers
- Plain Transformers are Surprisingly Powerful Link Predictors [57.01966734467712]
Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structural features or memory-intensive node embeddings. We present PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs.
arXiv Detail & Related papers (2026-02-02T02:45:52Z) - A standard transformer and attention with linear biases for molecular conformer generation [0.16741831494720966]
Equivariant networks have been designed to generate molecular conformations from 2D molecular graphs. Non-equivariant transformer models have emerged as a viable alternative due to their capability to scale, which improves generalization. In this paper, we demonstrate that a well-chosen positional encoding effectively addresses these size limitations.
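The title mentions "attention with linear biases" (ALiBi). A hedged sketch of how a linear bias can be adapted from token positions to interatomic distances follows; whether the cited paper uses exactly this distance-based variant is an assumption, and the per-head slopes are illustrative.

```python
# Illustrative ALiBi-style additive attention bias keyed on pairwise
# interatomic distance instead of sequence position: logits for distant
# atom pairs are penalized linearly, with a different slope per head.
import torch

def distance_alibi_bias(pos, slopes):
    # pos: (N, 3) Cartesian coordinates; slopes: (H,) negative slope per head
    dist = torch.cdist(pos, pos)              # (N, N) pairwise distances
    return slopes.view(-1, 1, 1) * dist       # (H, N, N), added to attention logits

pos = torch.randn(5, 3)
slopes = -torch.tensor([1.0, 0.5, 0.25, 0.125])   # stronger decay in early heads
bias = distance_alibi_bias(pos, slopes)
print(bias.shape)  # torch.Size([4, 5, 5])
```

The diagonal of each head's bias is zero (self-distance is zero), so an atom's attention to itself is never penalized; such a bias adds no learned parameters beyond the slopes, which is one reason linear biases scale to larger inputs than learned positional embeddings.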
arXiv Detail & Related papers (2025-06-24T17:50:49Z) - GITO: Graph-Informed Transformer Operator for Learning Complex Partial Differential Equations [0.0]
We present a novel graph-informed transformer operator (GITO) architecture for learning complex partial differential equation systems. GITO consists of two main modules: a hybrid graph transformer (HGT) and a transformer neural operator (TNO). Empirical results on benchmark PDE tasks demonstrate that GITO outperforms existing transformer-based neural operators.
arXiv Detail & Related papers (2025-06-16T18:35:45Z) - GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds. We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes.
Poor features can lead to noisy self-attention scores and inferior convergence.
We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
arXiv Detail & Related papers (2024-02-20T12:01:30Z) - Graph Transformers without Positional Encodings [0.7252027234425334]
We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph.
We empirically show that it achieves performance competitive with SOTA Graph Transformers on a number of standard GNN benchmarks.
arXiv Detail & Related papers (2024-01-31T12:33:31Z) - Supercharging Graph Transformers with Advective Diffusion [28.40109111316014]
This paper proposes Advective Diffusion Transformer (AdvDIFFormer), a physics-inspired graph Transformer model designed to address this challenge. We show that AdvDIFFormer has provable capability for controlling generalization error with topological shifts. Empirically, the model demonstrates superiority in various predictive tasks across information networks, molecular screening and protein interactions.
arXiv Detail & Related papers (2023-10-10T08:40:47Z) - Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning [77.1421343649344]
We propose a generalization of Transformers towards operating entirely on the product of constant curvature spaces.
We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear to the number of nodes and edges.
arXiv Detail & Related papers (2023-09-08T02:44:37Z) - CTAGE: Curvature-Based Topology-Aware Graph Embedding for Learning Molecular Representations [11.12640831521393]
We propose an embedding approach CTAGE, utilizing $k$-hop discrete Ricci curvature to extract structural insights from molecular graph data.
Results indicate that introducing node curvature significantly improves the performance of current graph neural network frameworks.
arXiv Detail & Related papers (2023-07-25T06:13:01Z) - Pure Transformers are Powerful Graph Learners [51.36884247453605]
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice.
We prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers.
Our method coined Tokenized Graph Transformer (TokenGT) achieves significantly better results compared to GNN baselines and competitive results.
arXiv Detail & Related papers (2022-07-06T08:13:06Z) - Transferability Properties of Graph Neural Networks [125.71771240180654]
Graph neural networks (GNNs) are provably successful at learning representations from data supported on moderate-scale graphs.
We study the problem of training GNNs on graphs of moderate size and transferring them to large-scale graphs.
Our results show that (i) the transference error decreases with the graph size, and (ii) that graph filters have a transferability-discriminability tradeoff that in GNNs is alleviated by the scattering behavior of the nonlinearity.
arXiv Detail & Related papers (2021-12-09T00:08:09Z) - Image-Like Graph Representations for Improved Molecular Property Prediction [7.119677737397071]
We propose a new intrinsic molecular representation that bypasses the need for GNNs entirely, dubbed CubeMol.
Our fixed-dimensional representation, when paired with a transformer model, exceeds the performance of state-of-the-art GNN models and provides a path for scalability.
arXiv Detail & Related papers (2021-11-20T22:39:11Z) - Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z) - GeoT: A Geometry-aware Transformer for Reliable Molecular Property Prediction and Chemically Interpretable Representation Learning [16.484048833163282]
We introduce a novel Transformer-based framework for molecular representation learning, named the Geometry-aware Transformer (GeoT).
GeoT learns molecular graph structures through attention-based mechanisms specifically designed to offer reliable interpretability, as well as molecular property prediction.
Our comprehensive experiments, including an empirical simulation, reveal that GeoT effectively learns the chemical insights into molecular structures, bridging the gap between artificial intelligence and molecular sciences.
arXiv Detail & Related papers (2021-06-29T15:47:18Z) - GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
Existing generative models have several drawbacks including lack of modeling important molecular geometry elements.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z) - Rethinking Graph Transformers with Spectral Attention [13.068288784805901]
We present the $\textit{Spectral Attention Network}$ (SAN), which uses a learned positional encoding (LPE) to learn the position of each node in a given graph.
By leveraging the full spectrum of the Laplacian, our model is theoretically powerful in distinguishing graphs, and can better detect similar sub-structures from their resonance.
Our model performs on par or better than state-of-the-art GNNs, and outperforms any attention-based model by a wide margin.
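The Laplacian-spectrum positional encodings that SAN builds on can be sketched in a few lines: compute the graph Laplacian, take its lowest nontrivial eigenvectors, and use each node's row as its positional feature. This is a generic Laplacian-PE sketch, not SAN's learned LPE module, and the example graph is arbitrary.

```python
# Minimal Laplacian positional-encoding sketch: eigenvectors of the
# combinatorial Laplacian L = D - A serve as spectral node coordinates.
import numpy as np

def laplacian_pe(adj, k):
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj                     # combinatorial graph Laplacian
    vals, vecs = np.linalg.eigh(lap)    # eigenvalues in ascending order
    return vecs[:, 1:k + 1]             # skip the trivial constant eigenvector

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
pe = laplacian_pe(adj, k=2)
print(pe.shape)  # (4, 2)
```

One known caveat, which motivates learned encodings such as SAN's: eigenvectors are defined only up to sign (and up to rotation within degenerate eigenspaces), so models consuming raw Laplacian PEs typically need sign-flip augmentation or an invariant readout.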
arXiv Detail & Related papers (2021-06-07T18:11:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.