Circuits, Features, and Heuristics in Molecular Transformers
- URL: http://arxiv.org/abs/2512.09757v1
- Date: Wed, 10 Dec 2025 15:35:22 GMT
- Title: Circuits, Features, and Heuristics in Molecular Transformers
- Authors: Kristof Varadi, Mark Marosi, Peter Antal
- Abstract summary: We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.
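The sparse-autoencoder step lends itself to a compact illustration. Below is a minimal sketch of an SAE trained on transformer activations; the hidden width, dictionary size, and L1 coefficient are illustrative assumptions, not the authors' settings.

```python
# Minimal sparse autoencoder (SAE) over transformer activations.
# Dimensions and the sparsity coefficient are assumptions for illustration.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)   # overcomplete dictionary
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))      # non-negative feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder(d_model=512, d_dict=4096)
acts = torch.randn(64, 512)                         # stand-in for residual-stream activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + L1 sparsity
loss.backward()
```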
Related papers
- Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds [12.414301421345227]
We develop a novel two-stage transformer model for functional group removal and replacement. Unlike one-shot approaches that generate entire molecules in a single pass, our method sequentially generates the functional group to be removed and the group to be appended.
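As a hedged sketch of the two-stage decode described above, the following outlines the control flow only; `removal_model` and `replacement_model` are hypothetical interfaces, not the paper's API:

```python
# Hypothetical two-stage edit: stage one proposes the fragment to remove,
# stage two generates the group to append, conditioned on the molecule.
def edit_molecule(smiles: str, removal_model, replacement_model) -> str:
    fragment = removal_model.generate(smiles)            # e.g. "C(=O)O"
    return replacement_model.generate(smiles, fragment)  # molecule with fragment swapped
```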
arXiv Detail & Related papers (2026-01-12T19:01:11Z) - Enhancing Chemical Explainability Through Counterfactual Masking [1.1024591739346294]
We propose counterfactual masking, a framework that replaces masked substructures with chemically reasonable fragments. Our approach bridges the gap between explainability and molecular design, offering a principled and generative path toward explainable machine learning in chemistry.
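A minimal sketch of the masking-and-replacement operation using RDKit; the carboxylic-acid pattern and methyl replacement are illustrative stand-ins for the paper's fragment selection:

```python
# Replace a masked substructure with a (here, trivially chosen) fragment.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
query = Chem.MolFromSmarts("C(=O)[OH]")             # substructure to mask
repl = Chem.MolFromSmiles("C")                      # stand-in replacement fragment
candidates = AllChem.ReplaceSubstructs(mol, query, repl)
print([Chem.MolToSmiles(m) for m in candidates])
```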
arXiv Detail & Related papers (2025-08-25T23:41:36Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs. Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
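Fingerprint-structure pairs of the kind used to pretrain the decoder can be produced directly with RDKit; the Morgan radius and bit width below are assumptions, not DiffMS's exact configuration:

```python
# Build (fingerprint, structure) training pairs from SMILES strings.
from rdkit import Chem
from rdkit.Chem import AllChem

pairs = []
for s in ["CCO", "c1ccccc1", "CC(=O)O"]:
    mol = Chem.MolFromSmiles(s)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    pairs.append((list(fp), s))   # bit vector paired with its structure
```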
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds. We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
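An illustrative sketch of random SMILES masking; character-level tokenization and the 15% rate are assumptions, whereas the paper masks subsequences aligned to specific atoms:

```python
# Randomly mask SMILES tokens for masked-language-model pretraining.
import random

def mask_smiles(smiles: str, mask_token: str = "[MASK]", p: float = 0.15) -> str:
    tokens = list(smiles)   # naive char-level tokenization for illustration
    return "".join(mask_token if random.random() < p else t for t in tokens)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```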
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model [20.250683535089617]
We propose a text-guided multi-property molecular optimization method utilizing a transformer-based diffusion language model (TransDLM). By fusing physically and chemically detailed semantics with specialized molecular representations, TransDLM effectively integrates diverse information sources to guide precise optimization.
arXiv Detail & Related papers (2024-10-17T14:30:27Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
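A structural sketch of the three components, with placeholder dimensions and module choices that are not the paper's implementation:

```python
# (1) learnable prior over z, (2) causal transformer prompted by z,
# (3) property head reading z. All sizes are illustrative.
import torch
import torch.nn as nn

class LatentPromptTransformer(nn.Module):
    def __init__(self, d_z=64, d_model=256, vocab_size=100, n_props=3):
        super().__init__()
        self.prior_mu = nn.Parameter(torch.zeros(d_z))      # (1) learnable prior parameter
        self.prompt_proj = nn.Linear(d_z, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)       # (2) next-token logits
        self.prop_head = nn.Linear(d_z, n_props)            # (3) properties from the latent

    def forward(self, z, tok_emb):
        x = torch.cat([self.prompt_proj(z).unsqueeze(1), tok_emb], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.block(x, src_mask=mask)                    # causal self-attention
        return self.lm_head(h), self.prop_head(z)

model = LatentPromptTransformer()
z = model.prior_mu.expand(2, -1)                            # sample at the prior mean
logits, props = model(z, torch.randn(2, 12, 256))
```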
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - AdaMR: Adaptable Molecular Representation for Unified Pre-training Strategy [11.710702202071573]
We propose a new large-scale uniform pre-training strategy for small-molecule drugs, called Molecular Adjustable Representation (AdaMR).
AdaMR utilizes a granularity-adjustable molecular encoding strategy, which is accomplished through a pre-training task termed molecular canonicalization.
We fine-tuned our proposed pre-trained model on six molecular property prediction tasks and two generative tasks, achieving state-of-the-art (SOTA) results on five out of eight tasks.
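The canonicalization task itself is easy to illustrate with RDKit, which maps any valid SMILES spelling of a molecule to a single canonical form (AdaMR's granularity-adjustable encoding builds on top of this idea):

```python
# Three spellings of acetic acid all canonicalize to "CC(=O)O".
from rdkit import Chem

for s in ["OC(=O)C", "CC(O)=O", "C(C)(=O)O"]:
    print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))
```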
arXiv Detail & Related papers (2023-12-28T10:53:17Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
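As a minimal sketch of the Prototypical-Network step applied to precomputed embeddings (random vectors below stand in for the docking-derived representations described above):

```python
# Nearest-prototype classification over frozen molecular embeddings.
import torch

def proto_predict(support, support_y, query, n_classes=2):
    protos = torch.stack([support[support_y == c].mean(0) for c in range(n_classes)])
    return torch.cdist(query, protos).argmin(dim=1)   # nearest class prototype

support = torch.randn(10, 128)
support_y = torch.tensor([0] * 5 + [1] * 5)           # 5-shot support set per class
print(proto_predict(support, support_y, torch.randn(4, 128)))
```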
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Semi-Supervised Junction Tree Variational Autoencoder for Molecular Property Prediction [0.0]
This research modifies the state-of-the-art molecule generation method, the Junction Tree Variational Autoencoder (JT-VAE), to facilitate semi-supervised learning on chemical property prediction.
We leverage JT-VAE architecture to learn an interpretable representation optimal for tasks ranging from molecule property prediction to conditional molecule generation.
arXiv Detail & Related papers (2022-08-10T03:06:58Z) - Automatic Identification of Chemical Moieties [11.50343898633327]
We introduce a method to automatically identify chemical moieties from atomic representations using message-passing neural networks.
We demonstrate the versatility of our approach by using it to select representative entries in chemical databases.
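The grouping step can be sketched as clustering per-atom representations so that atoms in a shared moiety land in the same cluster; the random features below stand in for message-passing-network embeddings, and the cluster count is an assumption:

```python
# Cluster learned atom embeddings into candidate moieties.
import numpy as np
from sklearn.cluster import KMeans

atom_embeddings = np.random.randn(30, 64)             # 30 atoms, 64-d learned features
labels = KMeans(n_clusters=5, n_init=10).fit_predict(atom_embeddings)
print(labels)                                         # cluster id ~ candidate moiety per atom
```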
arXiv Detail & Related papers (2022-03-30T10:58:23Z) - Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder with an initial encoding of the molecular geometry and add a learned gated self-attention mechanism.
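One way to sketch geometry-aware attention is a learned, distance-dependent bias on the attention logits; this is a simplified stand-in for the paper's gated mechanism, with all sizes chosen for illustration:

```python
# Self-attention with a per-head bias computed from interatomic distances.
import torch
import torch.nn as nn

class DistanceBiasedAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.dist_proj = nn.Linear(1, n_heads)        # distance -> per-head logit bias

    def forward(self, x, coords):
        d = torch.cdist(coords, coords)                              # (B, N, N) distances
        bias = self.dist_proj(d.unsqueeze(-1)).permute(0, 3, 1, 2)   # (B, heads, N, N)
        mask = bias.reshape(-1, *bias.shape[2:])                     # (B*heads, N, N)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

layer = DistanceBiasedAttention()
out = layer(torch.randn(2, 9, 128), torch.randn(2, 9, 3))
print(out.shape)   # torch.Size([2, 9, 128])
```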
arXiv Detail & Related papers (2021-10-26T14:14:40Z) - Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer.
Experiments show that the learned molecular representation performs competitively when compared to graph-based and fingerprint-based supervised learning baselines.
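The usual way to run such a comparison is a frozen-embedding probe; the random vectors below stand in for actual MoLFormer outputs:

```python
# Linear probe on frozen molecule embeddings for a binary property.
import numpy as np
from sklearn.linear_model import LogisticRegression

emb = np.random.randn(100, 768)               # frozen per-molecule embeddings
y = np.random.randint(0, 2, size=100)         # binary property labels
clf = LogisticRegression(max_iter=1000).fit(emb, y)
print(clf.score(emb, y))                      # training accuracy of the probe
```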
arXiv Detail & Related papers (2021-06-17T14:33:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.