FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning
- URL: http://arxiv.org/abs/2502.01184v1
- Date: Mon, 03 Feb 2025 09:21:49 GMT
- Authors: Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas
- Abstract summary: We introduce FragmentNet, a graph-to-sequence foundation model with an adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments.
FragmentNet integrates VQVAE-GCN for hierarchical fragment embeddings, spatial positional encodings for graph serialization, global molecular descriptors, and a transformer.
- Abstract: Molecular property prediction uses molecular structure to infer chemical properties. Chemically interpretable representations that capture meaningful intramolecular interactions enhance the usability and effectiveness of these predictions. However, existing methods often rely on atom-based or rule-based fragment tokenization, which can be chemically suboptimal and lack scalability. We introduce FragmentNet, a graph-to-sequence foundation model with an adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments while preserving structural connectivity. FragmentNet integrates VQVAE-GCN for hierarchical fragment embeddings, spatial positional encodings for graph serialization, global molecular descriptors, and a transformer. Pre-trained with Masked Fragment Modeling and fine-tuned on MoleculeNet tasks, FragmentNet outperforms models with similarly scaled architectures and datasets while rivaling larger state-of-the-art models requiring significantly more resources. This novel framework enables adaptive decomposition, serialization, and reconstruction of molecular graphs, facilitating fragment-based editing and visualization of property trends in learned embeddings - a powerful tool for molecular design and optimization.
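The abstract names Masked Fragment Modeling as the pre-training objective but does not spell out the masking procedure. The following is a minimal sketch of a generic BERT-style masking step over a serialized sequence of fragment token ids, not the authors' implementation. The names `mask_fragments`, `MASK_ID`, and `IGNORE`, the 15% default mask rate, and the integer fragment ids are all assumptions made for illustration.

```python
import random

MASK_ID = 0    # hypothetical id reserved for a [MASK] fragment token
IGNORE = -100  # label for unmasked positions (a common "ignore" convention)


def mask_fragments(fragment_ids, mask_rate=0.15, seed=None):
    """Return (masked_inputs, labels) for one serialized molecule.

    fragment_ids: list of integer fragment tokens (assumed nonzero).
    Masked positions keep their original id in `labels`; all other
    positions get IGNORE so a loss function can skip them.
    """
    rng = random.Random(seed)
    n = len(fragment_ids)
    if n == 0:
        return [], []
    # Mask at least one fragment so every molecule yields a training signal.
    k = min(n, max(1, round(n * mask_rate)))
    positions = set(rng.sample(range(n), k))
    inputs = [MASK_ID if i in positions else tok
              for i, tok in enumerate(fragment_ids)]
    labels = [tok if i in positions else IGNORE
              for i, tok in enumerate(fragment_ids)]
    return inputs, labels


if __name__ == "__main__":
    # Toy sequence of made-up fragment ids for a single molecule.
    masked, targets = mask_fragments([5, 9, 12, 7, 3, 8], mask_rate=0.3, seed=0)
    print(masked)
    print(targets)
```

A model trained this way would receive `masked` as input and be scored only at the positions where `targets` differs from `IGNORE`.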
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
Date: 2024-11-03T19:45:15Z
- Molecular Graph Representation Learning via Structural Similarity Information
We introduce the Structural Similarity Motif GNN (MSSM-GNN), a novel molecular graph representation learning method.
In particular, we propose a specially designed graph that leverages graph kernel algorithms to represent the similarity between molecules quantitatively.
We employ GNNs to learn feature representations from molecular graphs, aiming to enhance the accuracy of property prediction by incorporating additional molecular representation information.
Date: 2024-09-13T06:59:10Z
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
Date: 2023-06-02T15:49:45Z
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
Date: 2023-02-04T01:32:40Z
- t-SMILES: A Scalable Fragment-based Molecular Representation Framework for De Novo Molecule Generation
This study introduces a flexible, fragment-based, multiscale molecular representation framework called t-SMILES.
It describes molecules using SMILES-type strings obtained by performing a breadth-first search on a full binary tree formed from a fragmented molecular graph.
It significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline models in goal-directed tasks.
Date: 2023-01-04T21:41:01Z
- Graph neural networks for the prediction of molecular structure-property relationships
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
Date: 2022-07-25T11:30:44Z
- Molecular Graph Generation via Geometric Scattering
Graph neural networks (GNNs) have been used extensively for addressing problems in drug design and discovery.
We propose a representation-first approach to molecular graph generation.
We show that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
Date: 2021-10-12T18:00:23Z
- Learning Attributed Graph Representations with Communicative Message Passing Transformer
We propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation.
Unlike the previous transformer-style GNNs that treat molecules as fully connected graphs, we introduce a message diffusion mechanism to leverage the graph connectivity inductive bias.
Date: 2021-07-19T11:58:32Z
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
Date: 2020-11-14T05:42:15Z
- Multi-View Graph Neural Networks for Molecular Property Prediction
We present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture.
In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process.
We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme.
Date: 2020-05-17T04:46:07Z