Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
- URL: http://arxiv.org/abs/2602.20344v1
- Date: Mon, 23 Feb 2026 20:41:44 GMT
- Title: Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
- Authors: Jiele Wu, Haozhe Ma, Zhihan Guo, Thanh Vinh Vo, Tze Yun Leong,
- Abstract summary: Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations.<n>GraSPNet is a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics.<n>GraSPNet learns chemically meaningful representations and consistently outperforms state-of-the-art GSSL methods in transfer learning settings.
- Score: 7.459632891054827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However, existing GSSL methods mostly focus on node- or edge-level information, often ignoring chemically relevant substructures which strongly influence molecular properties. In this work, we propose Graph Semantic Predictive Network (GraSPNet), a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics. GraSPNet decomposes molecular graphs into chemically meaningful fragments without predefined vocabularies and learns node- and fragment-level representations through multi-level message passing with masked semantic prediction at both levels. This hierarchical semantic supervision enables GraSPNet to learn multi-resolution structural information that is both expressive and transferable. Extensive experiments on multiple molecular property prediction benchmarks demonstrate that GraSPNet learns chemically meaningful representations and consistently outperforms state-of-the-art GSSL methods in transfer learning settings.
Related papers
- ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning [14.289447310645878]
ProtoMol is a prototype-guided framework that enables fine-grained integration and consistent semantic alignment between modalities.<n>ProtoMol consistently outperforms state-of-the-art baselines across a variety of molecular property prediction tasks.
arXiv Detail & Related papers (2025-10-19T13:19:37Z) - Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding [2.172419551358714]
Molecular graph neural networks (GNNs) often focus exclusively on XYZ-based geometric representations.<n>This work introduces a multimodal framework that integrates textual descriptors, such as IUPAC names, molecular formulas, physicochemical properties, and synonyms.
arXiv Detail & Related papers (2025-05-17T20:42:16Z) - Beyond Message Passing: Neural Graph Pattern Machine [50.78679002846741]
We introduce the Neural Graph Pattern Machine (GPM), a novel framework that bypasses message passing by learning directly from graph substructures.<n>GPM efficiently extracts, encodes, and prioritizes task-relevant graph patterns, offering greater expressivity and improved ability to capture long-range dependencies.
arXiv Detail & Related papers (2025-01-30T20:37:47Z) - A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z) - Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction [0.0]
We introduce DIG-Mol, a novel self-supervised graph neural network framework for molecular property prediction.
DIG-Mol integrates a momentum distillation network with two interconnected networks to efficiently improve molecular characterization.
We have established DIG-Mol's state-of-the-art performance through extensive experimental evaluation in a variety of molecular property prediction tasks.
arXiv Detail & Related papers (2024-05-04T10:09:27Z) - GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule
Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting.
Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs.
We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z) - Motif-based Graph Representation Learning with Application to Chemical
Molecules [11.257235936629689]
Existing graph neural networks offer limited ability to capture complex interactions within local structural contexts.
We propose a new motif-based graph representation learning technique to better utilize local structural information.
MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context.
arXiv Detail & Related papers (2022-08-09T03:37:37Z) - Motif-based Graph Self-Supervised Learning forMolecular Property
Prediction [12.789013658551454]
Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks.
Most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level tasks.
We propose a novel self-supervised motif generation framework for GNNs.
arXiv Detail & Related papers (2021-10-03T11:45:51Z) - Self-supervised Graph-level Representation Learning with Local and
Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose a Semantic Graph Convolutional Networks (SGCN) that explores the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z) - Multi-View Graph Neural Networks for Molecular Property Prediction [67.54644592806876]
We present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture.
In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process.
We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme.
arXiv Detail & Related papers (2020-05-17T04:46:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.