Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis
- URL: http://arxiv.org/abs/2602.20573v1
- Date: Tue, 24 Feb 2026 05:53:24 GMT
- Title: Benchmarking GNN Models on Molecular Regression Tasks with CKA-Based Representation Analysis
- Authors: Rajan, Ishaan Gupta,
- Abstract summary: Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors to train ML/DL models for molecular property prediction tasks. GNN learns the inherent structural relationships within a molecule rather than depending on fixed-size fingerprints.
- Score: 0.8594140167290097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecules are commonly represented as SMILES strings, which can be readily converted to fixed-size molecular fingerprints. These fingerprints serve as feature vectors for training ML/DL models on molecular property prediction tasks in computational chemistry, drug discovery, biochemistry, and materials science. Recent research has demonstrated that SMILES can be used to construct molecular graphs where atoms are nodes ($V$) and bonds are edges ($E$). These graphs can subsequently be used to train geometric DL models such as GNNs. GNNs learn the inherent structural relationships within a molecule rather than depending on fixed-size fingerprints. Although GNNs are powerful aggregators, their efficacy on smaller datasets and the inductive biases of different architectures are less studied. In the present study, we systematically benchmarked four GNN architectures across datasets from diverse domains (physical chemistry, biological, and analytical). Additionally, we implemented a hierarchical fusion (GNN+FP) framework for target prediction. We observed that the fusion framework consistently outperforms or matches the performance of standalone GNNs (RMSE improvement > $7\%$) and baseline models. Further, we investigated the representational similarity between GNN and fingerprint embeddings using centered kernel alignment (CKA) and found that they occupy highly independent latent spaces (CKA $\le 0.46$). The cross-architectural CKA scores suggest high convergence between isotropic models such as GCN, GraphSAGE, and GIN (CKA $\geq 0.88$), with GAT learning moderately independent representations (CKA $0.55$ to $0.80$).
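The abstract compares embeddings with centered kernel alignment (CKA) but does not say which kernel is used. As a rough illustration only, the sketch below assumes the linear-kernel variant, where CKA is the squared Frobenius norm of $Y^\top X$ divided by the product of the Frobenius norms of $X^\top X$ and $Y^\top Y$, computed on column-centered matrices. The function names and the toy embeddings are hypothetical and are not taken from the paper; the code is dependency-free for readability rather than speed.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two embedding matrices whose rows are the same samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have one row per shared sample")
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    if denominator == 0.0:
        raise ValueError("CKA is undefined for constant embeddings")
    return numerator / denominator


# Toy data (assumed for illustration): four "molecules", two embedding dimensions.
emb_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# The same embedding rotated by 90 degrees and scaled by 3.
emb_b = [[0.0, 3.0], [-3.0, 0.0], [0.0, -3.0], [3.0, 0.0]]
# A one-dimensional embedding whose single feature is uncorrelated with emb_a.
emb_c = [[1.0], [-1.0], [1.0], [-1.0]]

print(linear_cka(emb_a, emb_b))  # rotation and uniform scaling leave CKA at 1
print(linear_cka(emb_a, emb_c))  # uncorrelated features give a CKA of 0
```

Because the score is invariant to rotation and uniform scaling, two models can reach a high CKA without sharing coordinates, and the two embedding matrices may have different widths, which is what makes a GNN versus fingerprint comparison possible.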
Related papers
- Geometric Multi-color Message Passing Graph Neural Networks for Blood-brain Barrier Permeability Prediction [1.488392495573075]
This paper introduces the geometric multi-color message-passing graph neural network (GMC-MPNN). Our model constructs weighted colored subgraphs based on atom types to capture the spatial relationships and chemical context that govern blood-brain barrier permeability.
arXiv Detail & Related papers (2025-07-25T03:38:46Z)
- Generalization of Graph Neural Networks is Robust to Model Mismatch [84.01980526069075]
Graph neural networks (GNNs) have demonstrated their effectiveness in various tasks supported by their generalization capabilities.
In this paper, we examine GNNs that operate on geometric graphs generated from manifold models.
Our analysis reveals the robustness of the GNN generalization in the presence of such model mismatch.
arXiv Detail & Related papers (2024-08-25T16:00:44Z)
- Molecular Hypergraph Neural Networks [1.4559839293730863]
Graph neural networks (GNNs) have demonstrated promising performance across various chemistry-related tasks.
We introduce molecular hypergraphs and propose Molecular Hypergraph Neural Networks (MHNN) to predict the optoelectronic properties of organic semiconductors.
MHNN outperforms all baseline models on most tasks of OPV, OCELOTv1 and PCQM4Mv2 datasets.
arXiv Detail & Related papers (2023-12-20T15:56:40Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- GCNH: A Simple Method For Representation Learning On Heterophilous Graphs [4.051099980410583]
Graph Neural Networks (GNNs) are well-suited for learning on homophilous graphs.
Recent works have proposed extensions to standard GNN architectures to improve performance on heterophilous graphs.
We propose GCN for Heterophily (GCNH), a simple yet effective GNN architecture applicable to both heterophilous and homophilous scenarios.
arXiv Detail & Related papers (2023-04-21T11:26:24Z)
- MolGraph: a Python package for the implementation of molecular graphs and graph neural networks with TensorFlow and Keras [51.92255321684027]
MolGraph is a graph neural network (GNN) package for molecular machine learning (ML).
MolGraph implements a chemistry module to accommodate the generation of small molecular graphs, which can be passed to a GNN algorithm to solve a molecular ML problem.
GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data.
arXiv Detail & Related papers (2022-08-21T18:37:41Z)
- Graph neural networks for the prediction of molecular structure-property relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel machine learning method that works directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
arXiv Detail & Related papers (2022-07-25T11:30:44Z)
- MGNN: Graph Neural Networks Inspired by Distance Geometry Problem [28.789684784093048]
Graph Neural Networks (GNNs) have emerged as a prominent research topic in the field of machine learning.
In this paper, we propose a GNN model inspired by the congruent-insensitivity property of the classifiers in the classification phase of GNNs.
We extensively evaluate the effectiveness of our model through experiments conducted on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-01-31T04:15:42Z)
- $p$-Laplacian Based Graph Neural Networks [27.747195341003263]
Graph neural networks (GNNs) have demonstrated superior performance for semi-supervised node classification on graphs.
We propose a new $p$-Laplacian based GNN model, termed $p$GNN, whose message passing mechanism is derived from a discrete regularization framework.
We show that the new message passing mechanism works simultaneously as low-pass and high-pass filters, thus making $p$GNNs effective on both homophilic and heterophilic graphs.
arXiv Detail & Related papers (2021-11-14T13:16:28Z)
- Eigen-GNN: A Graph Structure Preserving Plug-in for GNNs [95.63153473559865]
Graph Neural Networks (GNNs) are emerging machine learning models on graphs.
Most existing GNN models in practice are shallow and essentially feature-centric.
We show empirically and analytically that the existing shallow GNNs cannot preserve graph structures well.
We propose Eigen-GNN, a plug-in module that boosts GNNs' ability to preserve graph structures.
arXiv Detail & Related papers (2020-06-08T02:47:38Z)
- Multi-View Graph Neural Networks for Molecular Property Prediction [67.54644592806876]
We present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture.
In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process.
We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme.
arXiv Detail & Related papers (2020-05-17T04:46:07Z)