GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining
- URL: http://arxiv.org/abs/2506.13174v1
- Date: Mon, 16 Jun 2025 07:35:49 GMT
- Title: GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining
- Authors: Shaoheng Yan, Zian Li, Muhan Zhang
- Abstract summary: We present GeoRecon, a graph-level pretraining framework for molecular representation learning. During pretraining, the model is trained to generate a graph representation capable of accurately guiding reconstruction of the molecular geometry. GeoRecon outperforms node-centric baselines on multiple molecular benchmarks.
- Score: 19.398985037052224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The pretraining-and-finetuning paradigm has driven significant advances across domains, such as natural language processing and computer vision, with representative pretraining paradigms such as masked language modeling and next-token prediction. However, in molecular representation learning, the task design remains largely limited to node-level denoising, which is effective at modeling local atomic environments, yet may be insufficient for capturing the global molecular structure required by graph-level property prediction tasks, such as energy estimation and molecular regression. In this work, we present GeoRecon, a novel graph-level pretraining framework that shifts the focus from individual atoms to the molecule as an integrated whole. GeoRecon introduces a graph-level reconstruction task: during pretraining, the model is trained to generate an informative graph representation capable of accurately guiding reconstruction of the molecular geometry. This encourages the model to learn coherent, global structural features rather than isolated atomic details. Without relying on additional supervision or external data, GeoRecon outperforms node-centric baselines on multiple molecular benchmarks (e.g., QM9, MD17), demonstrating the benefit of incorporating graph-level reconstruction for learning more holistic and geometry-aware molecular embeddings.
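The graph-level reconstruction idea described in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the toy encoder, the distance decoder, and every name below (`init_params`, `encode_graph`, `reconstruction_loss`) are hypothetical and chosen only to show the shape of the objective, in which a single pooled graph vector must carry enough information to reconstruct the pairwise geometry.

```python
import numpy as np


def pairwise_distances(pos):
    """Return the (n, n) matrix of Euclidean distances between atoms."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def init_params(num_types=10, dim=8, seed=0):
    """Randomly initialise toy weights (hypothetical layout, not from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "embed": rng.normal(size=(num_types, dim)),   # atom-type embeddings
        "w_geo": rng.normal(size=(dim,)),             # weights for a geometric feature
        "w_dec": rng.normal(size=(2 * dim,)) * 0.1,   # linear decoder weights
    }


def encode_graph(z, pos, params):
    """Toy encoder: per-atom features, mean-pooled into one graph-level vector."""
    dist = pairwise_distances(pos)
    n = len(z)
    # Mean distance to the other atoms is invariant to rotation and translation.
    mean_dist = dist.sum(axis=1) / max(n - 1, 1)
    h = params["embed"][z] + np.tanh(mean_dist)[:, None] * params["w_geo"][None, :]
    return h.mean(axis=0)


def reconstruction_loss(z, pos, params):
    """MSE between true pairwise distances and those decoded from the graph vector."""
    dist = pairwise_distances(pos)
    g = encode_graph(z, pos, params)
    e = params["embed"][z]
    pair = e[:, None, :] + e[None, :, :]               # symmetric atom-pair features
    n = len(z)
    g_tiled = np.broadcast_to(g, (n, n, g.shape[0]))   # same graph vector for every pair
    feats = np.concatenate([g_tiled, pair], axis=-1)
    pred = np.logaddexp(0.0, feats @ params["w_dec"])  # softplus keeps predictions positive
    mask = ~np.eye(n, dtype=bool)                      # ignore self-distances
    return float(((pred - dist) ** 2)[mask].mean())


# Usage: a water-like geometry (atomic numbers and coordinates are illustrative).
z = np.array([8, 1, 1])
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
params = init_params()
loss = reconstruction_loss(z, pos, params)
print(loss)
```

In a real pretraining setup the weights would be updated by gradient descent on this loss, so that the pooled vector is pushed to encode global structure rather than only local atomic environments.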
Related papers
- GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
arXiv Detail & Related papers (2025-11-12T06:48:43Z)
- GeoGraph: Geometric and Graph-based Ensemble Descriptors for Intrinsically Disordered Proteins [0.43981305860983716]
We introduce GeoGraph, a simulation-informed surrogate trained to predict ensemble-averaged statistics of residue-residue contact-map topology directly from sequence. By featurizing coarse-grained molecular dynamics simulations into residue- and sequence-level graph descriptors, we create a robust and information-rich learning target.
arXiv Detail & Related papers (2025-10-01T11:13:53Z)
- Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining [21.71848826907517]
We introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers. C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space. C-FREE achieves state-of-the-art results on MoleculeNet, surpassing contrastive, generative, and other multimodal self-supervised methods.
arXiv Detail & Related papers (2025-09-26T15:16:20Z)
- A Remedy for Over-Squashing in Graph Learning via Forman-Ricci Curvature based Graph-to-Hypergraph Structural Lifting [0.0]
We propose a structural lifting strategy using Forman-Ricci curvature, which defines an edge-based network characteristic. Curvature reveals local and global properties of a graph, such as a network's backbones. Our approach provides a remedy to the problem of information distortion in message passing across long distances and graph bottlenecks.
arXiv Detail & Related papers (2025-08-15T10:46:27Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- Pushing the Limits of All-Atom Geometric Graph Neural Networks: Pre-Training, Scaling and Zero-Shot Transfer [15.302727191576784]
Geometric graph neural networks (Geom-GNNs) with all-atom information have transformed atomistic simulations.
We study the scaling behaviors of Geom-GNNs under self-supervised pre-training, supervised and unsupervised learning setups.
We show how all-atom graph embedding can be organically combined with other neural architectures to enhance the expressive power.
arXiv Detail & Related papers (2024-10-29T03:07:33Z)
- Using pretrained graph neural networks with token mixers as geometric featurizers for conformational dynamics [0.0]
We introduce geom2vec, in which pretrained graph neural networks (GNNs) are used as universal geometric featurizers. We show how the learned GNN representations can capture interpretable relationships between structural units (tokens) by combining them with expressive token mixers.
arXiv Detail & Related papers (2024-09-30T00:36:06Z)
- Hi-GMAE: Hierarchical Graph Masked Autoencoders [90.30572554544385]
Hierarchical Graph Masked AutoEncoders (Hi-GMAE) is a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs.
Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
arXiv Detail & Related papers (2024-05-17T09:08:37Z)
- CTAGE: Curvature-Based Topology-Aware Graph Embedding for Learning Molecular Representations [11.12640831521393]
We propose an embedding approach CTAGE, utilizing $k$-hop discrete Ricci curvature to extract structural insights from molecular graph data.
Results indicate that introducing node curvature significantly improves the performance of current graph neural network frameworks.
arXiv Detail & Related papers (2023-07-25T06:13:01Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Molecular Graph Generation via Geometric Scattering [7.796917261490019]
Graph neural networks (GNNs) have been used extensively for addressing problems in drug design and discovery.
We propose a representation-first approach to molecular graph generation.
We show that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
arXiv Detail & Related papers (2021-10-12T18:00:23Z)
- GeomGCL: Geometric Graph Contrastive Learning for Molecular Property Prediction [47.70253904390288]
We propose a novel graph contrastive learning method utilizing the geometry of a molecule across 2D and 3D views.
Specifically, we first devise a dual-view geometric message passing network (GeomMPNN) to adaptively leverage the rich information of both 2D and 3D graphs of a molecule.
arXiv Detail & Related papers (2021-09-24T03:55:27Z)
- GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles [60.12186997181117]
Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery.
Existing generative models have several drawbacks including lack of modeling important molecular geometry elements.
We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate 3D conformers.
arXiv Detail & Related papers (2021-06-08T14:17:59Z)
- Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning.
Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters.
An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z) - Uncovering the Folding Landscape of RNA Secondary Structure with Deep
Graph Embeddings [71.20283285671461]
We propose a geometric scattering autoencoder (GSAE) network for learning such graph embeddings.
Our embedding network first extracts rich graph features using the recently proposed geometric scattering transform.
We show that GSAE organizes RNA graphs both by structure and energy, accurately reflecting bistable RNA structures.
arXiv Detail & Related papers (2020-06-12T00:17:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.