Pre-training Graph Neural Networks on 2D and 3D Molecular Structures by using Multi-View Conditional Information Bottleneck
- URL: http://arxiv.org/abs/2511.18404v1
- Date: Sun, 23 Nov 2025 11:18:35 GMT
- Title: Pre-training Graph Neural Networks on 2D and 3D Molecular Structures by using Multi-View Conditional Information Bottleneck
- Authors: Van Thuy Hoang, O-Joun Lee
- Abstract summary: We propose a Multi-View Conditional Information Bottleneck framework for pre-training graph neural networks on 2D and 3D molecular structures. Our idea is to discover the shared information while minimizing irrelevant features from each view under the MVCIB principle. To enhance semantic and structural consistency across views, we utilize key substructures, e.g., functional groups and ego-networks, as anchors between the two views.
- Score: 8.42839603549236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent pre-training strategies for molecular graphs have attempted to use 2D and 3D molecular views as both inputs and self-supervised signals, primarily aligning graph-level representations. However, existing studies remain limited in addressing two main challenges of multi-view molecular learning: (1) discovering shared information between two views while diminishing view-specific information and (2) identifying and aligning important substructures, e.g., functional groups, which are crucial for enhancing cross-view consistency and model expressiveness. To solve these challenges, we propose a Multi-View Conditional Information Bottleneck framework, called MVCIB, for pre-training graph neural networks on 2D and 3D molecular structures in a self-supervised setting. Our idea is to discover the shared information while minimizing irrelevant features from each view under the MVCIB principle, which uses one view as a contextual condition to guide the representation learning of its counterpart. To enhance semantic and structural consistency across views, we utilize key substructures, e.g., functional groups and ego-networks, as anchors between the two views. Then, we propose a cross-attention mechanism that captures fine-grained correlations between the substructures to achieve subgraph alignment across views. Extensive experiments in four molecular domains demonstrated that MVCIB consistently outperforms baselines in both predictive performance and interpretability. Moreover, MVCIB achieved 3-WL (third-order Weisfeiler-Lehman) expressive power, distinguishing not only non-isomorphic graphs but also different 3D geometries that share identical 2D connectivity, such as isomers.
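To make the MVCIB principle concrete, here is a hedged sketch of a multi-view conditional information bottleneck objective consistent with the abstract; the notation (Z_v for the representation of view v, X_v for its input, X_{v'} for the counterpart view acting as the condition, and the trade-off weight beta) is assumed for illustration and is not taken from the paper:

```latex
% Hedged sketch of a multi-view conditional IB objective (notation assumed).
% The first term rewards information in Z_v that predicts the counterpart view
% (shared information); the conditional term penalizes information in Z_v that
% the counterpart cannot explain (view-specific, irrelevant features).
\max_{\theta}\; I(Z_v ; X_{v'}) \;-\; \beta \, I(Z_v ; X_v \mid X_{v'}),
\qquad v \in \{\mathrm{2D}, \mathrm{3D}\},\quad v' \neq v
```

Under an objective of this form, each view's encoder is trained with the other view as context, matching the abstract's description of using one view as a contextual condition to guide the representation learning of its counterpart.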
Related papers
- UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? [50.92401586025528]
Unified multimodal models have recently demonstrated strong generative capabilities, yet whether and when generation improves understanding remains unclear. We introduce UniG2U-Bench, a comprehensive benchmark categorizing generation-to-understanding (G2U) evaluation into 7 regimes and 30 subtasks.
arXiv Detail & Related papers (2026-03-03T18:36:16Z)
- CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning [48.36177110428022]
We present a central-peripheral vision-inspired framework (CVP) for spatial reasoning. CVP draws inspiration from the two types of human visual fields: central vision and peripheral vision. Experiments show that CVP achieves state-of-the-art performance across a range of 3D scene understanding benchmarks.
arXiv Detail & Related papers (2025-12-09T00:21:13Z) - Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining [21.71848826907517]
We introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers.<n>C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space.<n>C-FREE state-of-the-art results on MoleculeNet, surpassing contrastive, generative, and other multimodal self-supervised methods.
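As a minimal sketch of what such a contrast-free predictive objective could look like (the function names, shapes, and the cosine-regression loss are assumptions for illustration, not the authors' code):

```python
import torch
import torch.nn.functional as F

def predictive_loss(z_subgraph: torch.Tensor,
                    z_complement: torch.Tensor,
                    predictor: torch.nn.Module) -> torch.Tensor:
    """Regress an ego-net subgraph embedding from its complementary
    neighborhood embedding; no negative samples are needed (contrast-free)."""
    pred = predictor(z_complement)  # (batch, dim) -> (batch, dim)
    # Cosine regression with a stop-gradient on the target, a common design
    # in contrast-free self-supervision (an assumption here).
    return 1.0 - F.cosine_similarity(pred, z_subgraph.detach(), dim=-1).mean()
```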
arXiv Detail & Related papers (2025-09-26T15:16:20Z)
- Duplex: Dual Prototype Learning for Compositional Zero-Shot Learning [17.013498508426398]
Compositional Zero-Shot Learning (CZSL) aims to enable models to recognize novel compositions of visual states and objects that were absent during training. We propose Duplex, a novel dual-prototype learning method that integrates semantic and visual prototypes through a carefully designed dual-branch architecture.
arXiv Detail & Related papers (2025-01-13T08:04:32Z)
- Unified Molecular Modeling via Modality Blending [35.16755562674055]
We introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND). MoleBLEND blends atom relations from different modalities into one unified relation matrix for encoding, then recovers modality-specific information for both 2D and 3D structures.
Experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks.
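A minimal sketch of the "blend" step described above (the element-wise mixing rule and all names are assumptions, not the authors' implementation):

```python
import torch

def blend_relations(rel_2d: torch.Tensor, rel_3d: torch.Tensor) -> torch.Tensor:
    """Element-wise random mix of 2D and 3D atom-pair relation matrices.

    rel_2d, rel_3d: (num_atoms, num_atoms) float tensors for one molecule.
    The blended matrix would feed a shared encoder, and separate heads would
    then recover each modality (the "predict" step, omitted here).
    """
    mask = torch.rand_like(rel_2d) < 0.5  # which atom pairs come from the 2D view
    return torch.where(mask, rel_2d, rel_3d)
```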
arXiv Detail & Related papers (2023-07-12T15:27:06Z)
- Hierarchical Contrastive Learning Enhanced Heterogeneous Graph Neural Network [59.860534520941485]
Heterogeneous graph neural networks (HGNNs), as an emerging technique, have shown a superior capacity for dealing with heterogeneous information networks (HINs).
Recently, contrastive learning, a self-supervised method, has become one of the most exciting learning paradigms and shows great potential when no labels are available.
In this paper, we study the problem of self-supervised HGNNs and propose a novel co-contrastive learning mechanism for HGNNs, named HeCo.
arXiv Detail & Related papers (2023-04-24T16:17:21Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs lie in two different feature spaces.
We propose a Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA).
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z)
- Deep Contrastive Learning for Multi-View Network Embedding [20.035449838566503]
Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors.
Most contrastive learning-based methods rely on high-quality graph embeddings.
We design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME).
arXiv Detail & Related papers (2021-08-16T06:29:18Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework, CTL, to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Graph Information Bottleneck [77.21967740646784]
Graph Neural Networks (GNNs) provide an expressive way to fuse information from network structure and node features.
Inheriting from the general Information Bottleneck (IB), the Graph Information Bottleneck (GIB) aims to learn the minimal sufficient representation for a given task.
We show that our proposed models are more robust than state-of-the-art graph defense models.
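For reference, the classical IB objective that GIB inherits (textbook form; the graph-specific estimators in the paper may differ) seeks a representation Z of the input X that is maximally informative about the target Y while compressing away the rest of X:

```latex
% Standard Information Bottleneck objective, which GIB adapts to graph data:
% maximize task-relevant information I(Z;Y) while limiting total input
% information I(Z;X), traded off by beta.
\max_{p(z \mid x)} \; I(Z ; Y) \;-\; \beta \, I(Z ; X)
```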
arXiv Detail & Related papers (2020-10-24T07:13:00Z)