UniMAP: Universal SMILES-Graph Representation Learning
- URL: http://arxiv.org/abs/2310.14216v2
- Date: Mon, 04 Nov 2024 13:33:28 GMT
- Title: UniMAP: Universal SMILES-Graph Representation Learning
- Authors: Shikun Feng, Lixin Yang, Yanwen Huang, Yuyan Ni, Weiying Ma, Yanyan Lan
- Abstract summary: We propose a universal SMILES-graph representation learning model, namely UniMAP.
Four kinds of pre-training tasks are designed for UniMAP, including Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL).
Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods.
- Score: 21.25038529787392
- Abstract: Molecular representation learning is fundamental for many drug-related applications. Most existing molecular pre-training models are limited to a single molecular modality, either SMILES or graph representation. To effectively leverage both modalities, we argue that it is critical to capture the fine-grained 'semantics' between SMILES and graph, because subtle sequence/graph differences may lead to contrary molecular properties. In this paper, we propose a universal SMILES-graph representation learning model, namely UniMAP. First, an embedding layer is employed to obtain the token and node/edge representations in SMILES and graph, respectively. A multi-layer Transformer is then utilized to conduct deep cross-modality fusion. Specifically, four kinds of pre-training tasks are designed for UniMAP: Multi-Level Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global (i.e., SGM and DKL) and local (i.e., CMM and FLA) alignments are integrated to achieve comprehensive cross-modality fusion. We evaluate UniMAP on various downstream tasks, i.e., molecular property prediction, drug-target affinity prediction, and drug-drug interaction prediction. Experimental results show that UniMAP outperforms current state-of-the-art pre-training methods. We also visualize the learned representations to demonstrate the effect of multi-modality integration.
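To make the described pipeline concrete, below is a minimal sketch of a dual-input encoder of the kind the abstract outlines: separate SMILES-token and graph-node embeddings fused by a shared multi-layer Transformer. All module names, dimensions, and the segment-embedding detail are illustrative assumptions; edge features and the four pre-training heads (CMM, SGM, FLA, DKL) are omitted.

```python
# Minimal sketch of a UniMAP-style dual-input encoder (assumed names/dims;
# not the authors' code). SMILES tokens and graph atoms are embedded
# separately, concatenated, and fused by a shared Transformer.
import torch
import torch.nn as nn

class SmilesGraphEncoder(nn.Module):
    def __init__(self, vocab_size=512, num_atom_types=120, d_model=256,
                 nhead=8, num_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)     # SMILES tokens
        self.node_emb = nn.Embedding(num_atom_types, d_model)  # graph atoms
        # A learned segment embedding tells the Transformer which modality
        # each position comes from (0 = SMILES, 1 = graph).
        self.segment_emb = nn.Embedding(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers)

    def forward(self, token_ids, atom_ids):
        t = self.token_emb(token_ids) + self.segment_emb(
            torch.zeros_like(token_ids))
        g = self.node_emb(atom_ids) + self.segment_emb(
            torch.ones_like(atom_ids))
        x = torch.cat([t, g], dim=1)   # joint SMILES+graph sequence
        return self.fusion(x)          # cross-modality fused representations

enc = SmilesGraphEncoder()
out = enc(torch.randint(0, 512, (2, 40)), torch.randint(0, 120, (2, 24)))
print(out.shape)  # torch.Size([2, 64, 256])
```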
Related papers
- UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery [24.39705006290841]
We introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching.
Specifically, our approach captures structural features across multiple levels, such as atoms, substructures, and molecules, via hierarchical pooling and matching.
Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks.
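The multi-level matching idea (atoms, substructures, molecules) can be sketched roughly as follows; the pooling and similarity choices here are assumptions for illustration, not the authors' implementation.

```python
# Rough sketch of hierarchical (atom / substructure / molecule) matching in
# the spirit of UniMatch (assumed interfaces; not the paper's code).
import torch
import torch.nn.functional as F

def pool(atom_h, assign):
    """Mean-pool atom embeddings into groups given an assignment vector."""
    n_groups = int(assign.max()) + 1
    out = torch.zeros(n_groups, atom_h.size(1))
    out.index_add_(0, assign, atom_h)
    counts = torch.bincount(assign, minlength=n_groups).clamp(min=1)
    return out / counts.unsqueeze(1)

def hierarchical_similarity(q_atoms, q_assign, s_atoms, s_assign):
    # Level 1: atoms; level 2: substructures; level 3: whole molecule.
    sims = []
    for qa, sa in [(q_atoms, s_atoms),
                   (pool(q_atoms, q_assign), pool(s_atoms, s_assign))]:
        # best-match similarity between the two sets, averaged over queries
        sim = F.cosine_similarity(qa.unsqueeze(1), sa.unsqueeze(0), dim=-1)
        sims.append(sim.max(dim=1).values.mean())
    sims.append(F.cosine_similarity(q_atoms.mean(0), s_atoms.mean(0), dim=0))
    return torch.stack(sims).mean()   # aggregate multi-level score

q = torch.randn(12, 64); s = torch.randn(15, 64)
q_assign = torch.randint(0, 4, (12,)); s_assign = torch.randint(0, 5, (15,))
print(hierarchical_similarity(q, q_assign, s, s_assign))
```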
arXiv Detail & Related papers (2025-02-18T02:36:03Z)
- MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability [9.858315463084084]
MolGraph-xLSTM is a graph-based xLSTM model that enhances feature extraction and effectively models molecule long-range interactions.
Our approach processes molecular graphs at two scales: atom-level and motif-level.
We validate MolGraph-xLSTM on 10 molecular property prediction datasets, covering both classification and regression tasks.
arXiv Detail & Related papers (2025-01-30T15:47:59Z)
- Dual-Modality Representation Learning for Molecular Property Prediction [3.0953718537420545]
Accurate prediction of drug properties relies heavily on effective molecular representations.
Recent advances in learning drug properties commonly employ Graph Neural Networks (GNNs) based on the graph representation.
We propose a method named Dual-Modality Cross-Attention (DMCA) that can effectively combine the strengths of two representations.
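A fusion of this kind can be approximated with two standard cross-attention blocks, one per direction; the sketch below is an assumed simplification rather than the DMCA architecture itself.

```python
# Minimal dual-modality cross-attention fusion sketch (assumed design, not
# the DMCA authors' code): each modality attends to the other, and the two
# attended summaries are concatenated into one molecule embedding.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model=128, nhead=4):
        super().__init__()
        self.seq_to_graph = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.graph_to_seq = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, smiles_h, graph_h):
        # SMILES queries attend over graph keys/values, and vice versa.
        s, _ = self.seq_to_graph(smiles_h, graph_h, graph_h)
        g, _ = self.graph_to_seq(graph_h, smiles_h, smiles_h)
        # Pool each attended stream and concatenate for a property head.
        return torch.cat([s.mean(dim=1), g.mean(dim=1)], dim=-1)

fusion = CrossAttentionFusion()
z = fusion(torch.randn(8, 40, 128), torch.randn(8, 30, 128))
print(z.shape)  # torch.Size([8, 256])
```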
arXiv Detail & Related papers (2025-01-11T18:15:37Z)
- Exploring Hierarchical Molecular Graph Representation in Multimodal LLMs [6.770274624885797]
We study the effect of various graph feature levels on model performance.
We conclude with two key insights: (1) current molecule-related multimodal LLMs lack a comprehensive understanding of graph features, and (2) static processing is not sufficient for hierarchical graph features.
arXiv Detail & Related papers (2024-11-07T13:45:26Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate the need of Vision Transformer networks for very large fully-annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
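The core pre-training signal, predicting the codebook assignments of a full view from a masked view, can be illustrated as follows. The fixed codebook and nearest-neighbor assignment are simplifications (MOCA's codebook assignments are produced online), and all shapes are assumptions.

```python
# Toy sketch of the "predict masked codebook assignments" idea behind
# MOCA-style pre-training (assumed simplification, not the paper's method).
import torch
import torch.nn.functional as F

d, K = 64, 256                      # feature dim, codebook size
codebook = F.normalize(torch.randn(K, d), dim=1)   # fixed here; online in MOCA

def code_assignments(feats):
    """Assign each patch feature to its nearest codebook entry."""
    feats = F.normalize(feats, dim=-1)
    return (feats @ codebook.t()).argmax(dim=-1)    # (B, N) code ids

# Teacher sees the full image; student sees a masked view and must predict
# the teacher's code assignment at each masked position.
teacher_feats = torch.randn(4, 196, d)              # stand-in teacher features
student_logits = torch.randn(4, 196, K, requires_grad=True)
mask = torch.rand(4, 196) < 0.6                     # masked positions

targets = code_assignments(teacher_feats)
loss = F.cross_entropy(student_logits[mask], targets[mask])
loss.backward()
print(float(loss))
```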
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
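The bi-level objective can be illustrated with a standard InfoNCE loss between paired molecule-graph and knowledge-graph-substructure embeddings; the exact formulation below is an assumption, not GODE's published loss.

```python
# Sketch of a bi-level contrastive objective in the spirit of GODE
# (assumed formulation): embeddings of a molecule's own graph and of its
# knowledge-graph neighborhood are pulled together with InfoNCE.
import torch
import torch.nn.functional as F

def info_nce(z_mol, z_kg, temperature=0.1):
    """Symmetric InfoNCE between paired molecule/KG-subgraph embeddings."""
    z_mol = F.normalize(z_mol, dim=1)
    z_kg = F.normalize(z_kg, dim=1)
    logits = z_mol @ z_kg.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(z_mol.size(0))          # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# z_mol would come from a GNN over the molecular graph, z_kg from a GNN over
# the molecule's knowledge-graph substructure (both random stand-ins here).
loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
print(float(loss))
```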
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Molecular Joint Representation Learning via Multi-modal Information [11.493011069441188]
We propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG.
We improve the self-attention mechanism by introducing bond level graph representation as attention bias in Transformer.
We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination.
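The attention-bias mechanism mentioned above amounts to adding a pairwise, bond-derived score to the attention logits before the softmax; a minimal sketch with assumed shapes (not the MMSG implementation):

```python
# Sketch of using a bond-level representation as an additive attention bias,
# the mechanism the MMSG summary describes (assumed shapes, not their code).
import torch
import torch.nn.functional as F

def biased_attention(q, k, v, bond_bias):
    """Scaled dot-product attention with an additive pairwise bias.

    q, k, v: (B, N, d) atom features; bond_bias: (B, N, N) scores derived
    from bond/edge features (e.g., a learned embedding of bond type).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5 + bond_bias
    return F.softmax(scores, dim=-1) @ v

B, N, d = 2, 20, 64
q = k = v = torch.randn(B, N, d)
bond_bias = torch.randn(B, N, N)   # stand-in for learned bond-pair scores
out = biased_attention(q, k, v, bond_bias)
print(out.shape)  # torch.Size([2, 20, 64])
```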
arXiv Detail & Related papers (2022-11-25T11:53:23Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
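For context, the mixture objective optimized in this line of work evaluates the ELBO of a uniform mixture of variational approximations with multiple importance sampling over its components; the notation below is mine, following the standard MISELBO form:

```latex
% MISELBO: the ELBO for a uniform mixture q(z|x) = (1/S) \sum_s q_s(z|x),
% evaluated with multiple importance sampling over the S components.
\mathcal{L}_{\mathrm{MISELBO}}
  = \frac{1}{S} \sum_{s=1}^{S}
    \mathbb{E}_{z \sim q_s(z \mid x)}
    \left[ \log \frac{p(x, z)}{\frac{1}{S} \sum_{r=1}^{S} q_r(z \mid x)} \right]
```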
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- Dual-view Molecule Pre-training [186.07333992384287]
Dual-view molecule pre-training can effectively combine the strengths of both types of molecule representations.
DMP is tested on nine molecular property prediction tasks and achieves state-of-the-art performances on seven of them.
arXiv Detail & Related papers (2021-06-17T03:58:38Z)
- ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules.
In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution.
Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout framework training.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
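The diversity-driven selection step can be illustrated with greedy k-center selection over molecule embeddings, a common stand-in for diversity-based acquisition; it is not necessarily the authors' exact criterion.

```python
# Sketch of a diversity-driven active-learning selection step, using greedy
# k-center selection over molecule embeddings (illustrative stand-in).
import torch

def k_center_greedy(emb, n_select):
    """Pick n_select points that maximize coverage of embedding space."""
    chosen = [0]                                   # seed with an arbitrary point
    dists = torch.cdist(emb, emb[chosen]).squeeze(1)
    for _ in range(n_select - 1):
        nxt = int(dists.argmax())                  # farthest from chosen set
        chosen.append(nxt)
        dists = torch.minimum(
            dists, torch.cdist(emb, emb[nxt:nxt + 1]).squeeze(1))
    return chosen

emb = torch.randn(1000, 64)        # unlabeled-pool molecule embeddings
print(k_center_greedy(emb, 10))    # indices of diverse molecules to label
```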