Extracting Molecular Properties from Natural Language with Multimodal
Contrastive Learning
- URL: http://arxiv.org/abs/2307.12996v1
- Date: Sat, 22 Jul 2023 10:32:58 GMT
- Title: Extracting Molecular Properties from Natural Language with Multimodal
Contrastive Learning
- Authors: Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna
Pistunova
- Abstract summary: We study how molecular property information can be transferred from natural language to graph representations.
We implement neural relevance scoring strategies to improve text retrieval and introduce a novel chemically-valid molecular graph augmentation strategy.
We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain over the recently proposed MoMu model, which is contrastively trained on molecular graphs and text.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning in computational biochemistry has traditionally focused on
neural representations of molecular graphs; however, recent advances in language
models highlight how much scientific knowledge is encoded in text. To bridge
these two modalities, we investigate how molecular property information can be
transferred from natural language to graph representations. We study property
prediction performance gains after using contrastive learning to align neural
graph representations with representations of textual descriptions of their
characteristics. We implement neural relevance scoring strategies to improve
text retrieval, introduce a novel chemically-valid molecular graph augmentation
strategy inspired by organic reactions, and demonstrate improved performance on
downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC
gain versus models pre-trained on the graph modality alone, and a +1.54% gain
over the recently proposed MoMu model (Su et al. 2022), which is contrastively
trained on molecular graphs and text.
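The graph/text alignment described in the abstract can be sketched with a symmetric InfoNCE-style contrastive objective, the standard loss for aligning paired embeddings across modalities. The function name, shapes, and temperature below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def info_nce_loss(graph_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning paired graph/text embeddings.

    Row i of graph_emb and text_emb is a positive pair; every other
    row in the batch serves as a negative for that pair.
    """
    # L2-normalize so dot products are cosine similarities
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature                 # (batch, batch) similarities
    idx = np.arange(logits.shape[0])               # positives on the diagonal
    loss_g2t = -log_softmax(logits)[idx, idx].mean()    # graph -> text
    loss_t2g = -log_softmax(logits.T)[idx, idx].mean()  # text -> graph
    return (loss_g2t + loss_t2g) / 2.0
```

Minimizing this loss pulls each graph embedding toward the embedding of its own textual description and pushes it away from the other descriptions in the batch.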
Related papers
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in how molecular graphs are encoded.
KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
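A rough illustration of the random-masking idea: replace a random fraction of atom tokens in a SMILES string with a mask token, so the model must infer the hidden atoms from context. The tokenizer regex, mask rate, and mask token below are hypothetical simplifications, not the paper's actual scheme:

```python
import random
import re

# Hypothetical minimal atom tokenizer: two-letter halogens first, then
# bracketed atoms, then single-letter aliphatic/aromatic atom symbols.
# Bonds, branches, and ring-closure digits are left untouched.
ATOM_TOKEN = re.compile(r"Cl|Br|\[[^\]]+\]|[BCNOSPFI]|[bcnops]")

def mask_smiles(smiles, mask_rate=0.15, mask_token="[MASK]", rng=None):
    """Replace a random fraction of atom tokens in a SMILES string."""
    rng = rng or random.Random(0)
    out, last = [], 0
    for m in ATOM_TOKEN.finditer(smiles):
        out.append(smiles[last:m.start()])          # non-atom characters
        out.append(mask_token if rng.random() < mask_rate else m.group())
        last = m.end()
    out.append(smiles[last:])
    return "".join(out)
```

For example, masking ethanol `CCO` at a high rate hides its atoms while preserving the string layout the model sees during pre-training.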
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules.
Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph.
By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Enhancing Model Learning and Interpretation Using Multiple Molecular
Graph Representations for Compound Property and Activity Prediction [0.0]
This research introduces multiple molecular graph representations that incorporate higher-level information.
It investigates their effects on model learning and interpretation from diverse perspectives.
The results indicate that combining atom graph representation with reduced molecular graph representation can yield promising model performance.
arXiv Detail & Related papers (2023-04-13T04:20:30Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Graph neural networks for the prediction of molecular structure-property
relationships [59.11160990637615]
Graph neural networks (GNNs) are a novel class of machine learning methods that work directly on the molecular graph.
GNNs learn properties in an end-to-end fashion, avoiding the need for hand-crafted informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
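A single message-passing step of the kind these GNNs build on can be sketched as follows; the aggregation rule, activation, and mean-pool readout are generic illustrations, not the paper's specific architecture:

```python
import numpy as np

def message_passing_step(node_feats, adj, weight):
    """One GNN layer: sum neighbor features, combine with self, transform.

    node_feats: (n, d) per-atom features; adj: (n, n) 0/1 adjacency
    matrix of the molecular graph; weight: (d, d_out) learned matrix.
    """
    messages = adj @ node_feats                               # aggregate neighbors
    return np.maximum((node_feats + messages) @ weight, 0.0)  # ReLU

def readout(node_feats):
    """Graph-level embedding: mean-pool the final node states."""
    return node_feats.mean(axis=0)
```

Stacking such layers lets information propagate along bonds, and the pooled readout feeds a property-prediction head, which is what makes the pipeline end-to-end.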
arXiv Detail & Related papers (2022-07-25T11:30:44Z) - Attention-wise masked graph contrastive learning for predicting
molecular property [15.387677968070912]
We propose a self-supervised representation learning framework for large-scale unlabeled molecules.
We developed a novel molecular graph augmentation strategy, referred to as attention-wise graph mask.
Our model can capture important molecular structure and higher-order semantic information.
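One plausible reading of an attention-wise graph mask is to hide the nodes the model attends to most, so the contrastive objective must recover semantically important substructures. The selection rule and mask fraction below are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def attention_wise_mask(node_feats, attn_scores, mask_frac=0.2):
    """Zero out the features of the most-attended nodes.

    node_feats: (n, d) per-node features; attn_scores: (n,) per-node
    importance produced by an attention module. Returns the masked
    features and a boolean vector marking which nodes were hidden.
    """
    n = attn_scores.shape[0]
    k = max(1, int(round(mask_frac * n)))      # how many nodes to mask
    top = np.argsort(attn_scores)[-k:]         # indices of top-k attended nodes
    masked = node_feats.copy()
    masked[top] = 0.0                          # hide their features
    mask = np.zeros(n, dtype=bool)
    mask[top] = True
    return masked, mask
```

Contrasting the masked view against the original graph then rewards representations that encode the hidden, high-attention substructure.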
arXiv Detail & Related papers (2022-05-02T00:28:02Z) - Learning Attributed Graph Representations with Communicative Message
Passing Transformer [3.812358821429274]
We propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation.
Unlike the previous transformer-style GNNs that treat molecules as fully connected graphs, we introduce a message diffusion mechanism to leverage the graph connectivity inductive bias.
arXiv Detail & Related papers (2021-07-19T11:58:32Z) - Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.