Extracting Molecular Properties from Natural Language with Multimodal
Contrastive Learning
- URL: http://arxiv.org/abs/2307.12996v1
- Date: Sat, 22 Jul 2023 10:32:58 GMT
- Authors: Romain Lacombe, Andrew Gaut, Jeff He, David Lüdeke, Kateryna
Pistunova
- Abstract summary: We study how molecular property information can be transferred from natural language to graph representations.
We implement neural relevance scoring strategies to improve text retrieval and introduce a novel chemically-valid molecular graph augmentation strategy.
We achieve a +4.26% AUROC gain versus models pre-trained on the graph modality alone, and a +1.54% gain over the recently proposed contrastively trained molecular graph/text MoMu model.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning in computational biochemistry has traditionally focused on
neural representations of molecular graphs; however, recent advances in language
models highlight how much scientific knowledge is encoded in text. To bridge
these two modalities, we investigate how molecular property information can be
transferred from natural language to graph representations. We study property
prediction performance gains after using contrastive learning to align neural
graph representations with representations of textual descriptions of their
characteristics. We implement neural relevance scoring strategies to improve
text retrieval, introduce a novel chemically-valid molecular graph augmentation
strategy inspired by organic reactions, and demonstrate improved performance on
downstream MoleculeNet property classification tasks. We achieve a +4.26% AUROC
gain versus models pre-trained on the graph modality alone, and a +1.54% gain
compared to the recently proposed contrastively trained molecular graph/text
MoMu model (Su et al. 2022).
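The alignment objective the abstract describes can be sketched with a symmetric InfoNCE-style contrastive loss between the two encoders' outputs. This is a minimal NumPy illustration of the general technique, not the paper's actual implementation; the temperature value and encoder outputs are placeholder assumptions.

```python
import numpy as np

def info_nce(graph_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss aligning graph and text embeddings (sketch).

    graph_emb, text_emb: (batch, dim) arrays; row i of each comes from the
    same molecule, so matched pairs sit on the diagonal of the similarity
    matrix and all other entries act as in-batch negatives.
    """
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = g @ t.T / temperature  # (batch, batch) cosine similarities

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))       # diagonal = matched pair

    # Average graph-to-text and text-to-graph directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairings drive it up, which is what pulls the graph encoder toward the property information encoded in text.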
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Bi-level Contrastive Learning for Knowledge-Enhanced Molecule
Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA)
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Enhancing Model Learning and Interpretation Using Multiple Molecular
Graph Representations for Compound Property and Activity Prediction [0.0]
This research introduces multiple molecular graph representations that incorporate higher-level information.
It investigates their effects on model learning and interpretation from diverse perspectives.
The results indicate that combining atom graph representation with reduced molecular graph representation can yield promising model performance.
arXiv Detail & Related papers (2023-04-13T04:20:30Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained on molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Graph neural networks for the prediction of molecular structure-property
relationships [59.11160990637615]
Graph neural networks (GNNs) are a machine learning method that operates directly on the molecular graph.
GNNs allow properties to be learned in an end-to-end fashion, thereby avoiding the need for informative descriptors.
We describe the fundamentals of GNNs and demonstrate the application of GNNs via two examples for molecular property prediction.
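The end-to-end learning that this summary describes rests on neighborhood aggregation: each atom repeatedly combines its neighbors' features. A minimal GCN-style layer (one possible sketch, not the paper's specific architecture) looks like this:

```python
import numpy as np

def message_passing_layer(node_feats, adjacency, weight):
    """One round of neighborhood aggregation (minimal GCN-style sketch).

    node_feats: (num_atoms, feat_dim) per-atom features,
    adjacency:  (num_atoms, num_atoms) 0/1 bond matrix,
    weight:     (feat_dim, out_dim) learned projection.
    Each atom averages its neighbors' features (including its own via a
    self-loop), applies the linear map, then a ReLU nonlinearity.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)          # per-atom degree
    h = (a_hat / deg) @ node_feats @ weight         # mean-aggregate, project
    return np.maximum(h, 0.0)                       # ReLU
```

Stacking several such layers and pooling the atom features into one vector yields the molecule-level representation that a property-prediction head consumes; no hand-crafted descriptors are needed.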
arXiv Detail & Related papers (2022-07-25T11:30:44Z) - Attention-wise masked graph contrastive learning for predicting
molecular property [15.387677968070912]
We propose a self-supervised representation learning framework for large-scale unlabeled molecules.
We developed a novel molecular graph augmentation strategy, referred to as attention-wise graph mask.
Our model can capture important molecular structure and higher-order semantic information.
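The augmentation this summary describes masks atoms according to importance scores rather than uniformly at random. A hedged stand-in sketch (the `scores` argument here plays the role of the paper's attention weights, which the actual model would supply):

```python
import numpy as np

def mask_node_features(node_feats, scores, mask_ratio=0.25):
    """Mask the features of the highest-scoring atoms (illustrative sketch).

    node_feats: (num_atoms, feat_dim) array; scores: per-atom importance
    (e.g. attention weights in the attention-wise variant). The top
    mask_ratio fraction of atoms have their features zeroed out, producing
    an augmented view of the graph for contrastive pre-training.
    """
    k = max(1, int(mask_ratio * len(scores)))
    idx = np.argsort(scores)[-k:]   # indices of the top-k atoms by score
    masked = node_feats.copy()
    masked[idx] = 0.0               # zero out (mask) their features
    return masked, idx
```

Masking the atoms the model attends to most forces the encoder to recover important substructure from context, which is the intuition behind attention-guided rather than random masking.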
arXiv Detail & Related papers (2022-05-02T00:28:02Z) - Learning Attributed Graph Representations with Communicative Message
Passing Transformer [3.812358821429274]
We propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation.
Unlike previous transformer-style GNNs that treat molecules as fully connected graphs, we introduce a message diffusion mechanism to leverage the graph-connectivity inductive bias.
arXiv Detail & Related papers (2021-07-19T11:58:32Z) - Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.