Molecule-Morphology Contrastive Pretraining for Transferable Molecular
Representation
- URL: http://arxiv.org/abs/2305.09790v2
- Date: Tue, 27 Jun 2023 02:24:18 GMT
- Title: Molecule-Morphology Contrastive Pretraining for Transferable Molecular
Representation
- Authors: Cuong Q. Nguyen, Dante Pertusi, Kim M. Branson
- Abstract summary: We introduce Molecule-Morphology Contrastive Pretraining (MoCoP), a framework for learning multi-modal representation of molecular graphs and cellular morphologies.
We scale MoCoP to approximately 100K molecules and 600K morphological profiles using data from the JUMP-CP Consortium.
Our findings suggest that integrating cellular morphologies with molecular graphs using MoCoP can significantly improve the performance of QSAR models.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image-based profiling techniques have become increasingly popular over the
past decade for their applications in target identification,
mechanism-of-action inference, and assay development. These techniques have
generated large datasets of cellular morphologies, which are typically used to
investigate the effects of small molecule perturbagens. In this work, we extend
the impact of such datasets to improving quantitative structure-activity
relationship (QSAR) models by introducing Molecule-Morphology Contrastive
Pretraining (MoCoP), a framework for learning multi-modal representation of
molecular graphs and cellular morphologies. We scale MoCoP to approximately
100K molecules and 600K morphological profiles using data from the JUMP-CP
Consortium and show that MoCoP consistently improves the performance of graph
neural networks (GNNs) on molecular property prediction tasks in ChEMBL20
across all dataset sizes. The pretrained GNNs are also evaluated on internal
GSK pharmacokinetic data and show an average improvement of 2.6% and 6.3% in
AUPRC for full and low data regimes, respectively. Our findings suggest that
integrating cellular morphologies with molecular graphs using MoCoP can
significantly improve the performance of QSAR models, ultimately expanding the
deep learning toolbox available for QSAR applications.
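A minimal sketch of the CLIP-style contrastive objective at the core of MoCoP is shown below, pairing molecular-graph embeddings with morphological-profile embeddings through a symmetric InfoNCE loss. The encoder choices, embedding dimension, batch size, and temperature here are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a CLIP-style molecule-morphology contrastive objective.
# Encoders, embedding size, and temperature are illustrative assumptions; they are
# not the exact architecture or hyperparameters used by MoCoP.
import torch
import torch.nn.functional as F

def contrastive_loss(graph_emb, morph_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired (molecule, morphology) embeddings.

    graph_emb: (B, D) embeddings from a molecular-graph encoder (e.g. a GNN).
    morph_emb: (B, D) embeddings from a morphological-profile encoder (e.g. an MLP).
    """
    g = F.normalize(graph_emb, dim=-1)
    m = F.normalize(morph_emb, dim=-1)
    logits = g @ m.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(g.size(0), device=g.device)   # matched pairs on the diagonal
    loss_g2m = F.cross_entropy(logits, targets)          # molecule -> morphology
    loss_m2g = F.cross_entropy(logits.t(), targets)      # morphology -> molecule
    return 0.5 * (loss_g2m + loss_m2g)

# Toy usage with random tensors standing in for encoder outputs.
graph_emb = torch.randn(32, 256)   # batch of molecular-graph embeddings
morph_emb = torch.randn(32, 256)   # batch of matched morphology embeddings
print(contrastive_loss(graph_emb, morph_emb).item())
```

The symmetric, two-direction form mirrors CLIP-style pretraining: the loss pulls each molecule toward its own morphological profile and away from the other profiles in the batch, and vice versa.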
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
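Below is a minimal sketch of the random SMILES-subsequence masking idea from the entry above, assuming a naive character-level tokenization, a [MASK] token, and an illustrative mask rate and span length; none of these choices are taken from the cited paper.

```python
# Illustrative sketch of masking random SMILES subsequences for masked-language-model
# style pretraining. Tokenization, mask rate, and span length are assumptions, not the
# cited paper's exact procedure.
import random

MASK_TOKEN = "[MASK]"

def mask_smiles(smiles, mask_rate=0.15, max_span=3, seed=None):
    """Replace random contiguous character spans of a SMILES string with [MASK].

    Returns the corrupted token list and the original tokens as reconstruction targets.
    """
    rng = random.Random(seed)
    tokens = list(smiles)          # naive character-level tokenization
    targets = list(tokens)         # labels for the reconstruction objective
    i = 0
    while i < len(tokens):
        if rng.random() < mask_rate:
            span = rng.randint(1, max_span)
            for j in range(i, min(i + span, len(tokens))):
                tokens[j] = MASK_TOKEN
            i += span
        else:
            i += 1
    return tokens, targets

# Example: corrupt aspirin's SMILES and keep the originals as labels.
corrupted, labels = mask_smiles("CC(=O)Oc1ccccc1C(=O)O", seed=0)
print(corrupted)
```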
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance on downstream molecular property prediction tasks across benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
- Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models.
Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language.
We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z)
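A minimal sketch of the multi-view fusion idea in the entry above appears below, assuming precomputed graph-derived and chemical-language embeddings that are concatenated before a small prediction head; the dimensions and the head architecture are illustrative assumptions rather than the paper's design.

```python
# Illustrative sketch of fusing two latent spaces (a graph-derived embedding and a
# chemical-language embedding) by concatenation before a property-prediction head.
# Dimensions and the head architecture are assumptions, not the cited paper's design.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, graph_dim=512, lm_dim=768, hidden=256, n_tasks=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(graph_dim + lm_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_tasks),
        )

    def forward(self, graph_emb, lm_emb):
        # Concatenate the two views of the molecule and predict the property.
        return self.net(torch.cat([graph_emb, lm_emb], dim=-1))

# Toy usage with random tensors standing in for MHG-GNN and MoLFormer embeddings.
head = FusionHead()
pred = head(torch.randn(8, 512), torch.randn(8, 768))
print(pred.shape)  # torch.Size([8, 1])
```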
- Accelerating Molecular Graph Neural Networks via Knowledge Distillation [1.9116784879310031]
Recent advances in graph neural networks (GNNs) have enabled more comprehensive modeling of molecules and molecular systems.
As the field has been progressing to bigger and more complex architectures, state-of-the-art GNNs have become largely prohibitive for many large-scale applications.
We devise KD strategies that facilitate the distillation of hidden representations in directional and equivariant GNNs, and evaluate their performance on the regression task of energy and force prediction.
arXiv Detail & Related papers (2023-06-26T16:24:31Z)
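A minimal sketch of distilling hidden representations from a large teacher GNN into a smaller student is given below, using a projected feature-matching (MSE) term added to the task loss; the projection layer, loss weighting, and dimensions are illustrative assumptions, not the paper's specific KD strategies.

```python
# Illustrative sketch of feature-based knowledge distillation for GNNs: the student's
# hidden node representations are projected and regressed onto the teacher's.
# Projection, loss weighting, and dimensions are assumptions, not the cited paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    def __init__(self, student_dim=128, teacher_dim=512):
        super().__init__()
        # Learnable projection aligning student and teacher feature dimensions.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden):
        # Match projected student features to the (frozen) teacher features.
        return F.mse_loss(self.proj(student_hidden), teacher_hidden.detach())

# Toy usage: per-node hidden states from a student (128-d) and a teacher (512-d) model.
distiller = FeatureDistiller()
kd_loss = distiller(torch.randn(40, 128), torch.randn(40, 512))
task_loss = torch.tensor(0.0)            # placeholder for the energy/force regression loss
total_loss = task_loss + 0.1 * kd_loss   # assumed weighting of the distillation term
print(total_loss.item())
```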
- MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT).
MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt.
Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
- Few-Shot Graph Learning for Molecular Property Prediction [46.60746023179724]
We propose Meta-MGNN, a novel model for few-shot molecular property prediction.
To exploit unlabeled molecular information, Meta-MGNN further incorporates molecular structure- and attribute-based self-supervised modules and self-attentive task weights.
Extensive experiments on two public multi-property datasets demonstrate that Meta-MGNN outperforms a variety of state-of-the-art methods.
arXiv Detail & Related papers (2021-02-16T01:55:34Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)