MolKD: Distilling Cross-Modal Knowledge in Chemical Reactions for
Molecular Property Prediction
- URL: http://arxiv.org/abs/2305.01912v1
- Date: Wed, 3 May 2023 06:01:03 GMT
- Authors: Liang Zeng, Lanqing Li, Jian Li
- Abstract summary: How to effectively represent molecules is a long-standing challenge for molecular property prediction and drug discovery.
This paper proposes to incorporate chemical domain knowledge, specifically related to chemical reactions, for learning effective molecular representations.
We introduce a novel method, namely MolKD, which Distills cross-modal Knowledge in chemical reactions to assist Molecular property prediction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How to effectively represent molecules is a long-standing challenge for
molecular property prediction and drug discovery. This paper studies this
problem and proposes to incorporate chemical domain knowledge, specifically
related to chemical reactions, for learning effective molecular
representations. However, the inherent cross-modality property between chemical
reactions and molecules presents a significant challenge to address. To this
end, we introduce a novel method, namely MolKD, which Distills cross-modal
Knowledge in chemical reactions to assist Molecular property prediction.
Specifically, the reaction-to-molecule distillation model within MolKD
transfers cross-modal knowledge from a pre-trained teacher network learning
with one modality (i.e., reactions) into a student network learning with
another modality (i.e., molecules). Moreover, MolKD learns effective molecular
representations by incorporating reaction yields to measure transformation
efficiency of the reactant-product pair when pre-training on reactions.
Extensive experiments demonstrate that MolKD significantly outperforms various
competitive baseline models, e.g., 2.1% absolute AUC-ROC gain on Tox21. Further
investigations demonstrate that pre-trained molecular representations in MolKD
can distinguish chemically reasonable molecular similarities, which enables
molecular property prediction with high robustness and interpretability.
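The reaction-to-molecule distillation described in the abstract follows the standard teacher-student pattern: a teacher pre-trained on one modality (reactions) produces soft targets that supervise a student operating on another modality (molecules). A minimal sketch of a generic temperature-scaled soft-target distillation loss (this is the textbook formulation, not MolKD's exact objective; all names are illustrative):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as is conventional so that gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher (reaction modality) soft targets
    q = softmax(student_logits, temperature)  # student (molecule modality) predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature ** 2
```

A higher temperature flattens the teacher distribution, exposing the relative similarities among non-target classes, which is where most of the transferred "dark knowledge" lives.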
Related papers
- DrugLLM: Open Large Language Model for Few-shot Molecule Generation [20.680942401843772]
DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications.
In computational experiments, DrugLLM can generate new molecules with expected properties based on limited examples.
arXiv Detail & Related papers (2024-05-07T09:18:13Z)
- Atom-Motif Contrastive Transformer for Molecular Property Prediction [68.85399466928976]
Graph Transformer (GT) models have been widely used in the task of Molecular Property Prediction (MPP).
We propose a novel Atom-Motif Contrastive Transformer (AMCT) which explores atom-level interactions and considers motif-level interactions.
Our proposed AMCT is extensively evaluated on seven popular benchmark datasets, and both quantitative and qualitative results firmly demonstrate its effectiveness.
arXiv Detail & Related papers (2023-10-11T10:03:10Z)
- MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning.
Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Materials Discovery with Extreme Properties via Reinforcement Learning-Guided Combinatorial Chemistry [0.23301643766310373]
A rule-based molecular designer is driven by a trained policy that selects subsequent molecular fragments to build a target molecule.
In an experiment aimed at discovering molecules that hit seven extreme target properties, our model discovered 1,315 of all target-hitting molecules.
It has been confirmed that every molecule generated under the binding rules of molecular fragments is 100% chemically valid.
arXiv Detail & Related papers (2023-03-21T13:21:43Z)
- Improving Molecular Pretraining with Complementary Featurizations [20.86159731100242]
Molecular pretraining is a paradigm to solve a variety of tasks in computational chemistry and drug discovery.
We show that different featurization techniques convey chemical information differently.
We propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO).
arXiv Detail & Related papers (2022-09-29T21:11:09Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose a score-based diffusion scheme that incorporates out-of-distribution control into the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective in 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
- Property-aware Adaptive Relation Networks for Molecular Property Prediction [34.13439007658925]
We propose a property-aware adaptive relation network (PAR) for the few-shot molecular property prediction problem.
Our PAR is compatible with existing graph-based molecular encoders and is further equipped with the ability to obtain property-aware molecular embeddings and model a molecular relation graph.
arXiv Detail & Related papers (2021-07-16T16:22:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.