Advancing Molecular Machine (Learned) Representations with Stereoelectronics-Infused Molecular Graphs
- URL: http://arxiv.org/abs/2408.04520v1
- Date: Thu, 8 Aug 2024 15:21:07 GMT
- Title: Advancing Molecular Machine (Learned) Representations with Stereoelectronics-Infused Molecular Graphs
- Authors: Daniil A. Boiko, Thiago Reschützegger, Benjamin Sanchez-Lengeling, Samuel M. Blau, Gabe Gomes
- Abstract summary: We introduce a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects.
We show that the explicit addition of stereoelectronic interactions significantly improves the performance of molecular machine learning models.
We also show that the learned representations allow for facile stereoelectronic evaluation of previously intractable systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular representation is a foundational element in our understanding of the physical world. Its importance ranges from the fundamentals of chemical reactions to the design of new therapies and materials. Previous molecular machine learning models have employed strings, fingerprints, global features, and simple molecular graphs that are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher fidelity information. This work introduces a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects. We show that the explicit addition of stereoelectronic interactions significantly improves the performance of molecular machine learning models. Furthermore, stereoelectronics-infused representations can be learned and deployed with a tailored double graph neural network workflow, enabling its application to any downstream molecular machine learning task. Finally, we show that the learned representations allow for facile stereoelectronic evaluation of previously intractable systems, such as entire proteins, opening new avenues of molecular design.
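The "tailored double graph neural network workflow" mentioned in the abstract suggests a two-stage design: a first GNN predicts stereoelectronic interaction features from a plain molecular graph, and a second GNN consumes the graph augmented with those predictions for a downstream task. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; the sum-aggregation message passing, the per-edge stereoelectronic score head, and all names and layer sizes (GNN, DoubleGNN, hidden=64) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GNN(nn.Module):
    """Plain message passing: sum incoming edge messages, residual update."""
    def __init__(self, node_dim, edge_dim, hidden, n_layers=3):
        super().__init__()
        self.node_in = nn.Linear(node_dim, hidden)
        self.edge_in = nn.Linear(edge_dim, hidden)
        self.msgs = nn.ModuleList(
            [nn.Linear(3 * hidden, hidden) for _ in range(n_layers)])

    def forward(self, x, edge_index, edge_attr):
        h = torch.relu(self.node_in(x))           # [N, hidden] node states
        e = torch.relu(self.edge_in(edge_attr))   # [E, hidden] edge states
        src, dst = edge_index                     # directed edges src -> dst
        for lin in self.msgs:
            m = torch.relu(lin(torch.cat([h[src], h[dst], e], dim=-1)))
            h = h + torch.zeros_like(h).index_add_(0, dst, m)
        return h, e

class DoubleGNN(nn.Module):
    """Stage 1 predicts per-edge stereoelectronic scores; stage 2 reads the
    graph whose edge features are augmented with those scores."""
    def __init__(self, node_dim, edge_dim, hidden=64):
        super().__init__()
        self.stage1 = GNN(node_dim, edge_dim, hidden)
        self.edge_head = nn.Linear(3 * hidden, 1)      # donor-acceptor score
        self.stage2 = GNN(node_dim, edge_dim + 1, hidden)
        self.prop_head = nn.Linear(hidden, 1)          # molecular property

    def forward(self, x, edge_index, edge_attr):
        h1, e1 = self.stage1(x, edge_index, edge_attr)
        src, dst = edge_index
        stereo = self.edge_head(torch.cat([h1[src], h1[dst], e1], dim=-1))
        h2, _ = self.stage2(x, edge_index, torch.cat([edge_attr, stereo], -1))
        return stereo, self.prop_head(h2.mean(dim=0))  # edge scores + property

# Toy usage: 3 atoms, 2 directed bonds.
model = DoubleGNN(node_dim=8, edge_dim=4)
x = torch.randn(3, 8)
edge_index = torch.tensor([[0, 1], [1, 2]])  # bonds 0->1 and 1->2
stereo_scores, prop = model(x, edge_index, torch.randn(2, 4))
```

In such a workflow, stage 1 could first be trained against quantum-chemical labels (the abstract's "quantum-chemical-rich information") and then reused, so that any downstream model sees edges annotated with learned stereoelectronic scores.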
Related papers
- Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that captures general knowledge centered on molecules.
We introduce a module that integrates complementary information from different molecular encoders.
Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
arXiv Detail & Related papers (2025-02-19T05:49:10Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in how molecular graphs are encoded.
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
- MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights [23.55889965960128]
We introduce MOL-Mamba, a framework that enhances molecular representation by combining structural and electronic insights.
MOL-Mamba outperforms state-of-the-art baselines across eleven chemical-biological molecular datasets.
arXiv Detail & Related papers (2024-12-21T04:48:57Z)
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
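As a concrete illustration of the masking objective summarized in the entry above, here is a minimal sketch that masks random atom tokens in a SMILES string. The regex tokenizer, the [MASK] placeholder, and the mask_smiles helper are illustrative assumptions, and masking single atom tokens is a simplification of masking whole subsequences.

```python
import random
import re

# Coarse SMILES tokenizer: bracket atoms and two-letter elements first,
# then single characters. Not exhaustive; illustrative only.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|Si|@@|[A-Za-z0-9@+\-=#$%/\\().:]")

def mask_smiles(smiles: str, mask_frac: float = 0.15, seed: int = 0) -> str:
    """Replace a random fraction of atom tokens with a [MASK] placeholder."""
    rng = random.Random(seed)
    tokens = TOKEN_RE.findall(smiles)
    # Only atom-like tokens (letters or bracket atoms) are mask candidates.
    atom_idx = [i for i, t in enumerate(tokens)
                if t[0].isalpha() or t.startswith("[")]
    n_mask = max(1, int(mask_frac * len(atom_idx)))
    for i in rng.sample(atom_idx, n_mask):
        tokens[i] = "[MASK]"
    return "".join(tokens)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin with one atom masked
```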
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
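The MolGrapher entry above describes a graph in which candidate atoms and candidate bonds both become nodes to be classified by a GNN. A minimal sketch of that data structure follows; the Node class, coordinate features, and wiring rule are illustrative assumptions rather than MolGrapher's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "atom" or "bond"
    features: tuple                # e.g., (x, y) image coordinates
    neighbors: list = field(default_factory=list)

def build_graph(atom_coords, bond_pairs):
    """Atoms become nodes; each candidate bond becomes a node wired between
    the two atom nodes it connects, so a GNN can classify both kinds."""
    atoms = [Node("atom", tuple(xy)) for xy in atom_coords]
    nodes = list(atoms)
    for i, j in bond_pairs:
        mid = tuple((a + b) / 2 for a, b in zip(atom_coords[i], atom_coords[j]))
        bond = Node("bond", mid)
        bond.neighbors += [atoms[i], atoms[j]]
        atoms[i].neighbors.append(bond)
        atoms[j].neighbors.append(bond)
        nodes.append(bond)
    return nodes

graph = build_graph([(0, 0), (1, 0), (2, 1)], [(0, 1), (1, 2)])
print(len(graph), [n.kind for n in graph])   # 5 nodes: 3 atoms + 2 bonds
```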
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC improvement of 13.8% on classification tasks and an average RMSE/MAE improvement of 35.1% on regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
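The GODE entry above pairs two pre-trained GNNs with contrastive learning. A minimal sketch of a bi-level contrastive objective in that spirit is shown below, assuming a symmetric InfoNCE loss; the stand-in MLP encoders, feature sizes, and toy data are illustrative, whereas the real method pre-trains two GNNs over molecular graphs and knowledge-graph substructures.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Symmetric InfoNCE over a batch: matching rows are positive pairs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # [B, B] similarity matrix
    labels = torch.arange(z1.size(0))             # i-th graph <-> i-th KG view
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2

# Stand-in encoders: the real method uses two graph neural networks.
mol_enc = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
kg_enc = torch.nn.Sequential(torch.nn.Linear(24, 64), torch.nn.ReLU(),
                             torch.nn.Linear(64, 16))

mol_feats = torch.randn(8, 32)   # pooled molecular-graph features (toy data)
kg_feats = torch.randn(8, 24)    # pooled KG-substructure features (toy data)
loss = info_nce(mol_enc(mol_feats), kg_enc(kg_feats))
loss.backward()                  # gradients flow into both encoders
```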
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Transferring Chemical and Energetic Knowledge Between Molecular Systems with Machine Learning [5.27145343046974]
We propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one.
We focus on the classification of high and low free-energy states.
Our results show a remarkable AUC of 0.92 for transfer learning from tri-alanine to the deca-alanine system.
arXiv Detail & Related papers (2022-05-06T16:21:00Z)
- Molecular Graph Generation via Geometric Scattering [7.796917261490019]
Graph neural networks (GNNs) have been used extensively for addressing problems in drug design and discovery.
We propose a representation-first approach to molecular graph generation.
We show that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
arXiv Detail & Related papers (2021-10-12T18:00:23Z)
- Knowledge-aware Contrastive Molecular Graph Learning [5.08771973600915]
We propose Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning.
We explicitly encode domain knowledge via a knowledge-aware molecular encoder under the contrastive learning framework.
Experiments on 8 public datasets demonstrate the effectiveness of our model with a 6% absolute improvement on average.
arXiv Detail & Related papers (2021-03-24T08:55:08Z)
- MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks [11.994553575596228]
MolCLR is a self-supervised learning framework for large unlabeled molecule datasets.
We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
Our method achieves state-of-the-art performance on many challenging datasets.
arXiv Detail & Related papers (2021-02-19T17:35:18Z)
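The three MolCLR augmentations named above (atom masking, bond deletion, subgraph removal) are easy to illustrate on a toy molecule stored as an atom list plus a bond list. The sketch below uses that simplified layout; the original operates on molecular graphs built from chemical structures, and the removal fractions and '*' mask symbol here are illustrative assumptions.

```python
import random

def atom_masking(atoms, bonds, frac=0.25, rng=random):
    """Replace a random fraction of atoms with a mask token '*'."""
    masked = set(rng.sample(range(len(atoms)), max(1, int(frac * len(atoms)))))
    return ["*" if i in masked else a for i, a in enumerate(atoms)], bonds

def bond_deletion(atoms, bonds, frac=0.25, rng=random):
    """Delete a random fraction of bonds."""
    n_del = max(1, int(frac * len(bonds)))
    keep = set(rng.sample(range(len(bonds)), len(bonds) - n_del))
    return atoms, [b for i, b in enumerate(bonds) if i in keep]

def subgraph_removal(atoms, bonds, frac=0.25, rng=random):
    """Mask a connected neighborhood grown from a random seed atom (BFS),
    then drop all bonds touching the removed atoms."""
    drop, frontier = set(), [rng.randrange(len(atoms))]
    budget = max(1, int(frac * len(atoms)))
    while frontier and len(drop) < budget:
        v = frontier.pop(0)
        if v in drop:
            continue
        drop.add(v)
        frontier += [j if i == v else i for i, j in bonds if v in (i, j)]
    return (["*" if i in drop else a for i, a in enumerate(atoms)],
            [b for b in bonds if b[0] not in drop and b[1] not in drop])

atoms = ["C", "C", "O", "N", "C"]            # toy 5-atom molecule
bonds = [(0, 1), (1, 2), (1, 3), (3, 4)]     # undirected bond list
print(atom_masking(atoms, bonds))
print(bond_deletion(atoms, bonds))
print(subgraph_removal(atoms, bonds))
```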
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules, the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.