Augmenting Molecular Images with Vector Representations as a
Featurization Technique for Drug Classification
- URL: http://arxiv.org/abs/2008.03646v1
- Date: Sun, 9 Aug 2020 04:26:16 GMT
- Title: Augmenting Molecular Images with Vector Representations as a
Featurization Technique for Drug Classification
- Authors: Daniel de Marchi, Amarjit Budhiraja
- Abstract summary: This paper proposes the creation of molecular images captioned with binary vectors that encode information not contained in or easily understood from a molecular image alone.
We tested our method on the HIV dataset published by the Pande lab, which consists of 41,127 molecules labeled by whether they inhibit the HIV virus.
- Score: 4.873362301533825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the key steps in building deep learning systems for drug
classification and generation is the choice of featurization for the molecules.
Previous featurization methods have included molecular images, binary strings,
graphs, and SMILES strings. This paper proposes the creation of molecular
images captioned with binary vectors that encode information not contained in
or easily understood from a molecular image alone. Specifically, we use Morgan
fingerprints, which encode higher level structural information, and MACCS keys,
which encode yes-or-no questions about a molecule's properties and structure. We
tested our method on the HIV dataset published by the Pande lab, which consists
of 41,127 molecules labeled by whether they inhibit the HIV virus. Our final model
achieved a state-of-the-art AUC ROC on the HIV dataset, outperforming all other
methods. Moreover, the model converged significantly faster than most other
methods, requiring dramatically less computational power than unaugmented
images.
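The featurization described above can be sketched in a few lines. This is a hypothetical illustration of the data flow only: the "image" and fingerprint bits are faked, whereas a real pipeline would render the molecule with RDKit and compute actual Morgan fingerprints and MACCS keys; the function name `caption_image` is ours, not the paper's.

```python
def caption_image(image_pixels, morgan_bits, maccs_bits):
    """Append the binary caption vector (Morgan + MACCS bits) to a
    flattened molecular image, producing one combined feature vector."""
    caption = list(morgan_bits) + list(maccs_bits)
    assert all(b in (0, 1) for b in caption), "caption must be binary"
    return list(image_pixels) + caption

# Toy example: a 2x2 grayscale "image" plus 4 Morgan bits and 3 MACCS bits.
image = [0.0, 0.5, 0.5, 1.0]
morgan = [1, 0, 1, 1]
maccs = [0, 1, 0]
features = caption_image(image, morgan, maccs)
print(len(features))  # 4 pixels + 7 caption bits = 11
```

In practice the caption vector is orders of magnitude shorter than the image (e.g. 2048 Morgan bits and 166 MACCS keys versus tens of thousands of pixels), so the augmentation adds little input size relative to the information it carries.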
Related papers
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in encoding molecular graphs.
KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism.
This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
- MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs [18.901322124389218]
MaskMol is a knowledge-guided molecular image self-supervised learning framework.
MaskMol accurately learns the representation of molecular images by considering multiple levels of molecular knowledge.
Results demonstrate MaskMol's high accuracy and transferability in activity cliff estimation and compound potency prediction.
arXiv Detail & Related papers (2024-09-02T03:03:22Z)
- Learning Molecular Representation in a Cell [18.170650265987792]
We introduce the Information Alignment (InfoAlign) approach to learn molecular representations through the information bottleneck method in cells.
We integrate molecules and cellular response data as nodes into a context graph, connecting them with weighted edges based on chemical, biological, and computational criteria.
We demonstrate that the proposed sufficiency objective for alignment is tighter than existing encoder-based contrastive methods.
arXiv Detail & Related papers (2024-06-17T19:48:42Z)
- Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE improvement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective in 1) keeping the embedding space well-organized and 2) improving the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z)
- IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System [29.946393284884778]
We introduce IMG2SMI, a model that leverages Deep Residual Networks for image feature extraction and encoder-decoder Transformer layers for molecule description generation.
IMG2SMI outperforms OSRA-based systems by 163% in molecule similarity prediction as measured by the molecular MACCS Fingerprint Tanimoto Similarity.
We also release a new molecule prediction dataset including 81 million molecules for molecule description generation.
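The MACCS Fingerprint Tanimoto Similarity used as the evaluation metric above has a simple definition: the number of bits set in both fingerprints divided by the number of bits set in either. A minimal stdlib sketch, assuming fingerprints are given as 0/1 lists (real MACCS keys are 166-bit vectors computed with RDKit):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two binary fingerprints:
    |A AND B| / |A OR B| over the set (on) bits."""
    on_a = {i for i, b in enumerate(fp_a) if b}
    on_b = {i for i, b in enumerate(fp_b) if b}
    union = on_a | on_b
    if not union:  # both fingerprints empty: define similarity as 0
        return 0.0
    return len(on_a & on_b) / len(union)

# Toy 5-bit fingerprints: 2 shared on-bits, 4 on-bits in the union.
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]
print(tanimoto(a, b))  # 2 / 4 = 0.5
```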
arXiv Detail & Related papers (2021-09-03T19:57:07Z)
- MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks [11.994553575596228]
MolCLR is a self-supervised learning framework for large unlabeled molecule datasets.
We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
Our method achieves state-of-the-art performance on many challenging datasets.
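Two of the three augmentations named above (atom masking and bond deletion) can be sketched on a toy graph with the standard library alone. This is a hedged illustration of the idea, not MolCLR's implementation, which operates on RDKit molecules converted to graph-neural-network inputs; the mask token and ratios here are arbitrary choices.

```python
import random

def mask_atoms(atom_features, ratio, rng):
    """Replace a random fraction of atom features with a mask token,
    forcing the encoder to infer atom identity from context."""
    masked = list(atom_features)
    n_mask = max(1, int(len(masked) * ratio))
    for i in rng.sample(range(len(masked)), n_mask):
        masked[i] = "MASK"
    return masked

def delete_bonds(bonds, ratio, rng):
    """Drop a random fraction of edges from the bond list, producing a
    perturbed view of the same molecular graph."""
    n_drop = max(1, int(len(bonds) * ratio))
    dropped = set(rng.sample(range(len(bonds)), n_drop))
    return [b for i, b in enumerate(bonds) if i not in dropped]

rng = random.Random(0)
atoms = ["C", "C", "O", "N"]          # toy atom labels
bonds = [(0, 1), (1, 2), (1, 3)]      # toy bond list (atom index pairs)
print(mask_atoms(atoms, 0.25, rng))
print(delete_bonds(bonds, 0.34, rng))
```

In contrastive training, two independently augmented views of the same molecule form a positive pair, while views of different molecules form negatives.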
arXiv Detail & Related papers (2021-02-19T17:35:18Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.