MolGrapher: Graph-based Visual Recognition of Chemical Structures
- URL: http://arxiv.org/abs/2308.12234v1
- Date: Wed, 23 Aug 2023 16:16:11 GMT
- Title: MolGrapher: Graph-based Visual Recognition of Chemical Structures
- Authors: Lucas Morin, Martin Danelljan, Maria Isabel Agea, Ahmed Nassar, Valery Weber, Ingmar Meijer, Peter Staar, Fisher Yu
- Abstract summary: We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
- Score: 50.13749978547401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automatic analysis of chemical literature has immense potential to
accelerate the discovery of new materials and drugs. Much of the critical
information in patent documents and scientific articles is contained in
figures, depicting the molecule structures. However, automatically parsing the
exact chemical structure is a formidable challenge, due to the amount of
detailed information, the diversity of drawing styles, and the need for
training data. In this work, we introduce MolGrapher to recognize chemical
structures visually. First, a deep keypoint detector detects the atoms. Second,
we treat all candidate atoms and bonds as nodes and put them in a graph. This
construct allows a natural graph representation of the molecule. Last, we
classify atom and bond nodes in the graph with a Graph Neural Network. To
address the lack of real training data, we propose a synthetic data generation
pipeline producing diverse and realistic results. In addition, we introduce a
large-scale benchmark of annotated real molecule images, USPTO-30K, to spur
research on this critical topic. Extensive experiments on five datasets show
that our approach significantly outperforms classical and learning-based
methods in most settings. Code, models, and datasets are available.
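
The graph construction step described in the abstract, where candidate atoms and candidate bonds are both treated as nodes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_supergraph`, the `max_bond_len` distance threshold used to propose candidate bonds, and the node dictionary layout are all assumptions made for illustration. The abstract does not state how candidate bonds are proposed.

```python
from itertools import combinations
from math import dist


def build_supergraph(atom_xy, max_bond_len):
    """Build a graph in which atoms AND candidate bonds are nodes.

    atom_xy: list of (x, y) keypoints, e.g. from an atom detector.
    max_bond_len: hypothetical threshold; any pair of atoms closer than
        this gets a candidate bond node (an assumption, not from the paper).
    Returns (nodes, edges); edges link each bond node to its two atoms.
    """
    # Every detected keypoint becomes an atom node.
    nodes = [{"kind": "atom", "pos": p} for p in atom_xy]
    edges = []
    for i, j in combinations(range(len(atom_xy)), 2):
        if dist(atom_xy[i], atom_xy[j]) <= max_bond_len:
            # Place the candidate bond node at the midpoint of the pair.
            mid = ((atom_xy[i][0] + atom_xy[j][0]) / 2,
                   (atom_xy[i][1] + atom_xy[j][1]) / 2)
            bond_index = len(nodes)
            nodes.append({"kind": "bond", "pos": mid, "ends": (i, j)})
            # The bond node sits between its two endpoint atoms.
            edges.append((i, bond_index))
            edges.append((bond_index, j))
    return nodes, edges


# Three collinear keypoints: only neighbouring pairs are close enough.
keypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
nodes, edges = build_supergraph(keypoints, max_bond_len=1.5)
```

A node classifier, such as the Graph Neural Network mentioned in the abstract, would then assign a label to every atom node and every bond node. That step is omitted here.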
Related papers
- Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules.
By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures.
When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Molecular Contrastive Learning with Chemical Element Knowledge Graph [16.136921143416927]
Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design.
We construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements.
The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG.
The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph.
arXiv Detail & Related papers (2021-12-01T15:04:39Z)
- Molecular Graph Generation via Geometric Scattering [7.796917261490019]
Graph neural networks (GNNs) have been used extensively for addressing problems in drug design and discovery.
We propose a representation-first approach to molecular graph generation.
We show that our architecture learns meaningful representations of drug datasets and provides a platform for goal-directed drug synthesis.
arXiv Detail & Related papers (2021-10-12T18:00:23Z)
- Learning Attributed Graph Representations with Communicative Message Passing Transformer [3.812358821429274]
We propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation.
Unlike the previous transformer-style GNNs that treat molecules as fully connected graphs, we introduce a message diffusion mechanism to leverage the graph connectivity inductive bias.
arXiv Detail & Related papers (2021-07-19T11:58:32Z)
- MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks [11.994553575596228]
MolCLR is a self-supervised learning framework for large unlabeled molecule datasets.
We propose three novel molecule graph augmentations: atom masking, bond deletion, and subgraph removal.
Our method achieves state-of-the-art performance on many challenging datasets.
arXiv Detail & Related papers (2021-02-19T17:35:18Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning.
GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data.
We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)
- ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning [6.88204255655161]
In drug discovery, knowledge of the graph structure of chemical compounds is essential.
A tool to analyze images automatically and convert them into a chemical graph structure would be useful for many applications.
We develop a deep neural network model for optical compound recognition.
arXiv Detail & Related papers (2020-02-23T14:30:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.