MolScribe: Robust Molecular Structure Recognition with Image-To-Graph
Generation
- URL: http://arxiv.org/abs/2205.14311v2
- Date: Mon, 20 Mar 2023 23:04:53 GMT
- Title: MolScribe: Robust Molecular Structure Recognition with Image-To-Graph
Generation
- Authors: Yujie Qian, Jiang Guo, Zhengkai Tu, Zhening Li, Connor W. Coley,
Regina Barzilay
- Abstract summary: MolScribe is an image-to-graph model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure.
MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks.
- Score: 28.93523736883784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular structure recognition is the task of translating a molecular image
into its graph structure. The wide variation in drawing styles and
conventions exhibited in the chemical literature poses a significant challenge for
automating this task. In this paper, we propose MolScribe, a novel
image-to-graph generation model that explicitly predicts atoms and bonds, along
with their geometric layouts, to construct the molecular structure. Our model
flexibly incorporates symbolic chemistry constraints to recognize chirality and
expand abbreviated structures. We further develop data augmentation strategies
to enhance the model robustness against domain shifts. In experiments on both
synthetic and realistic molecular images, MolScribe significantly outperforms
previous models, achieving 76-93% accuracy on public benchmarks. Chemists can
also easily verify MolScribe's prediction, informed by its confidence
estimation and atom-level alignment with the input image. MolScribe is publicly
available through Python and web interfaces:
https://github.com/thomas0809/MolScribe.
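For reference, below is a minimal sketch of recognizing a single molecule image through the Python interface mentioned above. The checkpoint name, keyword arguments, and output keys follow the repository README at the time of writing and should be treated as assumptions that may differ across versions; the image path is hypothetical.
```python
# Minimal usage sketch based on the MolScribe repository
# (https://github.com/thomas0809/MolScribe); names may have changed.
import torch
from huggingface_hub import hf_hub_download
from molscribe import MolScribe

# Download a released checkpoint (hosted on the Hugging Face Hub).
ckpt_path = hf_hub_download('yujieq/MolScribe', 'swin_base_char_aux_1m.pth')

# Load the model on CPU (use torch.device('cuda') if a GPU is available).
model = MolScribe(ckpt_path, device=torch.device('cpu'))

# Predict the molecular graph for one image; request atom/bond-level output
# and confidence scores to support manual verification of the prediction.
result = model.predict_image_file(
    'example_molecule.png',   # hypothetical path to a molecule depiction
    return_atoms_bonds=True,
    return_confidence=True,
)

print(result['smiles'])       # predicted SMILES string
print(result['confidence'])   # overall confidence estimate
print(result['atoms'])        # per-atom symbols, image coordinates, confidences
print(result['bonds'])        # per-bond types, endpoint atoms, confidences
```
The atom coordinates and per-atom/per-bond confidences correspond to the atom-level alignment and confidence estimation described in the abstract.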
Related papers
- GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned.
We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z)
- MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition [4.510482519069965]
MolNexTR is a novel image-to-graph deep learning model that fuses the strengths of ConvNeXt and Vision Transformer.
It can predict atoms and bonds simultaneously and understand their layout rules.
In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%.
arXiv Detail & Related papers (2024-03-06T13:17:41Z)
- MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures.
It amalgamates the strengths of both molecular representation forms.
It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text.
MolXPT outperforms strong baselines of molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Image-to-Graph Transformers for Chemical Structure Recognition [4.180435324231826]
We present a deep learning model to extract molecular structures from images.
The proposed model is designed to transform the molecular image directly into the corresponding graph.
With its end-to-end learning approach, it can fully utilize the abundant open image-molecule pair data from various sources.
arXiv Detail & Related papers (2022-02-19T11:33:54Z)