MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
- URL: http://arxiv.org/abs/2411.11098v1
- Date: Sun, 17 Nov 2024 15:00:09 GMT
- Title: MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
- Authors: Xi Fang, Jiankun Wang, Xiaochen Cai, Shangqian Chen, Shuwen Yang, Lin Yao, Linfeng Zhang, Guolin Ke,
- Abstract summary: We present Mol, a novel end-to-end optical chemical structure recognition method.
We use a SMILES encoding rule to annotate Mol-7M, the largest annotated molecular image dataset.
We trained an end-to-end molecular image captioning model, Mol, using a curriculum learning approach.
- Score: 23.78185449646608
- License:
- Abstract: In recent decades, chemistry publications and patents have increased rapidly. A significant portion of key information is embedded in molecular structure figures, complicating large-scale literature searches and limiting the application of large language models in fields such as biology, chemistry, and pharmaceuticals. The automatic extraction of precise chemical structures is of critical importance. However, the presence of numerous Markush structures in real-world documents, along with variations in molecular image quality, drawing styles, and noise, significantly limits the performance of existing optical chemical structure recognition (OCSR) methods. We present MolParser, a novel end-to-end OCSR method that efficiently and accurately recognizes chemical structures from real-world documents, including difficult Markush structure. We use a extended SMILES encoding rule to annotate our training dataset. Under this rule, we build MolParser-7M, the largest annotated molecular image dataset to our knowledge. While utilizing a large amount of synthetic data, we employed active learning methods to incorporate substantial in-the-wild data, specifically samples cropped from real patents and scientific literature, into the training process. We trained an end-to-end molecular image captioning model, MolParser, using a curriculum learning approach. MolParser significantly outperforms classical and learning-based methods across most scenarios, with potential for broader downstream applications. The dataset is publicly available.
Related papers
- MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework.
It incorporates generator-discriminator training, allowing the model to learn from more challenging examples.
Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z) - Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method.
HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution.
Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z) - MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition [4.510482519069965]
MolNexTR is a novel image-to-graph deep learning model that collaborates to fuse the strengths of ConvNext and Vision-TRansformer.
It can predict atoms and bonds simultaneously and understand their layout rules.
In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%.
arXiv Detail & Related papers (2024-03-06T13:17:41Z) - MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z) - Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks.
In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation.
We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z) - Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z) - Multi-modal Molecule Structure-text Model for Text-based Retrieval and
Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with
Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - MolScribe: Robust Molecular Structure Recognition with Image-To-Graph
Generation [28.93523736883784]
MolScribe is an image-to-graph model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure.
MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks.
arXiv Detail & Related papers (2022-05-28T03:03:45Z) - Image-to-Graph Transformers for Chemical Structure Recognition [4.180435324231826]
We present a deep learning model to extract molecular structures from images.
The proposed model is designed to transform the molecular image directly into the corresponding graph.
By end-to-end learning approach, it can fully utilize many open image-molecule pair data from various sources.
arXiv Detail & Related papers (2022-02-19T11:33:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.