Related papers: MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

URL: http://arxiv.org/abs/2403.03691v2
Date: Fri, 8 Mar 2024 06:32:12 GMT
Title: MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition
Authors: Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao
Abstract summary: MolNexTR is a novel image-to-graph model that fuses the strengths of ConvNext and Vision-TRansformer. It can predict atoms and bonds simultaneously and understand their layout rules. MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%.
Score: 4.7793786389946815
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the field of chemical structure recognition, the task of converting molecular images into graph structures and SMILES strings stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we propose MolNexTR, a novel image-to-graph deep learning model that fuses the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more nuanced extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including an improved data augmentation module, an image contamination module, and a post-processing module to get the final SMILES output. These modules synergistically enhance the model's robustness against the diverse styles of molecular imagery found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%, marking a significant advancement in the domain of molecular structure recognition. Scientific contribution: MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.

Related papers

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition [60.76623665324548]
GTR-Mol-VLM is a novel framework featuring two key innovations. It emulates human reasoning by incrementally parsing molecular graphs through sequential atom-bond predictions. MolRec-Bench is the first benchmark designed for a fine-grained evaluation of graph-parsing accuracy in OCSR.
arXiv Detail & Related papers (2025-06-09T08:47:10Z)
Multi-Modal Molecular Representation Learning via Structure Awareness [19.813872931221546]
We propose a structure-awareness-based multi-modal self-supervised molecular representation pre-training framework (MMSA). MMSA enhances molecular graph representations by leveraging invariant knowledge between molecules. It achieves state-of-the-art performance on the MoleculeNet benchmark, with average ROC-AUC improvements ranging from 1.8% to 9.6% over baseline methods.
arXiv Detail & Related papers (2025-05-09T08:37:29Z)
Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features [42.203344899915464]
This study aims to develop a language model that is specifically trained on fingerprints. We introduce a bimodal architecture that integrates this language model with a graph model. This integration results in a significant improvement in predictive performance compared to conventional strategies.
arXiv Detail & Related papers (2025-02-25T08:53:18Z)
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild [23.78185449646608]
We present MolParser, a novel end-to-end optical chemical structure recognition method. We use a SMILES encoding rule to annotate MolParser-7M, the largest annotated molecular image dataset. We trained an end-to-end molecular image captioning model, MolParser, using a curriculum learning approach.
arXiv Detail & Related papers (2024-11-17T15:00:09Z)
GraphXForm: Graph transformer for computer-aided molecular design with application to extraction [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds and then fine-tuned. We evaluate it on two solvent design tasks for liquid-liquid extraction, showing that it outperforms four state-of-the-art molecular design techniques.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM). FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. It amalgamates the strengths of both molecular representation forms. It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
MolScribe: Robust Molecular Structure Recognition with Image-To-Graph Generation [28.93523736883784]
MolScribe is an image-to-graph model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure. MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks.
arXiv Detail & Related papers (2022-05-28T03:03:45Z)
Image-to-Graph Transformers for Chemical Structure Recognition [4.180435324231826]
We present a deep learning model to extract molecular structures from images. The proposed model is designed to transform the molecular image directly into the corresponding graph. Through an end-to-end learning approach, it can fully utilize many open image-molecule pair data from various sources.
arXiv Detail & Related papers (2022-02-19T11:33:54Z)
Improved Conditional Flow Models for Molecule to Image Synthesis [37.886816307827196]
Mol2Image is a flow-based generative model for molecule to cell image synthesis. To generate cell features at different resolutions and scale to high-resolution images, we develop a novel multi-scale flow architecture. To maximize the mutual information between the generated images and the molecular interventions, we devise a training strategy based on contrastive learning.
arXiv Detail & Related papers (2020-06-15T16:39:50Z)
Multi-View Graph Neural Networks for Molecular Property Prediction [67.54644592806876]
We present Multi-View Graph Neural Network (MV-GNN), a multi-view message passing architecture. In MV-GNN, we introduce a shared self-attentive readout component and disagreement loss to stabilize the training process. We further boost the expressive power of MV-GNN by proposing a cross-dependent message passing scheme.
arXiv Detail & Related papers (2020-05-17T04:46:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.