SubGrapher: Visual Fingerprinting of Chemical Structures
- URL: http://arxiv.org/abs/2504.19695v1
- Date: Mon, 28 Apr 2025 11:45:46 GMT
- Title: SubGrapher: Visual Fingerprinting of Chemical Structures
- Authors: Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Luc Van Gool, Peter W. J. Staar
- Abstract summary: SubGrapher is a method for the visual fingerprinting of chemical structure images. Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting molecular fingerprints directly from chemical structure images. Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecular depictions.
- Score: 46.677062201188015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic extraction of chemical structures from scientific literature plays a crucial role in accelerating research across fields ranging from drug discovery to materials science. Patent documents, in particular, contain molecular information in visual form, which is often inaccessible through traditional text-based searches. In this work, we introduce SubGrapher, a method for the visual fingerprinting of chemical structure images. Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting molecular fingerprints directly from chemical structure images. Using learning-based instance segmentation, SubGrapher identifies functional groups and carbon backbones, constructing a substructure-based fingerprint that enables chemical structure retrieval. Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecular depictions. The dataset, models, and code will be made publicly available.
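The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes the segmentation step has already produced a list of substructure labels per image; the label names, the count-based fingerprint, and the Tanimoto ranking below are illustrative assumptions, not the authors' implementation, which may encode more than simple counts.

```python
from collections import Counter


def count_fingerprint(detected_substructures):
    """Build a count-based fingerprint from a list of substructure labels."""
    return Counter(detected_substructures)


def tanimoto(fp_a, fp_b):
    """Count-based (min/max) Tanimoto similarity between two fingerprints."""
    keys = set(fp_a) | set(fp_b)
    intersection = sum(min(fp_a[k], fp_b[k]) for k in keys)
    union = sum(max(fp_a[k], fp_b[k]) for k in keys)
    return intersection / union if union else 0.0


def retrieve(query_fp, library, top_k=3):
    """Rank library entries by similarity to the query fingerprint."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical detections; a real system would obtain these labels
# from an instance segmentation model run on each structure image.
library = {
    "aspirin": count_fingerprint(["benzene", "carboxylic_acid", "ester"]),
    "salicylic_acid": count_fingerprint(["benzene", "carboxylic_acid", "hydroxyl"]),
    "ethanol": count_fingerprint(["hydroxyl", "ethyl_chain"]),
}
query = count_fingerprint(["benzene", "carboxylic_acid", "ester"])
results = retrieve(query, library)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

Because the comparison works on detected substructures rather than a full molecular graph, a query can still rank near its true match when some detections are missing or wrong; that tolerance is the motivation for fingerprinting directly from the image.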
Related papers
- MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures [47.41884299076947]
MarkushGrapher is a multi-modal approach for recognizing Markush structures in documents. We propose a synthetic data generation pipeline that produces a wide range of realistic Markush structures. M2S is the first annotated benchmark of real-world Markush structures.
arXiv Detail & Related papers (2025-03-20T12:40:38Z)
- Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular Graph Learning (KCHML), a paradigm shift in how molecular graphs are encoded. KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
- MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild [23.558032054114577]
We present MolParser, a novel end-to-end optical chemical structure recognition method. We use a SMILES encoding rule to annotate MolParser-7M, the largest annotated molecular image dataset. We trained an end-to-end molecular image captioning model, MolParser, using a curriculum learning approach.
arXiv Detail & Related papers (2024-11-17T15:00:09Z)
- GraphXForm: Graph transformer for computer-aided molecular design [73.1842164721868]
We present GraphXForm, a decoder-only graph transformer architecture, which is pretrained on existing compounds. We evaluate it on various drug design tasks, demonstrating superior objective scores compared to state-of-the-art molecular design approaches.
arXiv Detail & Related papers (2024-11-03T19:45:15Z)
- SMiCRM: A Benchmark Dataset of Mechanistic Molecular Images [0.8192907805418583]
We present a dataset designed to benchmark machine recognition capabilities of chemical molecules with arrow-pushing annotations.
This dataset includes a machine-readable molecular identity for each image as well as mechanistic arrows showing electron flow during chemical reactions.
arXiv Detail & Related papers (2024-07-25T18:52:10Z)
- Expanding Chemical Representation with k-mers and Fragment-based Fingerprints for Molecular Fingerprinting [4.588028371034407]
This study introduces a novel approach that combines substructure counting, $k$-mers, and Daylight-like fingerprints to expand the representation of chemical structures in SMILES strings.
The integrated method generates comprehensive molecular embeddings that enhance discriminative power and information content.
arXiv Detail & Related papers (2024-03-28T21:36:07Z)
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
- Graph-based Molecular Representation Learning [59.06193431883431]
Molecular representation learning (MRL) is a key step to build the connection between machine learning and chemical science.
Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning.
arXiv Detail & Related papers (2022-07-08T17:43:20Z)
- Image-to-Graph Transformers for Chemical Structure Recognition [4.180435324231826]
We present a deep learning model to extract molecular structures from images.
The proposed model is designed to transform the molecular image directly into the corresponding graph.
Because it is trained end to end, the model can make full use of the many open image-molecule pair datasets available from various sources.
arXiv Detail & Related papers (2022-02-19T11:33:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.