MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
- URL: http://arxiv.org/abs/2503.16096v1
- Date: Thu, 20 Mar 2025 12:40:38 GMT
- Title: MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
- Authors: Lucas Morin, Valéry Weber, Ahmed Nassar, Gerhard Ingmar Meijer, Luc Van Gool, Yawei Li, Peter Staar
- Abstract summary: MarkushGrapher is a multi-modal approach for recognizing Markush structures in documents. We propose a synthetic data generation pipeline that produces a wide range of realistic Markush structures. M2S is the first annotated benchmark of real-world Markush structures.
- Score: 47.41884299076947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automated analysis of chemical literature holds promise to accelerate discovery in fields such as material science and drug development. In particular, search capabilities for chemical structures and Markush structures (chemical structure templates) within patent documents are valuable, e.g., for prior-art search. Advancements have been made in the automatic extraction of chemical structures from text and images, yet Markush structures remain largely unexplored due to their complex multi-modal nature. In this work, we present MarkushGrapher, a multi-modal approach for recognizing Markush structures in documents. Our method jointly encodes text, image, and layout information through a Vision-Text-Layout encoder and an Optical Chemical Structure Recognition vision encoder. These representations are merged and used to auto-regressively generate a sequential graph representation of the Markush structure along with a table defining its variable groups. To overcome the lack of real-world training data, we propose a synthetic data generation pipeline that produces a wide range of realistic Markush structures. Additionally, we present M2S, the first annotated benchmark of real-world Markush structures, to advance research on this challenging task. Extensive experiments demonstrate that our approach outperforms state-of-the-art chemistry-specific and general-purpose vision-language models in most evaluation settings. Code, models, and datasets will be available.
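For intuition, here is a minimal sketch of the fusion-then-decode pattern the abstract describes: two encoders whose outputs are merged into one memory that an autoregressive decoder attends to. All module choices, sizes, and the concatenation-based fusion are illustrative assumptions, not the released MarkushGrapher implementation.

```python
# Illustrative sketch of the fusion-then-decode pattern described in the
# abstract. All module names, sizes, and the concatenation fusion are
# assumptions; the released MarkushGrapher may differ substantially.
import torch
import torch.nn as nn

class MarkushGrapherSketch(nn.Module):
    def __init__(self, d_model=768, vocab_size=32000):
        super().__init__()
        # Stand-in for the Vision-Text-Layout encoder over OCR tokens,
        # image patches, and bounding-box layout features.
        self.vtl_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # Stand-in for the OCSR vision encoder over the structure image.
        self.ocsr_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # Autoregressive decoder that emits the sequential graph
        # representation plus the variable-group table as one token stream.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, vtl_feats, ocsr_feats, target_embeds):
        # Merge the two encoder streams along the sequence axis, one
        # plausible fusion choice among several.
        memory = torch.cat([self.vtl_encoder(vtl_feats),
                            self.ocsr_encoder(ocsr_feats)], dim=1)
        hidden = self.decoder(target_embeds, memory)
        return self.lm_head(hidden)  # next-token logits

model = MarkushGrapherSketch()
logits = model(torch.randn(1, 196, 768),   # text + layout features
               torch.randn(1, 196, 768),   # structure-image features
               torch.randn(1, 64, 768))    # shifted target embeddings
print(logits.shape)                        # torch.Size([1, 64, 32000])
```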
Related papers
- SubGrapher: Visual Fingerprinting of Chemical Structures [46.677062201188015]
SubGrapher is a method for the visual fingerprinting of chemical structure images.
Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting molecular fingerprints directly from chemical structure images.
Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecular depictions.
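As a rough illustration of fingerprint-based retrieval, the sketch below predicts a binary substructure fingerprint from an image and ranks a database by Tanimoto similarity; the backbone, bit length, and threshold are assumptions, not SubGrapher's actual design.

```python
# Minimal sketch of fingerprint-based retrieval: predict a binary
# substructure fingerprint from an image, then rank a database by
# Tanimoto similarity. Architecture and bit length are assumed.
import torch
import torch.nn as nn

class FingerprintNet(nn.Module):
    def __init__(self, n_bits=1024):
        super().__init__()
        self.backbone = nn.Sequential(  # toy CNN stand-in for a real backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, n_bits)

    def forward(self, img):
        # Sigmoid gives per-bit probabilities; threshold to get the fingerprint.
        return torch.sigmoid(self.head(self.backbone(img)))

def tanimoto(a, b):
    # Tanimoto similarity between binary fingerprints.
    inter = (a & b).sum(-1).float()
    union = (a | b).sum(-1).float()
    return inter / union.clamp(min=1)

net = FingerprintNet()
query = (net(torch.randn(1, 3, 224, 224)) > 0.5).long()
database = torch.randint(0, 2, (100, 1024))
scores = tanimoto(query, database)          # rank candidates by similarity
print(scores.topk(5).indices)
```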
arXiv Detail & Related papers (2025-04-28T11:45:46Z)
- Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases [78.62158923194153]
Text-rich Graph Knowledge Bases (TG-KBs) have become increasingly crucial for answering queries by providing textual and structural knowledge.
We propose a Mixture of Structural-and-Textual Retrieval (MoR) to retrieve these two types of knowledge via a Planning-Reasoning-Organizing framework.
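A toy sketch of the underlying idea, mixing a textual relevance score with a structural (adjacency-based) score over a small text-rich graph; the graph, scorers, and mixing weight are invented stand-ins, and MoR's Planning-Reasoning-Organizing framework is considerably richer.

```python
# Toy illustration of mixing structural and textual retrieval over a
# text-rich graph KB. Everything below (graph, scorers, weight) is an
# invented stand-in for demonstration only.
from collections import defaultdict

# Tiny text-rich graph: node -> (text, neighbors)
graph = {
    "paper1": ("optical chemical structure recognition", ["dataset1"]),
    "paper2": ("graph neural networks for molecules", ["dataset1"]),
    "dataset1": ("annotated molecule images", ["paper1", "paper2"]),
}

def textual_score(query, node):
    # Bag-of-words overlap as a stand-in for dense text retrieval.
    q, t = set(query.split()), set(graph[node][0].split())
    return len(q & t) / max(len(q), 1)

def structural_score(seed, node):
    # One-hop adjacency as a stand-in for traversal-based retrieval.
    return 1.0 if node in graph[seed][1] or node == seed else 0.0

def mixed_retrieval(query, seed, alpha=0.5):
    scores = defaultdict(float)
    for node in graph:
        scores[node] = (alpha * textual_score(query, node)
                        + (1 - alpha) * structural_score(seed, node))
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(mixed_retrieval("molecule images", seed="paper1"))
```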
arXiv Detail & Related papers (2025-02-27T17:42:52Z)
- Multimodal Search in Chemical Documents and Reactions [26.94136747669151]
We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature. Queries may combine molecular diagrams, textual descriptions, and reaction data, allowing users to connect different representations of chemical information. We describe the system's architecture, key functionalities, and retrieval process, along with expert assessments of the system.
arXiv Detail & Related papers (2025-02-24T06:00:17Z)
- MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild [23.558032054114577]
We present MolParser, a novel end-to-end optical chemical structure recognition method. We use a SMILES encoding rule to annotate MolParser-7M, the largest annotated molecular image dataset. We trained an end-to-end molecular image captioning model, MolParser, using a curriculum learning approach.
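A short sketch of one way to realize such a curriculum: order image-SMILES pairs by an easiness proxy (here, SMILES length, an assumption) and widen the training pool stage by stage.

```python
# Sketch of a curriculum schedule for image-to-SMILES captioning: start with
# short (easy) SMILES and progressively admit longer ones. The difficulty
# proxy, pacing, and training step are assumptions for illustration only.
def curriculum_batches(samples, n_stages=3):
    # samples: list of (image_path, smiles); shorter SMILES treated as easier.
    ordered = sorted(samples, key=lambda s: len(s[1]))
    for stage in range(1, n_stages + 1):
        cutoff = len(ordered) * stage // n_stages
        # Each stage trains on the easiest `cutoff` samples.
        yield stage, ordered[:cutoff]

data = [("img_a.png", "CCO"), ("img_b.png", "c1ccccc1O"),
        ("img_c.png", "CC(=O)Nc1ccc(O)cc1")]
for stage, pool in curriculum_batches(data):
    print(f"stage {stage}: training on {len(pool)} samples")
    # train_one_epoch(model, pool)  # hypothetical training step
```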
arXiv Detail & Related papers (2024-11-17T15:00:09Z)
- Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval [24.061535843472427]
We introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA), a novel approach that facilitates multi-grained alignments between textual descriptions and molecules.
Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
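The basic primitive behind OT-based alignment is an entropic-regularized transport plan between two embedding sets, sketched below with Sinkhorn iterations; ORMA's actual scheme aligns at several granularities and goes beyond this single-level example.

```python
# Sinkhorn-style optimal transport between text-token embeddings and
# substructure embeddings, the basic primitive behind OT-based alignment.
# Dimensions and embeddings are random stand-ins.
import numpy as np

def sinkhorn(cost, n_iters=50, eps=0.1):
    # Entropic-regularized OT with uniform marginals.
    K = np.exp(-cost / eps)
    a = np.full(cost.shape[0], 1 / cost.shape[0])
    b = np.full(cost.shape[1], 1 / cost.shape[1])
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)  # transport plan

rng = np.random.default_rng(0)
text_tokens = rng.normal(size=(5, 16))    # e.g. word embeddings
motifs = rng.normal(size=(7, 16))         # e.g. substructure embeddings
cost = 1 - (text_tokens @ motifs.T) / (
    np.linalg.norm(text_tokens, axis=1, keepdims=True)
    * np.linalg.norm(motifs, axis=1))     # cosine distance
plan = sinkhorn(cost)
print(plan.shape, plan.sum())             # (5, 7), sums to ~1.0
```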
arXiv Detail & Related papers (2024-11-04T06:30:52Z)
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating multi-modal information retrieval (MMIR) performance on image-text pairing leave a notable gap in the scientific domain.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
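For context, image-text retrieval benchmarks of this kind are typically scored with recall@k; a minimal evaluation loop, with random embeddings standing in for real encoder outputs, looks like this:

```python
# Minimal recall@k evaluation for image-text retrieval, the kind of protocol
# a benchmark like SciMMIR supports; embeddings here are random stand-ins.
import numpy as np

def recall_at_k(img_emb, txt_emb, k=5):
    # Rank all captions for each image; pair i is the ground truth for image i.
    sims = img_emb @ txt_emb.T
    ranks = np.argsort(-sims, axis=1)
    hits = (ranks[:, :k] == np.arange(len(img_emb))[:, None]).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(100, 64))
txt_emb = rng.normal(size=(100, 64))
print(f"recall@5 = {recall_at_k(img_emb, txt_emb):.3f}")
```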
arXiv Detail & Related papers (2024-01-24T14:23:12Z)
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes of a single graph.
We classify atom and bond nodes in this graph with a Graph Neural Network.
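A compact sketch of that recipe: visual features for candidate atoms and bonds become node features of one graph, and a small message-passing network emits per-node class logits. Feature sizes, adjacency, and the single GNN layer are simplifications.

```python
# Sketch of the idea summarized above: candidate atoms and bonds become
# nodes of one graph, and a GNN classifies each node. Feature sizes,
# adjacency, and the single message-passing layer are assumptions.
import torch
import torch.nn as nn

class NodeClassifierGNN(nn.Module):
    def __init__(self, d_in=64, d_hidden=128, n_classes=20):
        super().__init__()
        self.msg = nn.Linear(d_in, d_hidden)
        self.out = nn.Linear(d_hidden, n_classes)  # atom types + bond types

    def forward(self, x, adj):
        # One round of mean-aggregation message passing, then per-node logits.
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.msg(adj @ x / deg))
        return self.out(h)

# 5 candidate nodes (atoms and bonds mixed), dense adjacency with self-loops.
x = torch.randn(5, 64)                 # visual features per candidate
adj = torch.eye(5)
adj[0, 1] = adj[1, 0] = 1.0            # e.g. atom node 0 touches bond node 1
logits = NodeClassifierGNN()(x, adj)
print(logits.argmax(-1))               # predicted class per node
```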
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM achieves state-of-the-art generalization to novel biochemical concepts.
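One common way to realize such joint learning is a symmetric CLIP-style contrastive loss between structure and text embeddings, sketched below; MoleculeSTM's exact objective and encoders may differ.

```python
# Sketch of joint structure-text training with a symmetric contrastive
# (CLIP-style InfoNCE) loss, one common way to align the two modalities.
import torch
import torch.nn.functional as F

def contrastive_loss(struct_emb, text_emb, temperature=0.07):
    # Normalize, then score every structure against every description.
    s = F.normalize(struct_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature
    labels = torch.arange(len(s))      # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.T, labels)) / 2

struct_emb = torch.randn(8, 256)       # from a molecule encoder (assumed)
text_emb = torch.randn(8, 256)         # from a text encoder (assumed)
print(contrastive_loss(struct_emb, text_emb))
```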
arXiv Detail & Related papers (2022-12-21T06:18:31Z)