Multimodal Search in Chemical Documents and Reactions
- URL: http://arxiv.org/abs/2502.16865v1
- Date: Mon, 24 Feb 2025 06:00:17 GMT
- Title: Multimodal Search in Chemical Documents and Reactions
- Authors: Ayush Kumar Shah, Abhisek Dey, Leo Luo, Bryan Amador, Patrick Philippy, Ming Zhong, Siru Ouyang, David Mark Friday, David Bianchi, Nick Jackson, Richard Zanibbi, Jiawei Han
- Abstract summary: We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature. Queries may combine molecular diagrams, textual descriptions, and reaction data, allowing users to connect different representations of chemical information. We describe the system's architecture, key functionalities, and retrieval process, along with expert assessments of the system.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature. Queries may combine molecular diagrams, textual descriptions, and reaction data, allowing users to connect different representations of chemical information. To support this, the indexing process includes chemical diagram extraction and parsing, extraction of reaction data from text in tabular form, and cross-modal linking of diagrams and their mentions in text. We describe the system's architecture, key functionalities, and retrieval process, along with expert assessments of the system. This demo highlights the workflow and technical components of the search system.
Related papers
- SubGrapher: Visual Fingerprinting of Chemical Structures [46.677062201188015]
SubGrapher is a method for the visual fingerprinting of chemical structure images.
Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting molecular fingerprints directly from chemical structure images.
Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecular depictions.
arXiv Detail & Related papers (2025-04-28T11:45:46Z)
- MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures [47.41884299076947]
MarkushGrapher is a multi-modal approach for recognizing Markush structures in documents.
We propose a synthetic data generation pipeline that produces a wide range of realistic Markush structures.
M2S is the first annotated benchmark of real-world Markush structures.
arXiv Detail & Related papers (2025-03-20T12:40:38Z)
- Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
RAlign is a novel chemical reaction representation learning model for various organic reaction-related tasks.
By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction.
We introduce a reaction-center-aware attention mechanism that enables the model to concentrate on key functional groups.
arXiv Detail & Related papers (2024-11-26T17:41:44Z)
- ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining [76.51346919370005]
We propose ReactXT for reaction-text modeling and OpenExp for experimental procedure prediction.
ReactXT features three types of input contexts to incrementally pretrain LMs.
Our code is available at https://github.com/syr-cn/ReactXT.
arXiv Detail & Related papers (2024-05-23T06:55:59Z)
- OpenChemIE: An Information Extraction Toolkit For Chemistry Literature [37.23189665773341]
OpenChemIE is a tool for extracting reaction data from chemistry literature.
We employ specialized neural models that address a specific task for chemistry information extraction.
We meticulously annotate a challenging dataset of reaction schemes with R-groups to evaluate our pipeline as a whole.
arXiv Detail & Related papers (2024-04-01T20:16:21Z)
- An Autonomous Large Language Model Agent for Chemical Literature Data Mining [60.85177362167166]
We introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature.
Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data.
arXiv Detail & Related papers (2024-02-20T13:21:46Z)
- Predictive Chemistry Augmented with Text Retrieval [37.59545092901872]
We introduce TextReact, a novel method that directly augments predictive chemistry with texts retrieved from the literature.
TextReact retrieves text descriptions relevant for a given chemical reaction, and then aligns them with the molecular representation of the reaction.
We empirically validate the framework on two chemistry tasks: reaction condition recommendation and one-step retrosynthesis.
arXiv Detail & Related papers (2023-12-08T07:40:59Z)
- MolGrapher: Graph-based Visual Recognition of Chemical Structures [50.13749978547401]
We introduce MolGrapher to recognize chemical structures visually.
We treat all candidate atoms and bonds as nodes and put them in a graph.
We classify atom and bond nodes in the graph with a Graph Neural Network.
arXiv Detail & Related papers (2023-08-23T16:16:11Z)
- Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing [107.49804059269212]
We present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions.
In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts.
arXiv Detail & Related papers (2022-12-21T06:18:31Z)
- Structured information extraction from complex scientific text with fine-tuned large language models [55.96705756327738]
We present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction.
The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts.
This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text.
arXiv Detail & Related papers (2022-12-10T07:51:52Z)
- Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation [36.6963949360594]
Extracting detailed knowledge about chemical reactions from the core chemistry literature is an emerging challenge.
We propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing.
Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.
arXiv Detail & Related papers (2021-08-29T19:41:35Z)
- Named entity recognition in chemical patents using ensemble of contextual language models [0.3731111830152912]
We study the effectiveness of contextualized language models to extract information from chemical patents.
Our best model, based on a majority ensemble approach, achieves an exact F1-score of 92.30% and a relaxed F1-score of 96.24%.
arXiv Detail & Related papers (2020-07-24T15:23:45Z)