MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
- URL: http://arxiv.org/abs/2511.17300v1
- Date: Fri, 21 Nov 2025 15:11:47 GMT
- Title: MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning
- Authors: Wenrui Zhang, Xinggang Wang, Bin Feng, Wenyu Liu
- Abstract summary: MolSight is a comprehensive learning framework that employs a three-stage training paradigm. We show that MolSight achieves state-of-the-art performance in (stereo)chemical optical structure recognition.
- Score: 47.029225594084345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical Chemical Structure Recognition (OCSR) plays a pivotal role in modern chemical informatics, enabling the automated conversion of chemical structure images from scientific literature, patents, and educational materials into machine-readable molecular representations. This capability is essential for large-scale chemical data mining, drug discovery pipelines, and Large Language Model (LLM) applications in related domains. However, existing OCSR systems face significant challenges in accurately recognizing stereochemical information due to the subtle visual cues that distinguish stereoisomers, such as wedge and dash bonds, ring conformations, and spatial arrangements. To address these challenges, we propose MolSight, a comprehensive learning framework for OCSR that employs a three-stage training paradigm. In the first stage, we conduct pre-training on large-scale but noisy datasets to endow the model with fundamental perception capabilities for chemical structure images. In the second stage, we perform multi-granularity fine-tuning using datasets with richer supervisory signals, systematically exploring how auxiliary tasks (specifically chemical bond classification and atom localization) contribute to molecular formula recognition. Finally, we employ reinforcement learning for post-training optimization and introduce a novel stereochemical structure dataset. Remarkably, we find that even with MolSight's relatively compact parameter size, the Group Relative Policy Optimization (GRPO) algorithm can further enhance the model's performance on stereomolecular recognition. Through extensive experiments across diverse datasets, our results demonstrate that MolSight achieves state-of-the-art performance in (stereo)chemical optical structure recognition.
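To make the GRPO post-training step concrete, here is a minimal sketch of how group-relative advantages can be computed for sampled SMILES predictions. The exact-match reward, group size, and normalization details are illustrative assumptions, not MolSight's actual implementation:

```python
import statistics

def grpo_advantages(rewards):
    """GRPO computes advantages by normalizing each sampled completion's
    reward against the mean and standard deviation of its sampling group,
    requiring no learned value network.
    `rewards`: scalar rewards for G completions of one prompt (image)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

def exact_match_reward(pred_smiles, gold_smiles):
    # Hypothetical reward: 1.0 for an exact SMILES match, 0.0 otherwise.
    # A match-based reward is stereochemistry-sensitive, since wedge/dash
    # information surfaces as @ / @@ chirality tags in the SMILES string.
    return 1.0 if pred_smiles == gold_smiles else 0.0

# Example: four sampled predictions for one molecule image (L-alanine)
gold = "C[C@H](N)C(=O)O"
preds = ["C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O", "CC(N)C(=O)O", "C[C@H](N)C(=O)O"]
rewards = [exact_match_reward(p, gold) for p in preds]
advantages = grpo_advantages(rewards)  # correct samples get positive advantage
```

Predictions that get the stereocenter right receive positive advantage and are reinforced; the two incorrect or stereochemistry-dropping samples receive negative advantage.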
Related papers
- Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis [51.83339196548892]
ChemCraft is a novel framework that decouples chemical reasoning from knowledge storage. ChemCraft achieves superior performance with minimal inference costs. This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry.
arXiv Detail & Related papers (2026-01-25T04:23:34Z) - Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders [42.033443425253644]
We extend sparse autoencoder techniques to uncover and examine interpretable features within chemistry language models. Our findings reveal that these models encode a rich landscape of chemical concepts. Our approach provides a generalisable framework for uncovering latent knowledge in chemistry-focused AI systems.
arXiv Detail & Related papers (2025-12-08T22:20:01Z) - Foundation Models for Discovery and Exploration in Chemical Space [57.97784111110166]
MIST is a family of molecular foundation models trained on large unlabeled datasets. We demonstrate the ability of these models to solve real-world problems across chemical space.
arXiv Detail & Related papers (2025-10-20T17:56:01Z) - ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions [52.79349601462865]
ChemOrch is a framework that synthesizes chemically grounded instruction-response pairs. ChemOrch enables controllable diversity and levels of difficulty for the generated tasks.
arXiv Detail & Related papers (2025-09-20T05:43:58Z) - $\text{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models [59.125833618091846]
We propose a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. Experiments demonstrate that $\text{M}^{2}$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks.
arXiv Detail & Related papers (2025-08-12T05:46:47Z) - A Multi-Agent System Enables Versatile Information Extraction from the Chemical Literature [8.306442315850878]
We develop a multimodal large language model (MLLM)-based multi-agent system for robust and automated chemical information extraction. Our system achieved an F1 score of 80.8% on a benchmark dataset of sophisticated multimodal chemical reaction graphics from the literature.
arXiv Detail & Related papers (2025-07-27T11:16:57Z) - Causal integration of chemical structures improves representations of microscopy images for morphological profiling [25.027684911103897]
We introduce a representation learning framework, MICON, that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes. We demonstrate that incorporating chemical compound information into the learning process provides consistent improvements in our evaluation setting. Our findings point to a new direction for representation learning in morphological profiling, suggesting that methods should explicitly account for the multimodal nature of microscopy screening data.
arXiv Detail & Related papers (2025-04-13T12:27:21Z) - DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
We present DiffMS, a formula-restricted encoder-decoder generative network that achieves state-of-the-art performance on this task. To develop a robust decoder that bridges latent embeddings and molecular structures, we pretrain the diffusion decoder with fingerprint-structure pairs. Experiments on established benchmarks show that DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z) - MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild [17.846545370594452]
We present MolParser, a novel end-to-end optical chemical structure recognition method. We use a SMILES encoding rule to annotate MolParser-7M, the largest annotated molecular image dataset. We trained an end-to-end molecular image captioning model, MolParser, using a curriculum learning approach.
arXiv Detail & Related papers (2024-11-17T15:00:09Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
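The random atom-masking idea described above can be sketched as follows. The tokenization regex, mask token, and masking ratio are illustrative assumptions, not the paper's actual tokenizer:

```python
import random
import re

# Assumed SMILES tokenizer: bracket atoms like [C@H] are kept whole,
# as are the two-letter elements Br and Cl; everything else is one char.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
# Assumed atom-token pattern covering common organic-subset elements.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def mask_atoms(smiles, ratio=0.15, mask="[MASK]", rng=random):
    """Replace a random fraction of atom tokens with a mask token,
    forcing the model to infer the masked atoms from molecular context
    (bonds, rings, neighbors) rather than from surface statistics."""
    tokens = SMILES_TOKEN.findall(smiles)
    atom_idx = [i for i, t in enumerate(tokens) if ATOM_TOKEN.fullmatch(t)]
    n_mask = max(1, int(len(atom_idx) * ratio))  # mask at least one atom
    for i in rng.sample(atom_idx, n_mask):
        tokens[i] = mask
    return "".join(tokens)

masked = mask_atoms("C[C@H](N)C(=O)O", rng=random.Random(0))
```

Only atom tokens are candidates for masking; bond symbols, ring-closure digits, and parentheses are left intact so the molecular skeleton remains visible to the model.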
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design [0.0]
We show that chemistry foundation models can serve as a basis for enabling structure-focused, semantic chemistry information retrieval. We also show the use of chemistry foundation models in conjunction with multi-modal models such as OpenCLIP.
arXiv Detail & Related papers (2024-08-21T17:25:45Z) - Atom-Level Optical Chemical Structure Recognition with Limited Supervision [14.487346160322653]
We propose a new chemical structure recognition tool that delivers state-of-the-art performance.
Unlike previous approaches, our method provides atom-level localization.
Our model is the first model to perform OCSR with atom-level entity detection with only SMILES supervision.
arXiv Detail & Related papers (2024-04-02T09:01:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.