Fugu-MT Paper Translation (Summary): Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing

Paper summary: Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing

arxiv url: http://arxiv.org/abs/2603.15011v2
Date: Tue, 17 Mar 2026 06:44:32 GMT
Status: Translation complete
Last updated in system: 2026-03-18 13:19:43.955931
Title: Molecular Identifier Visual Prompt and Verifiable Reinforcement Learning for Chemical Reaction Diagram Parsing
Authors: Jiahe Song, Chuang Wang, Yinfan Wang, Hao Zheng, Rui Nie, Bowen Jiang, Xingjian Wei, Junyuan Gao, Yubin Wang, Bin Wang, Lijun Wu, Jiang Wu, Qian Yu, Conghui He
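Abstract summary: Reaction diagram parsing (RxnDP) is critical for extracting chemical synthesis information from the literature. Recent Vision-Language Models (VLMs) have emerged as a promising paradigm for automating this complex visual reasoning task. This work enhances VLM-based RxnDP from two complementary perspectives: prompting representation and learning paradigms.
Reference score (independently computed attention score): 52.825281124618535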
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reaction diagram parsing (RxnDP) is critical for extracting chemical synthesis information from literature. Although recent Vision-Language Models (VLMs) have emerged as a promising paradigm to automate this complex visual reasoning task, their application is fundamentally bottlenecked by the inability to align visual chemical entities with pre-trained knowledge, alongside the inherent discrepancy between token-level training and reaction-level evaluation. To address these dual challenges, this work enhances VLM-based RxnDP from two complementary perspectives: prompting representation and learning paradigms. First, we propose Identifier as Visual Prompting (IdtVP), which leverages naturally occurring molecule identifiers (e.g., bold numerals like 1a) to activate the chemical knowledge acquired during VLM pre-training. IdtVP enables powerful zero-shot and out-of-distribution capabilities, outperforming existing prompting strategies. Second, to further optimize performance within fine-tuning paradigms, we introduce Re3-DAPO, a reinforcement learning algorithm that leverages verifiable rewards to directly optimize reaction-level metrics, thereby achieving consistent gains over standard supervised fine-tuning. Additionally, we release the ScannedRxn benchmark, comprising scanned historical reaction diagrams with real-world artifacts, to rigorously assess model robustness and out-of-distribution ability. Our contributions advance the accuracy and generalization of VLM-based reaction diagram parsing. We will release data, models, and code on GitHub.
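The abstract contrasts token-level training with reaction-level evaluation and describes verifiable rewards that optimize reaction-level metrics directly. The following is a minimal sketch of what such a reward could look like, not the paper's Re3-DAPO reward: the `Reaction` structure, the exact-match criterion, and the choice of F1 are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Reaction(NamedTuple):
    """Hypothetical parsed reaction; each role is a set of entity labels."""
    reactants: FrozenSet[str]
    conditions: FrozenSet[str]
    products: FrozenSet[str]


def make_reaction(reactants: Iterable[str],
                  conditions: Iterable[str],
                  products: Iterable[str]) -> Reaction:
    # Frozen sets make a reaction hashable and order-insensitive within a role.
    return Reaction(frozenset(reactants), frozenset(conditions), frozenset(products))


def reaction_f1(predicted: List[Reaction], reference: List[Reaction]) -> float:
    """Reaction-level F1: a prediction counts only if every role matches exactly.

    Assumption: exact matching on all three roles. A real benchmark may use
    softer criteria (for example, bounding-box overlap).
    """
    pred_set, ref_set = set(predicted), set(reference)
    if not pred_set and not ref_set:
        return 1.0  # nothing to find and nothing predicted
    matched = len(pred_set & ref_set)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)
    recall = matched / len(ref_set)
    return 2 * precision * recall / (precision + recall)


# One of two predicted reactions is exactly right; the other has a wrong product.
reference = [
    make_reaction(["1a", "2a"], ["Pd/C"], ["3a"]),
    make_reaction(["3a"], ["NaBH4"], ["4a"]),
]
predicted = [
    make_reaction(["2a", "1a"], ["Pd/C"], ["3a"]),
    make_reaction(["3a"], ["NaBH4"], ["4b"]),
]
reward = reaction_f1(predicted, reference)
```

Because the score is computed from the parsed output rather than from per-token likelihoods, a single wrong product identifier invalidates the whole reaction, which is the kind of mismatch between training signal and evaluation metric that the abstract points to.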
