Fugu-MT Paper Translation (Abstract): Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition

Paper overview: Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition

arxiv url: http://arxiv.org/abs/2604.03476v1
Date: Fri, 03 Apr 2026 21:42:54 GMT
Status: Translation complete
System last updated: 2026-04-07 15:49:18.598349
Title: Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
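Title (reference translation): Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
Authors: Haocheng Tang, Xingyu Dang, Junmei Wang
Abstract summary: We adapt DeepSeek-OCR-2 to molecular optical recognition by formulating the task as image-conditioned SMILES generation. We train the model on a large-scale corpus combining synthetic renderings from PubChem and realistic patent images from USPTO-MOL. Our model, MolSeek-OCR, demonstrates competitive capabilities, achieving exact matching accuracies comparable to the best-performing image-to-sequence model.
Reference score (independently computed attention score): 1.957558771641347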
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Optical Chemical Structure Recognition (OCSR) is critical for converting 2D molecular diagrams from printed literature into machine-readable formats. While Vision-Language Models have shown promise in end-to-end OCR tasks, their direct application to OCSR remains challenging, and direct full-parameter supervised fine-tuning often fails. In this work, we adapt DeepSeek-OCR-2 for molecular optical recognition by formulating the task as image-conditioned SMILES generation. To overcome training instabilities, we propose a two-stage progressive supervised fine-tuning strategy: starting with parameter-efficient LoRA and transitioning to selective full-parameter fine-tuning with split learning rates. We train our model on a large-scale corpus combining synthetic renderings from PubChem and realistic patent images from USPTO-MOL to improve coverage and robustness. Our fine-tuned model, MolSeek-OCR, demonstrates competitive capabilities, achieving exact matching accuracies comparable to the best-performing image-to-sequence model. However, it remains inferior to state-of-the-art image-to-graph models. Furthermore, we explore reinforcement-style post-training and data-curation-based refinement, finding that they fail to improve the strict sequence-level fidelity required for exact SMILES matching.
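Abstract (reference translation): Optical Chemical Structure Recognition (OCSR) is important for converting 2D molecular diagrams from printed literature into machine-readable formats. Vision-Language Models are promising for end-to-end OCR tasks, but applying them directly to OCSR remains difficult, and direct full-parameter supervised fine-tuning often fails. In this work, we adapt DeepSeek-OCR-2 to molecular optical recognition by formulating the task as image-conditioned SMILES generation. To overcome training instabilities, we propose a two-stage progressive supervised fine-tuning strategy that starts with parameter-efficient LoRA and transitions to selective full-parameter fine-tuning with split learning rates. We train the model on a large-scale corpus combining synthetic renderings from PubChem and realistic patent images from USPTO-MOL to improve coverage and robustness. Our fine-tuned model, MolSeek-OCR, demonstrates competitive capabilities, achieving exact matching accuracies comparable to the best-performing image-to-sequence model. However, it remains inferior to state-of-the-art image-to-graph models. Furthermore, we explore reinforcement-style post-training and data-curation-based refinement, and find that they fail to improve the strict sequence-level fidelity required for exact SMILES matching.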
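
Note on the exact-matching metric: the abstract reports exact matching accuracy but does not describe the evaluation protocol. The following is a minimal sketch of how such a metric could be computed, assuming plain string comparison of predicted and reference SMILES. The function name `exact_match_accuracy` and the sample strings are illustrative and not taken from the paper. Evaluations of this kind commonly canonicalize both strings with a cheminformatics toolkit first; that step is omitted here to keep the example dependency-free.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions whose SMILES string equals the reference exactly.

    Assumption: comparison is purely textual. No chemical canonicalization is
    applied, so two different spellings of the same molecule count as a miss.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Strip surrounding whitespace only, then compare character by character.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


# Illustrative inputs (hypothetical, not results from the paper).
preds = ["CCO", "c1ccccc1", "CC(=O)O"]
refs = ["CCO", "c1ccccc1", "CC(=O)N"]
score = exact_match_accuracy(preds, refs)  # two of three strings match
```

Because a single wrong token makes the whole prediction count as incorrect, this metric rewards the strict sequence-level fidelity the abstract refers to. Under the textual assumption above, `"OCC"` and `"CCO"` would not match even though both denote ethanol, which is why canonicalization matters in practice.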

Related papers