Fugu-MT Paper Translation (Abstract): ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

Paper overview: ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

arxiv url: http://arxiv.org/abs/2604.15994v2
Date: Thu, 23 Apr 2026 15:57:11 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:05.996447
Title: ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams
Authors: Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li
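Abstract summary: Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical task dimensions.
Reference score (attention score computed independently by this site): 21.265249070149842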
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities degrade sharply, even on tasks as basic as counting endpoints. Existing benchmarks fail to probe this gap, focusing on semantic comprehension rather than structural reasoning. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams. These real-world scientific diagrams offer an ideal testbed because they naturally span diverse structures from linear chains to cyclic graphs, while requiring both precise local recognition and coherent global reasoning. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical task dimensions. Extensive evaluation across 17 MLLMs reveals a significant performance gap exceeding 30% between anchor-based tasks and holistic structural reasoning tasks. Controlled ablations confirm this bottleneck lies in reasoning, not perception. These findings expose a fundamental deficit in structural understanding and establish directions for advancing visual reasoning.
