Fugu-MT Paper Translation (Abstract): FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

Paper overview: FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arxiv url: http://arxiv.org/abs/2604.03893v1
Date: Sat, 04 Apr 2026 23:18:58 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:18.821002
Title: FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
Title (reference translation): FeynmanBench: A Benchmark of Multimodal LLMs on Diagrammatic Physics Reasoning
Authors: Zeyu Wang, Xiaogang Li, Peiyao Xiao, Qinhao Kong, Ben Wang, Chengliang Xu, Zichao Chen, Bing Zhao, Hu Wei
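Abstract summary: FeynmanBench is the first benchmark centered on Feynman diagram tasks. It is designed to evaluate AI's capacity for multistep diagrammatic reasoning. Its database spans the electromagnetic, weak, and strong interactions of the Standard Model.
Reference score (independently computed attention measure): 11.098160996983417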
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information extraction rather than the global structural logic inherent in formal scientific notations. In this work, we introduce FeynmanBench, the first benchmark centered on Feynman diagram tasks. It is designed to evaluate AI's capacity for multistep diagrammatic reasoning, which requires satisfying conservation laws and symmetry constraints, identifying graph topology, converting between diagrammatic and algebraic representations, and constructing scattering amplitudes under specific conventions and gauges. To support large-scale and reproducible evaluation, we developed an automated pipeline producing diverse Feynman diagrams along with verifiable topological annotations and amplitude results. Our database spans the electromagnetic, weak, and strong interactions of the Standard Model, encompasses over 100 distinct types and includes more than 2000 tasks. Experiments on state-of-the-art MLLMs reveal systematic failure modes, including unstable enforcement of physical constraints and violations of global topological conditions, highlighting the need for physics-grounded benchmarks for visual reasoning over scientific notation. FeynmanBench provides a logically rigorous test of whether AI can effectively engage in scientific discovery, particularly within theoretical physics.
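Abstract (reference translation): Breakthroughs in frontier theory often depend on combining concrete diagrammatic notation with rigorous logic. Multimodal large language models (MLLMs) show promise on general scientific tasks, but current benchmarks tend to emphasize local information extraction over the global structural logic inherent in formal scientific notation. This paper introduces FeynmanBench, the first benchmark centered on Feynman diagram tasks. It is designed to evaluate AI's capacity for multistep diagrammatic reasoning, which requires satisfying conservation laws and symmetry constraints, identifying graph topology, converting between diagrammatic and algebraic representations, and constructing scattering amplitudes under specific conventions and gauges. To support large-scale, reproducible evaluation, the authors developed an automated pipeline that generates diverse Feynman diagrams together with verifiable topological annotations and amplitude results. The database spans the electromagnetic, weak, and strong interactions of the Standard Model, covers over 100 distinct types, and includes more than 2000 tasks. Experiments on state-of-the-art MLLMs reveal systematic failure modes, including unstable enforcement of physical constraints and violations of global topological conditions, underscoring the need for physics-grounded benchmarks for visual reasoning over scientific notation. FeynmanBench provides a logically rigorous test of whether AI can effectively engage in scientific discovery, particularly in theoretical physics.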
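
The abstract names two kinds of checks that diagrammatic reasoning must pass: local conservation laws and global graph topology. The following is a minimal Python sketch of what such checks could look like, not the benchmark's actual pipeline. The function names `conserves_charge` and `loop_count`, the particle labels, and the small `CHARGE` table are all hypothetical and chosen only for illustration. The charges are the standard electric charges in units of e, and the loop formula is the usual graph-theoretic relation L = I - V + C (internal lines, vertices, connected components).

```python
from fractions import Fraction

# Electric charge in units of e for a few Standard Model particles.
# Hypothetical labels; only a small illustrative subset is listed.
CHARGE = {
    "e-": Fraction(-1), "e+": Fraction(1),
    "nu_e": Fraction(0), "photon": Fraction(0),
    "u": Fraction(2, 3), "d": Fraction(-1, 3),
    "W-": Fraction(-1), "W+": Fraction(1),
}


def conserves_charge(incoming, outgoing):
    """Return True if the total charge entering a vertex equals the total leaving it."""
    charge_in = sum(CHARGE[p] for p in incoming)
    charge_out = sum(CHARGE[p] for p in outgoing)
    return charge_in == charge_out


def loop_count(num_internal_lines, num_vertices, num_components=1):
    """Number of independent loops of a diagram: L = I - V + C."""
    return num_internal_lines - num_vertices + num_components


# QED vertex: an electron emits a photon.
print(conserves_charge(["e-"], ["e-", "photon"]))
# Weak vertex: a down quark turns into an up quark and a W-.
print(conserves_charge(["d"], ["u", "W-"]))
# Not allowed: the charges do not balance.
print(conserves_charge(["e-"], ["nu_e", "W+"]))
# Tree-level exchange of one photon between two vertices.
print(loop_count(num_internal_lines=1, num_vertices=2))
# Electron self-energy: one electron and one photon propagator between two vertices.
print(loop_count(num_internal_lines=2, num_vertices=2))
```

Exact fractions are used instead of floats so that quark charges such as 2/3 and -1/3 compare without rounding error. A fuller check would cover other conserved quantities (for example lepton and baryon number) in the same way.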

