Fugu-MT Paper Translation (Abstract): Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs

Paper overview: Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs

arxiv url: http://arxiv.org/abs/2604.14188v1
Date: Wed, 01 Apr 2026 02:03:15 GMT
Status: Translation complete
System last updated: 2026-04-19 19:09:11.708753
Title: Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs
Authors: Xingyang Yu, Yinghuan Zhang, Yufei Zhang, Zijun Cui
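Abstract summary: Large language models have demonstrated impressive performance across many domains of mathematics and physics. One natural question is whether such models can support research in highly abstract theoretical fields such as quantum field theory and string theory. The authors construct a compact expert-curated dataset of twelve questions spanning core areas of quantum field theory and string theory.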
Reference score (attention score computed independently by this site): 6.723992068753028
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have demonstrated impressive performance across many domains of mathematics and physics. One natural question is whether such models can support research in highly abstract theoretical fields such as quantum field theory and string theory. Evaluating this possibility faces an immediate challenge: correctness in these domains is layered, tacit, and fundamentally non-binary. Standard answer-matching metrics fail to capture whether intermediate conceptual steps are properly reconstructed or whether implicit structural constraints are respected. We construct a compact expert-curated dataset of twelve questions spanning core areas of quantum field theory and string theory, and introduce a five-level grading rubric separating statement correctness, key concept awareness, reasoning chain presence, tacit step reconstruction, and enrichment. Evaluating multiple contemporary LLMs, we observe near-ceiling performance on explicit derivations within stable conceptual frames, but systematic degradation when tasks require reconstruction of omitted reasoning steps or reorganization of representations under global consistency constraints. These failures are driven not only by missing intermediate steps, but by an instability in representation selection: models often fail to identify the correct conceptual framing required to resolve implicit tensions. We argue that highly abstract theoretical physics provides a uniquely sensitive lens on the epistemic limits of current evaluation paradigms.
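
The abstract names five rubric dimensions but does not say how they are recorded or combined. As a minimal sketch, assuming each dimension is marked simply as satisfied or not, the contrast with binary answer matching could look like the following. The class names, the ordering of the levels, the `question_id` value, and the mapping from "answer matched" to statement correctness are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class RubricLevel(IntEnum):
    """The five dimensions named in the abstract (numeric order is an assumption)."""
    STATEMENT_CORRECTNESS = 1
    KEY_CONCEPT_AWARENESS = 2
    REASONING_CHAIN_PRESENCE = 3
    TACIT_STEP_RECONSTRUCTION = 4
    ENRICHMENT = 5


@dataclass(frozen=True)
class GradedResponse:
    """Hypothetical record of which rubric levels a grader marked as satisfied."""
    question_id: str
    satisfied: frozenset

    def profile(self) -> dict:
        # One boolean per level, so partial credit stays visible.
        return {level.name: level in self.satisfied for level in RubricLevel}

    def answer_match(self) -> bool:
        # What a binary answer-matching metric would report (assumed mapping).
        return RubricLevel.STATEMENT_CORRECTNESS in self.satisfied


# Made-up example: a correct final statement with the tacit steps left out.
response = GradedResponse(
    question_id="example-1",
    satisfied=frozenset({
        RubricLevel.STATEMENT_CORRECTNESS,
        RubricLevel.KEY_CONCEPT_AWARENESS,
        RubricLevel.REASONING_CHAIN_PRESENCE,
    }),
)

for name, ok in response.profile().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

In this made-up case `answer_match()` reports success, while the per-level profile shows that tacit step reconstruction and enrichment are missing, which is the kind of gap the abstract says answer matching fails to capture.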

Related papers