Fugu-MT Paper Translation (Abstract): Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

Paper abstract: Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software

arxiv url: http://arxiv.org/abs/2605.30353v1
Date: Thu, 28 May 2026 17:59:59 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:56.765371
Title: Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software
Title (reference translation): Is Physics All You Need? A Case Study of Physicist-Supervised AI Development of Scientific Software
Authors: Nhat-Minh Nguyen
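Abstract summary: A physicist supervised an AI coding agent over 12 work days and 57 sessions to build CLAX-PT. The agent resolved ten supervision events autonomously by iterating against oracle tests. The three it could not resolve, all of which evaded oracle detection, share a common property: the agent treated symptom reduction as root-cause resolution.
Reference score (independently computed attention score): 0.304585143845864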
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$): a physicist supervising an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented and classified 15 supervision events by intervention level. The agent resolved ten autonomously by iterating against oracle tests. Two more by the physicist's domain knowledge. The three it could not -- all evaded oracle detection -- share a common property: the agent treated symptom reduction as root-cause resolution. It spent 33 of the 57 sessions adjusting coefficients within a code architecture that could not represent the target physics, and could not re-evaluate its CLASS-PT branch choice even when prompted to reconsider; only an injected physics concept (anisotropic BAO damping) triggered the redesign. Separately, the agent committed a calibrated correction that passed all oracle tests but corresponded to no quantity in the theory, predicting wrong values at any other cosmology. The fudge factor was caught and replaced within the same session. Three supervision practices proved critical for catching what oracle tests missed: testing at diverse parameter points beyond the fiducial calibration; shared changelogs that surfaced stalled exploration across sessions; and an explicit rule against unphysical numerical patches. In this case, supervision design, not model capability, determined whether the agent's output was trustworthy. Closing the gap would require agents that propose architectural alternatives rather than optimize within a given structure, and distinguish predictive adequacy from explanatory correctness -- capabilities not exhibited here, not obviously addressed by scaling alone. [Abridged.]
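Abstract (reference translation): Are AI agents tools, co-authors, or researchers? We present a quantified case study ($N=1$) in which a physicist supervised an AI coding agent (Claude Code, Sonnet and Opus models) over 12 work days and 57 sessions to build CLAX-PT, a differentiable one-loop perturbation theory module in JAX. We documented 15 supervision events and classified them by intervention level. The agent resolved ten autonomously by iterating against oracle tests. Two more were resolved through the physicist's domain knowledge. The three it could not resolve, all of which evaded oracle detection, share a common property: the agent treated symptom reduction as root-cause resolution. It spent 33 of the 57 sessions adjusting coefficients within a code architecture that could not represent the target physics, and it could not re-evaluate its CLASS-PT branch choice even when prompted to reconsider. Separately, the agent committed a calibrated correction that passed all oracle tests but corresponded to no quantity in the theory and predicted wrong values at any other cosmology. The fudge factor was caught and replaced within the same session. Three supervision practices proved critical for catching what the oracle tests missed: testing at diverse parameter points beyond the fiducial calibration; shared changelogs that surfaced stalled exploration across sessions; and an explicit rule against unphysical numerical patches. In this case, supervision design, not model capability, determined whether the agent's output was trustworthy. Closing the gap would require agents that propose architectural alternatives rather than optimize within a given structure, and that distinguish predictive adequacy from explanatory correctness. [Abridged.]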
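The abstract's point about calibrated corrections can be illustrated with a toy sketch. This is not CLAX-PT or CLASS-PT code: the functions `reference`, `raw_model`, and `patched_model`, the parameter name `amplitude`, and all numerical values are hypothetical stand-ins chosen only to show why a fudge factor tuned at one fiducial point can pass a test there yet fail at other parameter values.

```python
# Toy illustration (all names and values are assumptions, not from the paper).

FIDUCIAL_AMPLITUDE = 0.8  # hypothetical fiducial parameter value


def reference(amplitude, x):
    """Toy stand-in for the oracle: scales as amplitude squared."""
    return amplitude ** 2 * x


def raw_model(amplitude, x):
    """Toy implementation with a structural error: linear in amplitude."""
    return amplitude * x


# A "fudge factor" calibrated so the patched model matches the reference
# at the fiducial point only. It corresponds to no quantity in the toy theory.
FUDGE = reference(FIDUCIAL_AMPLITUDE, 1.0) / raw_model(FIDUCIAL_AMPLITUDE, 1.0)


def patched_model(amplitude, x):
    """Raw model multiplied by the calibrated constant."""
    return FUDGE * raw_model(amplitude, x)


def max_rel_error(model, amplitudes, xs):
    """Largest relative deviation from the reference over a parameter grid."""
    return max(
        abs(model(a, x) - reference(a, x)) / abs(reference(a, x))
        for a in amplitudes
        for x in xs
    )


xs = [0.1, 1.0, 10.0]

# Test restricted to the fiducial point: the patch looks correct.
fiducial_error = max_rel_error(patched_model, [FIDUCIAL_AMPLITUDE], xs)

# Test at diverse parameter points: the patch is exposed.
diverse_error = max_rel_error(patched_model, [0.6, 0.8, 1.0], xs)

print(f"max relative error at fiducial point: {fiducial_error:.2e}")
print(f"max relative error at diverse points: {diverse_error:.2e}")
```

In this sketch the fiducial-only check sees an error at the level of floating-point rounding, while the diverse-point check sees a deviation of tens of percent, which is the kind of discrepancy that testing beyond the fiducial calibration is meant to surface.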

