Fugu-MT paper translation (abstract): Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

Paper abstract: Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

arxiv url: http://arxiv.org/abs/2606.11657v1
Date: Wed, 10 Jun 2026 04:38:45 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.2976
Title: Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics
Authors: Katherine Rosenfeld, Maike Sonnewald
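Abstract summary: Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles.
Reference score (independently computed attention measure): 0.2578242050187029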
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative AI emulators are increasingly used in scientific domains where we already have strong theory, benchmarks, and physical intuition. This raises a central evaluation and interpretability question: when a foundation-style model can reproduce known continuum dynamics, what internal mechanism supports that behavior, is the internal behaviour consistent with known physics, and how does it relate to where the emulator succeeds or fails? We investigate a cross-domain foundation model for continuum dynamics, Walrus by Polymathic, using mechanistic interpretability guided by physical principles. We apply a sparse autoencoder (SAE) to probe a selected layer, and address the practical challenge of triaging a large feature set (over 20,000) using enstrophy as a physically grounded metric. As a deliberately simple testbed, we focus on shear flow and compare feature recruitment across multiple shear-flow setups, i.e. parameter values in the numerical simulation. Across setups we find evidence of piecewise consistency, with subsets of features recurring in similar roles, but this structure is intermittent and does not map cleanly onto standard physical decompositions. In parallel, direct comparisons between numerical simulation and the emulator reveal systematic output-level discrepancies, including regimes where energy/structures become too diffuse or too localized. We connect parts of these discrepancies to changes in specific SAE feature usage. Our work highlights open questions for scientific foundation models: how to robustly prioritize mechanistically meaningful features, how to separate stable structure from analysis artifacts (including single-layer and SAE limitations), and how to use established benchmarks to decide when "different" internal representations are genuinely informative rather than merely effective.
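The abstract mentions using enstrophy as a physically grounded metric for triaging SAE features. The sketch below illustrates one way such a triage could look; it is not the authors' method. It assumes a 2D periodic velocity field on a uniform grid, defines enstrophy as half the mean squared vorticity, and ranks features by the absolute Pearson correlation between their activations and the enstrophy time series. The function names `enstrophy` and `rank_features_by_enstrophy` and the correlation-based ranking are hypothetical choices made for illustration.

```python
import numpy as np

def enstrophy(u, v, dx, dy):
    """Mean enstrophy 0.5 * <omega^2> of a 2D velocity field.

    u, v are arrays indexed [y, x]; vorticity is omega = dv/dx - du/dy.
    Assumption: the domain is periodic in both directions.
    """
    # Second-order central differences with periodic wrap-around.
    dv_dx = (np.roll(v, -1, axis=1) - np.roll(v, 1, axis=1)) / (2.0 * dx)
    du_dy = (np.roll(u, -1, axis=0) - np.roll(u, 1, axis=0)) / (2.0 * dy)
    omega = dv_dx - du_dy
    return 0.5 * np.mean(omega ** 2)

def rank_features_by_enstrophy(activations, enstrophy_series):
    """Rank features by |Pearson correlation| with an enstrophy time series.

    activations: array of shape (n_steps, n_features), hypothetical SAE outputs.
    enstrophy_series: array of shape (n_steps,).
    Returns (order, corr): feature indices sorted from most to least
    correlated, and the per-feature correlation coefficients.
    """
    a = activations - activations.mean(axis=0)
    e = enstrophy_series - enstrophy_series.mean()
    denom = np.sqrt((a ** 2).sum(axis=0) * (e ** 2).sum())
    # Constant features have zero variance; give them zero correlation.
    safe = np.where(denom > 0, denom, 1.0)
    corr = (a.T @ e) / safe
    order = np.argsort(-np.abs(corr))
    return order, corr
```

A correlation ranking is only one possible prioritization; with over 20,000 features, any single scalar metric will miss features whose relationship to the flow is nonlinear or localized in space.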

Related papers