Fugu-MT Paper Translation (Abstract): CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Paper overview: CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

arxiv url: http://arxiv.org/abs/2605.26029v2
Date: Thu, 28 May 2026 01:38:42 GMT
Status: Translation complete
Last updated in system: 2026-05-30 02:45:54.646814
Title: CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
Title (reference translation): CausaLab: A Scalable Environment for Interactive Causal Discovery Aimed at AI Scientists
Authors: Junlin Yang, Dylan Zhang, Xiangchen Song, Qirun Dai, Xiao Liu, Yuen Chen, Aniket Vashishtha, Jing Shi, Chenhao Tan, Hao Peng,
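Abstract summary: We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is grounded in a faithfully recovered causal mechanism.
Reference score (independently computed attention score): 28.253879252786632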
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is grounded in a faithful recovered causal mechanism. Each episode places an agent in a synthetic laboratory: it receives prior measurement records, intervenes on a manipulator crystal, and predicts the resonance frequency of a held-out reactor crystal governed by the same mechanism. The hidden data-generating process is a randomly sampled structural causal model (SCM), so success requires recovering both a causal graph and structural equations rather than recalling prior knowledge. Experiments show a persistent gap between prediction and mechanism recovery: in the purely observational 6-node setting, GPT-5.2-high reaches 92% task accuracy but only 0.471 all-edge $F_1$. Mixed observation-intervention strategies improve structural fidelity, while pure intervention remains difficult even for strong agents. We identify premature stopping as a major weakness and show that consistency verification mitigates it. CausaLab therefore separates predictive success from causal understanding and exposes current LLM agents' limits as experimental causal reasoners.
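Abstract (reference translation): We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab assesses both whether an agent can solve a problem using causal evidence and whether its answer is grounded in a faithfully recovered causal mechanism. In each episode, the agent is placed in a synthetic laboratory, receives prior measurement records, intervenes on a manipulator crystal, and predicts the resonance frequency of a held-out reactor crystal governed by the same mechanism. Because the hidden data-generating process is a randomly sampled structural causal model (SCM), success requires recovering both the causal graph and the structural equations rather than recalling prior knowledge. Experiments show a persistent gap between prediction and mechanism recovery: in the purely observational 6-node setting, GPT-5.2-high reaches 92% task accuracy but only 0.471 all-edge $F_1$. Mixed observation-intervention strategies improve structural fidelity, while pure intervention remains difficult even for strong agents. We identify premature stopping as a major weakness and show that consistency verification mitigates it. CausaLab thus separates predictive success from causal understanding and exposes the limits of current LLM agents as experimental causal reasoners.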
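The abstract contrasts task accuracy with edge-level $F_1$ on a recovered causal graph. The sketch below illustrates that distinction with a toy setup. It is not the CausaLab implementation: the linear-Gaussian form, the weight range, and the names `sample_linear_scm`, `simulate`, and `edge_f1` are all assumptions made for illustration, and the paper's exact "all-edge" $F_1$ definition may differ from the plain directed-edge $F_1$ used here.

```python
import random


def sample_linear_scm(n_nodes, edge_prob, rng):
    """Sample a random DAG with linear weights (assumed form, not the paper's).

    Node i may only depend on nodes j < i, which guarantees acyclicity.
    Returns a dict mapping (parent, child) -> weight.
    """
    weights = {}
    for child in range(n_nodes):
        for parent in range(child):
            if rng.random() < edge_prob:
                weights[(parent, child)] = rng.uniform(-2.0, 2.0)
    return weights


def simulate(weights, n_nodes, rng, do=None, noise_scale=0.1):
    """Draw one sample in topological order.

    `do` maps node -> forced value; an intervened node ignores its parents.
    """
    do = do or {}
    values = []
    for child in range(n_nodes):
        if child in do:
            values.append(do[child])
            continue
        total = sum(w * values[p] for (p, c), w in weights.items() if c == child)
        values.append(total + rng.gauss(0.0, noise_scale))
    return values


def edge_f1(true_edges, predicted_edges):
    """F1 over directed edges, counting each (parent, child) pair as one item."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    if not true_set and not pred_set:
        return 1.0
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    scm = sample_linear_scm(n_nodes=6, edge_prob=0.4, rng=rng)
    observational = simulate(scm, 6, rng)
    interventional = simulate(scm, 6, rng, do={0: 3.0})
    print("true edges:", sorted(scm))
    print("observational sample:", observational)
    print("sample under do(node 0 = 3.0):", interventional)
```

In this toy setting, a predictor can match the target node's value well while proposing a graph with reversed or missing edges, which is the kind of gap between prediction and mechanism recovery that the abstract reports.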

Related papers