Fugu-MT Paper Translation (Abstract): Mind the Sim-to-Real Gap & Think Like a Scientist

Paper overview: Mind the Sim-to-Real Gap & Think Like a Scientist

arxiv url: http://arxiv.org/abs/2605.21458v1
Date: Wed, 20 May 2026 17:48:14 GMT
Status: Translation complete
Last updated in system: 2026-05-21 19:19:56.823081
Title: Mind the Sim-to-Real Gap & Think Like a Scientist
Authors: Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky
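Abstract summary: We study when, and how, a simulator should be supplemented with experiments. We propose Fisher-SEP, a simulation-aided experimental policy.
Reference score (independently computed attention score): 44.54570296032634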
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration-deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.
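The design idea named in the abstract, choosing real experiments so as to shrink the posterior variance of a target policy's value, can be illustrated with a toy reward-only example. The sketch below is not the paper's Fisher-SEP algorithm. It assumes a one-step problem with independent Gaussian priors on each action's mean reward and Gaussian noise, and the names `value_variance` and `greedy_design`, along with all numbers, are hypothetical.

```python
import numpy as np

def value_variance(pi, prior_var, noise_var, counts):
    """Posterior variance of V = sum_a pi[a] * mu[a].

    Assumes independent Gaussian priors on each mean mu[a] and Gaussian
    reward noise, so the posterior variance depends only on trial counts.
    """
    post_var = 1.0 / (1.0 / prior_var + counts / noise_var)
    return float(np.sum(pi ** 2 * post_var))

def greedy_design(pi, prior_var, noise_var, budget):
    """Allocate `budget` real trials one at a time, each to the action
    whose extra observation most reduces the variance of the value."""
    counts = np.zeros(len(pi))
    for _ in range(budget):
        candidates = []
        for a in range(len(pi)):
            trial = counts.copy()
            trial[a] += 1
            candidates.append(value_variance(pi, prior_var, noise_var, trial))
        counts[int(np.argmin(candidates))] += 1
    return counts

# Hypothetical inputs: a target policy over three actions, and prior
# variances standing in for how much the simulator is trusted per action.
pi = np.array([0.7, 0.2, 0.1])
prior_var = np.array([0.1, 1.0, 4.0])
noise_var = np.array([1.0, 1.0, 1.0])

counts = greedy_design(pi, prior_var, noise_var, budget=20)
print("trials per action:", counts)
print("variance before:", value_variance(pi, prior_var, noise_var, np.zeros(3)))
print("variance after: ", value_variance(pi, prior_var, noise_var, counts))
```

Because the model is Gaussian and conjugate, the allocation can be planned before any data arrive. In this toy setting the trade-off is between actions the target policy uses often and actions where the prior is least certain.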


Related papers