Fugu-MT Paper Translation (Abstract): Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

Paper overview: Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

arxiv url: http://arxiv.org/abs/2604.13147v1
Date: Tue, 14 Apr 2026 16:32:46 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.229423
Title: Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version
Title (reference translation): Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control
Authors: Dorival Leão, Alberto Ohashi, Simone Scotti, Adolfo M. D da Silva
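Abstract summary: This paper studies continuous-time control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Building on the discrete skeleton approach developed in earlier work, it proposes a Monte Carlo learning methodology for the embedded backward dynamic programming equation.
Reference score (independently computed attention score): 0.0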
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
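Abstract (reference translation): This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon-Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with the target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.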
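
The reweighting idea described in the abstract can be illustrated with a deliberately simple toy sketch. It is not the paper's construction: the one-dimensional Gaussian laws, the quadratic cost, and every name below (REF_MEAN, REF_STD, radon_nikodym_weight, reweighted_expectation) are assumptions made for illustration only. A fixed sample is drawn once under a wide reference law, and expectations under several target parameters theta are then estimated by reweighting that same sample with the density ratio, without drawing new data.

```python
import math
import random

# Hypothetical reference ("training") law: N(0, 2^2). It is wider than every
# target law N(theta, 1) used below, so the density ratio stays bounded.
REF_MEAN, REF_STD = 0.0, 2.0


def normal_log_pdf(x, mean, std):
    """Log-density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def radon_nikodym_weight(x, theta):
    """Density ratio dQ_theta/dP at x, with Q_theta = N(theta, 1), P = reference law."""
    return math.exp(
        normal_log_pdf(x, theta, 1.0) - normal_log_pdf(x, REF_MEAN, REF_STD)
    )


def reweighted_expectation(sample, f, theta):
    """Estimate E_{Q_theta}[f(X)] from a sample drawn under the reference law P."""
    return sum(radon_nikodym_weight(x, theta) * f(x) for x in sample) / len(sample)


def cost(x):
    """Toy quadratic cost; under N(theta, 1) its exact mean is theta^2 + 1."""
    return x * x


# The synthetic dataset is generated once, under the reference law only.
rng = random.Random(0)
sample = [rng.gauss(REF_MEAN, REF_STD) for _ in range(100_000)]

# "Recalibration": each new theta reuses the same sample with new weights.
estimates = {
    theta: reweighted_expectation(sample, cost, theta)
    for theta in (-0.5, 0.0, 0.5)
}
for theta, est in estimates.items():
    print(f"theta={theta:+.1f}  estimate={est:.3f}  exact={theta**2 + 1.0:.3f}")
```

Choosing a reference law with heavier tails than every target keeps the weights bounded, which loosely mirrors the role of a dominating training law in the abstract; how the paper actually builds such laws for non-Markovian systems is not specified in this summary.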

Related papers