Fugu-MT 論文翻訳(概要): Optimal policy evaluation using kernel-based temporal difference methods

論文の概要: Optimal policy evaluation using kernel-based temporal difference methods

arxiv url: http://arxiv.org/abs/2109.12002v1
Date: Fri, 24 Sep 2021 14:48:20 GMT
ステータス: 翻訳完了
システム内更新日: 2021-09-27 14:19:25.673213
Title: Optimal policy evaluation using kernel-based temporal difference methods
Title（参考訳）: カーネルに基づく時間差分法による最適政策評価
Authors: Yaqi Duan, Mengdi Wang, Martin J. Wainwright
Abstract要約: カーネルヒルベルト空間を用いて、無限水平割引マルコフ報酬過程の値関数を推定する。我々は、関連するカーネル演算子の固有値に明示的に依存した誤差の非漸近上界を導出する。 MRP のサブクラスに対する minimax の下位境界を証明する。
参考スコア（独自算出の注目度）: 78.83926562536791
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study methods based on reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP). We study a regularized form of the kernel least-squares temporal difference (LSTD) estimate; in the population limit of infinite data, it corresponds to the fixed point of a projected Bellman operator defined by the associated reproducing kernel Hilbert space. The estimator itself is obtained by computing the projected fixed point induced by a regularized version of the empirical operator; due to the underlying kernel structure, this reduces to solving a linear system involving kernel matrices. We analyze the error of this estimate in the $L^2(\mu)$-norm, where $\mu$ denotes the stationary distribution of the underlying Markov chain. Our analysis imposes no assumptions on the transition operator of the Markov chain, but rather only conditions on the reward function and population-level kernel LSTD solutions. We use empirical process theory techniques to derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator, as well as the instance-dependent variance of the Bellman residual error. In addition, we prove minimax lower bounds over sub-classes of MRPs, which shows that our rate is optimal in terms of the sample size $n$ and the effective horizon $H = (1 - \gamma)^{-1}$. Whereas existing worst-case theory predicts cubic scaling ($H^3$) in the effective horizon, our theory reveals that there is in fact a much wider range of scalings, depending on the kernel, the stationary distribution, and the variance of the Bellman residual error. Notably, it is only parametric and near-parametric problems that can ever achieve the worst-case cubic scaling.
Abstract（参考訳）: 無限水平割引マルコフ報酬過程(MRP)の値関数を推定するためのカーネルヒルベルト空間の再現に基づく手法を検討した。カーネル最小二乗時間差(LSTD)推定の正規化形式について検討し、無限データの集団制限において、関連する再生カーネルヒルベルト空間で定義される射影ベルマン作用素の固定点に対応する。推定器自身は、経験作用素の正規化バージョンによって誘導される射影された固定点を計算することで得られるが、基礎となるカーネル構造のため、これはカーネル行列を含む線形系を解くことに還元される。この推定の誤差を$L^2(\mu)$-normで解析し、$\mu$は基礎となるマルコフ連鎖の定常分布を表す。我々の解析はマルコフ連鎖の遷移作用素に仮定を課さず、むしろ報酬関数と集団レベルのカーネルLSTD解に関する条件のみを課している。我々は経験的プロセス理論手法を用いて、関連するカーネル演算子の固有値とベルマン残差誤差のインスタンス依存分散に明示的な依存性を持つ誤差の非漸近上界を導出する。さらに、MPPのサブクラスよりもミニマックスの低い境界を証明し、サンプルサイズ$n$と有効地平線$H = (1 - \gamma)^{-1}$の点で、我々のレートが最適であることを示す。既存の最悪のケース理論は、有効地平線における立方体スケーリング(H^3$)を予測するが、我々の理論は実際、カーネル、定常分布、ベルマン残差の分散に依存するより広い範囲のスケーリングが存在することを明らかにしている。特に、最悪の立方体スケーリングを達成できるのは、パラメトリックでほぼパラメトリックな問題のみである。

関連論文リスト

Near-Optimal Clustering in Mixture of Markov Chains [74.3828414695655]
我々は、長さ$H$の軌跡を、大きさ$S$の有限状態空間上の未知のエルゴードマルコフ鎖の1つによって生成される、$T$ trajectories of length $H$の問題を研究する。我々は、連鎖の遷移核間の重み付きKL分散によって支配されるクラスタリングエラー率に基づいて、インスタンス依存で高い確率の低い境界を導出する。次に,新しい2段階クラスタリングアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-06-02T05:10:40Z)
Online Inference for Quantiles by Constant Learning-Rate Stochastic Gradient Descent [4.2694059987063655]
本稿では,理論的保証付き量子損失関数の学習速度を一定としたオンライン勾配勾配勾配推定法を提案する。数値解析により,提案した量子推定器の強い有限サンプル性能と推定法が示された。
論文参考訳（メタデータ） (2025-03-04T01:37:42Z)
Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
切り抜き固有分解を用いて得られたカーネル行列の低ランク近似に対するエントリーワイド誤差境界を導出する。重要な技術的革新は、小さな固有値に対応するカーネル行列の固有ベクトルの非局在化結果である。我々は、合成および実世界のデータセットの集合に関する実証的研究により、我々の理論を検証した。
論文参考訳（メタデータ） (2024-05-23T12:26:25Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
近年の研究では、再生カーネルヒルベルト空間(RKHS)がニューラルネットワークによる関数のモデル化に適した空間ではないことが示されている。本稿では,有界ノルムを持つオーバーパラメータ化された2層ニューラルネットワークに適した関数空間について検討する。
論文参考訳（メタデータ） (2024-04-29T15:04:07Z)
A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator [7.163829696591103]
カーネルリッジ回帰を用いた未知のターゲット関数$f_*$の学習を検討する。 KRRのテスト誤差に対する非漸近的決定論的近似を確立した。
論文参考訳（メタデータ） (2024-03-13T20:12:03Z)
Optimal minimax rate of learning interaction kernels [7.329333373512536]
広帯域の交換可能な分布に対して最適な収束率を得るための最小二乗推定器(tLSE)を導入する。以上の結果から, 大きな試料限界の逆問題が保たれた場合, 左テール確率はバイアス分散トレードオフを変化させないことがわかった。
論文参考訳（メタデータ） (2023-11-28T15:01:58Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
本研究では,観測データに基づいて線形関数を推定するための最適手順について検討する。任意の凸および対称函数クラス $mathcalF$ に対して、平均二乗誤差で有界な非漸近局所ミニマックスを導出する。
論文参考訳（メタデータ） (2023-01-16T02:57:37Z)
Robust computation of optimal transport by $\beta$-potential regularization [79.24513412588745]
最適輸送(OT)は、確率分布間の差を測定する機械学習分野で広く使われているツールである。我々は、いわゆる$beta$-divergenceに付随するベータポテンシャル項でOTを正規化することを提案する。提案アルゴリズムで計算した輸送行列は,外乱が存在する場合でも確率分布を頑健に推定するのに役立つことを実験的に実証した。
論文参考訳（メタデータ） (2022-12-26T18:37:28Z)
Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
可分バナッハ空間上で定義された収縮作用素の定点を推定する問題について検討する。演算子欠陥と推定誤差の両方に対して漸近的でない境界を確立する。
論文参考訳（メタデータ） (2022-01-21T02:46:57Z)
Nonparametric approximation of conditional expectation operators [0.3655021726150368]
最小の仮定の下で、$[Pf](x) := mathbbE[f(Y) mid X = x ]$ で定義される$L2$-operatorの近似について検討する。我々は、再生されたカーネル空間上で作用するヒルベルト・シュミット作用素により、作用素ノルムにおいて$P$が任意に適切に近似できることを証明した。
論文参考訳（メタデータ） (2020-12-23T19:06:12Z)
Early stopping and polynomial smoothing in regression with reproducing kernels [2.0411082897313984]
再生カーネルヒルベルト空間(RKHS)における反復学習アルゴリズムの早期停止問題について検討する。本稿では,いわゆる最小不一致原理に基づく検証セットを使わずに早期停止を行うデータ駆動型ルールを提案する。提案したルールは、異なるタイプのカーネル空間に対して、ミニマックス最適であることが証明されている。
論文参考訳（メタデータ） (2020-07-14T05:27:18Z)
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
このモデルに基づく最小二乗リスクに対する1パス, 固定段差勾配勾配の収束度を解析した。特殊な場合として、ランダムなサンプリング点における値のノイズのない観測から単位区間上の実関数を推定するオンラインアルゴリズムを解析する。
論文参考訳（メタデータ） (2020-06-15T08:25:50Z)
On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
The inequality and non-asymptotic properties of approximation procedure with Polyak-Ruppert averaging。一定のステップサイズと無限大となる反復数を持つ平均的反復数に対する中心極限定理(CLT)を証明する。
論文参考訳（メタデータ） (2020-04-09T17:54:18Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。