Fugu-MT 論文翻訳(概要): Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

論文の概要: Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

arxiv url: http://arxiv.org/abs/2103.09847v1
Date: Wed, 17 Mar 2021 18:18:57 GMT
ステータス: 翻訳完了
システム内更新日: 2021-03-19 14:15:10.640193
Title: Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm
Title（参考訳）: 線形関数近似を用いた無限ホリゾンオフライン強化学習:次元の呪いとアルゴリズム
Authors: Lin Chen, Bruno Scherrer, Peter L. Bartlett
Abstract要約: 本稿では,オフライン強化学習におけるポリシー評価のサンプル複雑さについて検討する。低分布シフトの仮定の下では、最大$oleft(maxleft fracleftvert thetapirightvert _24varepsilon4logfracddelta,frac1varepsilon2left(d+logfrac1deltaright)right right)$サンプルを必要とするアルゴリズムがあることを示す。
参考スコア（独自算出の注目度）: 46.36534144138337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q=\gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.
Abstract（参考訳）: 本稿では,線形関数近似を用いて,無限ホリゾンオフライン強化学習(オフポリシー評価問題とも呼ばれる)におけるポリシー評価のサンプル複雑性について検討する。ハードレジーム $d\gamma^{2}>1$ を特定し、ここで$d$ は特徴ベクトルの次元、$\gamma$ はディスカウントレートである。この方法では、任意の$q\in[\gamma^{2},1]$に対して、その特徴共分散行列の最小の固有値が$q/d$で$\omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\theta\left(d\gamma^{2}\right)\right)\right)\right)$が付加誤差$\varepsilon$までの値関数を近似するハードインスタンスを構築することができる。サンプルの複雑さの低い境界は$d$で指数関数的であることに注意。もし$q=\gamma^{2}$なら、無限のデータでも十分ではない。 Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.

関連論文リスト

Iterative thresholding for non-linear learning in the strong $\varepsilon$-contamination model [3.309767076331365]
閾値降下を用いた単一ニューロンモデル学習のための近似境界を導出する。線形回帰問題も研究し、$sigma(mathbfx) = mathbfx$ となる。
論文参考訳（メタデータ） (2024-09-05T16:59:56Z)
Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity [22.615156512223763]
本研究では,有限サム構造を目的とする分散楽観的すべり(SVOGS)法を提案する。我々は$mathcal O(delta D2/varepsilon)$、$mathcal O(n+sqrtndelta D2/varepsilon)$、$tildemathcal O(n+sqrtndelta+L)D2/varepsilon)$の局所呼び出しを証明している。
論文参考訳（メタデータ） (2024-05-25T08:34:49Z)
Estimation and Inference in Distributional Reinforcement Learning [28.253677740976197]
サイズ$widetilde Oleft(frac|mathcalS||mathcalA|epsilon2 (1-gamma)4right)$ suffices to ensure the Kolmogorov metric and total variation metric between $hatetapi$ and $etapi$ is below $epsilon$ with high probability。以上の結果から,多種多様な統計的汎関数の統計的推測への統一的アプローチがもたらされた。
論文参考訳（メタデータ） (2023-09-29T14:14:53Z)
Detection of Dense Subhypergraphs by Low-Degree Polynomials [72.4451045270967]
ランダムグラフにおける植込み高密度部分グラフの検出は、基本的な統計的および計算上の問題である。我々は、$Gr(n, n-beta)ハイパーグラフにおいて、植えた$Gr(ngamma, n-alpha)$ subhypergraphの存在を検出することを検討する。平均値の減少に基づく硬さが不明な微妙な対数密度構造を考えると,この結果はグラフの場合$r=2$で既に新しくなっている。
論文参考訳（メタデータ） (2023-04-17T10:38:08Z)
Krylov Methods are (nearly) Optimal for Low-Rank Approximation [8.017116107657206]
任意のアルゴリズムが$Omegaleft(log(n)/varepsilon1/2right)$ matrix-vector productを必要とし、Krylov法で得られる上限値と正確に一致することを示す。我々の下位境界はOpen Question 1WooWoo14で、Spectral LRAのアルゴリズムの進歩の欠如の証拠を提供している。
論文参考訳（メタデータ） (2023-04-06T16:15:19Z)
A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise [2.019622939313173]
重み付き最小二乗線形回帰は、少なくとも$epsilon n$ arbitrary outliersの$n$のラベル特徴サンプルを破損させたと仮定して再検討する。本稿では,$(Sigma,Xi) や $Xi$ の演算ノルムに関する知識を前提に,電力法に基づくほぼ最適に計算可能な推定器を提案する。
論文参考訳（メタデータ） (2022-09-06T23:37:31Z)
Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
我々は、任意のSchatten-$p$ノルムの下で、低ランク近似のためのクリロフ部分空間に基づく反復法について研究する。我々の主な成果は、$tildeO(k/sqrtepsilon)$ matrix-vector productのみを使用するアルゴリズムである。
論文参考訳（メタデータ） (2022-02-10T16:10:41Z)
Optimal Mean Estimation without a Variance [103.26777953032537]
本研究では,データ生成分布の分散が存在しない環境での重み付き平均推定問題について検討する。最小の信頼区間を$n,d,delta$の関数として得る推定器を設計する。
論文参考訳（メタデータ） (2020-11-24T22:39:21Z)
Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
S$状態、$A$アクション、割引係数$gamma in (0,1)$、近似しきい値$epsilon > 0$の MDP が与えられた場合、$epsilon$-Optimal Policy を学ぶためのモデルなしアルゴリズムを提供する。十分小さな$epsilon$の場合、サンプルの複雑さで改良されたアルゴリズムを示す。
論文参考訳（メタデータ） (2020-06-06T13:34:41Z)
The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
圧縮センシングにおいて、100万倍のN$センシング行列上の制限等尺性(RIP)はスパースベクトルの効率的な再構成を保証する。 Mtimes N$ matrices with i.d.$mathcalN(0,1/M)$ entry。
論文参考訳（メタデータ） (2020-05-22T16:55:01Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
本稿では,決定論的システムにおける関数近似を用いたQ$学習の問題について検討する。もし$delta = Oleft(rho/sqrtdim_Eright)$なら、$Oleft(dim_Eright)$を使って最適なポリシーを見つけることができる。
論文参考訳（メタデータ） (2020-02-17T18:41:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。