Fugu-MT paper translation (abstract): On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

Paper overview: On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

arxiv url: http://arxiv.org/abs/2606.20357v1
Date: Thu, 18 Jun 2026 15:20:10 GMT
Status: translation complete
Last updated in system: 2026-06-19 18:23:39.94312
Title: On the Variance of Temporal Difference Learning and its Reduction Using Control Variates
Title (reference translation): On the variance of temporal difference learning and its reduction using control variates
Authors: Hsiao-Ru Pan, Bernhard Schölkopf
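Abstract summary: One of the mechanisms by which TD learning reduces variance is effectively aggregating over a larger number of independent trajectories. The behaviors of the estimators are illustrated numerically in carefully designed environments.
Reference score (attention score computed by this site): 54.916555668983726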
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We analyze the variance of temporal difference (TD) learning using the phased setting with tabular representation, and show that one of the mechanisms behind its ability to reduce variance is effectively aggregating over a larger number of independent trajectories. Based on this insight, we demonstrate that (1) the variance of TD is asymptotically bounded from above by that of Monte Carlo (MC) estimators, and (2) shorter-horizon updates incur less variance for a fixed number of samples. Beyond TD, we show that Direct Advantage Estimation (DAE), a method for estimating the advantage function, can be seen as a type of regression-adjusted control variate, which achieves a tighter bound on the variance compared to TD in the large-sample limit. Finally, we numerically illustrate the behaviors of these estimators with carefully designed environments.
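Abstract (reference translation): We analyze the variance of temporal difference (TD) learning in the phased setting with a tabular representation, and show that one mechanism behind its variance reduction is effectively aggregating over a larger number of independent trajectories. Based on this insight, we show that (1) the variance of TD is asymptotically bounded from above by that of Monte Carlo (MC) estimators, and (2) shorter-horizon updates incur less variance for a fixed number of samples. Beyond TD, we show that Direct Advantage Estimation (DAE), a method for estimating the advantage function, can be viewed as a type of regression-adjusted control variate, which achieves a tighter variance bound than TD in the large-sample limit. Finally, we numerically illustrate the behavior of these estimators in carefully designed environments.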
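
The trajectory-aggregation mechanism mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the paper's phased setting or its environments. It assumes a hypothetical toy Markov reward process in which states A and B both move to C with zero reward, and leaving C yields a standard-normal terminal reward, so the true value of every state is 0. All function names and parameters are illustrative. The MC estimate of V(A) averages only the episodes that start in A, while the batch TD(0) solution reuses every visit to C, including those from episodes that started in B.

```python
import random
import statistics


def simulate(n_per_start, rng):
    """Sample episodes from the toy MRP: A -> C -> end and B -> C -> end.

    Each episode is stored as (start_state, terminal_reward). The only
    randomness is the reward received on leaving C, drawn from N(0, 1).
    """
    episodes = []
    for start in ("A", "B"):
        for _ in range(n_per_start):
            episodes.append((start, rng.gauss(0.0, 1.0)))
    return episodes


def mc_value_of_a(episodes):
    # Monte Carlo: average the returns of episodes that start in A only.
    returns = [r for start, r in episodes if start == "A"]
    return sum(returns) / len(returns)


def td_value_of_a(episodes):
    # Batch TD(0) fixed point for this MRP: V(C) is the mean reward over
    # every visit to C, and V(A) = 0 + V(C) since A -> C pays zero reward.
    rewards_from_c = [r for _, r in episodes]
    return sum(rewards_from_c) / len(rewards_from_c)


def empirical_variances(n_per_start=20, n_repeats=4000, seed=0):
    """Return (variance of MC estimate, variance of TD estimate) of V(A)."""
    rng = random.Random(seed)
    mc, td = [], []
    for _ in range(n_repeats):
        episodes = simulate(n_per_start, rng)
        mc.append(mc_value_of_a(episodes))
        td.append(td_value_of_a(episodes))
    return statistics.pvariance(mc), statistics.pvariance(td)


if __name__ == "__main__":
    var_mc, var_td = empirical_variances()
    print(f"MC variance: {var_mc:.4f}  TD variance: {var_td:.4f}")
```

In this toy case the MC estimate averages 20 rewards (theoretical variance 1/20) while the TD estimate averages 40 (theoretical variance 1/40), so the TD variance should come out near half the MC variance. The factor of two is specific to this construction and says nothing about the bounds proved in the paper.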

Related papers

Causal vs. Anticausal merging of predictors [57.26526031579287]
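We study the differences that arise from merging predictors in the causal and anticausal directions using the same data. We merge the predictors using Causal Maximum Entropy (CMAXENT) as an inductive bias.
Paper reference translation (metadata) (2025-01-14T20:38:15Z)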
STATE: A Robust ATE Estimator of Heavy-Tailed Metrics for Variance Reduction in Online Controlled Experiments [22.32661807469984]
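We develop a novel framework that integrates the Student's t-distribution with machine learning tools to fit heavy-tailed metrics. By adopting a variational EM method to optimize the log-likelihood function, we can infer a robust solution that greatly reduces the negative impact of outliers. We demonstrate the effectiveness of the method through simulations on synthetic data and long-term empirical results on the Meituan experiment platform.
Paper reference translation (metadata) (2024-07-23T09:35:59Z)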
TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
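Deep heteroscedastic regression jointly optimizes the mean and covariance of the predictive distribution using the negative log-likelihood. Recent work suggests that this may lead to sub-optimal convergence because of the challenges associated with covariance estimation. The paper asks: (1) does the predicted covariance truly capture the randomness of the predicted mean? The results show that TIC not only learns the covariance accurately but also facilitates improved convergence of the negative log-likelihood.
Paper reference translation (metadata) (2023-10-29T09:54:03Z)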
On the Statistical Benefits of Temporal Difference Learning [6.408072565019087]
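Given a dataset of actions and the resulting long-term rewards, a direct estimation approach fits value functions. We show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error. We demonstrate that estimates of the difference in value-to-go between two states can be dramatically improved.
Paper reference translation (metadata) (2023-01-30T21:02:25Z)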
Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation [44.27439128304058]
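We study the finite-time behaviour of the TD learning algorithm. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice.
Paper reference translation (metadata) (2022-10-12T04:37:54Z)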
Double Control Variates for Gradient Estimation in Discrete Latent Variable Models [32.33171301923846]
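We propose a variance reduction technique for score function estimators. We show that our estimator has lower variance than other state-of-the-art estimators.
Paper reference translation (metadata) (2021-11-09T18:02:42Z)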
VarGrad: A Low-Variance Gradient Estimator for Variational Inference [9.108412698936105]
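We show that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
Paper reference translation (metadata) (2020-10-20T16:46:01Z)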
Estimating Gradients for Discrete Random Variables by Sampling without Replacement [93.09326095997336]
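We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement. We show that the estimator can be derived as the Rao-Blackwellization of three different estimators.
Paper reference translation (metadata) (2020-02-14T14:15:18Z)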
Reanalysis of Variance Reduced Temporal Difference Learning [57.150444843282]
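The variance-reduced TD (VRTD) algorithm proposed by Korda and La applies directly to online TD learning with Markovian samples. We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
Paper reference translation (metadata) (2020-01-07T05:32:43Z)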

The related papers list is generated automatically from the titles and abstracts of papers on this site.
