Fugu-MT Paper Translation (Summary): Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning


arxiv url: http://arxiv.org/abs/2509.05193v1
Date: Fri, 05 Sep 2025 15:48:20 GMT
Status: Translation complete
Last updated in system: 2025-09-08 14:27:25.63843
Title: Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning
Authors: Bastien Dubail, Stefan Stojanovic, Alexandre Proutière
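Abstract summary: We show that low-rank structure naturally emerges in the shifted successor measure, and we quantify the amount of shift needed for effective low-rank approximation and estimation.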
Reference score (attention score computed in-house): 56.87989363424
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Low-rank structure is a common implicit assumption in many modern reinforcement learning (RL) algorithms. For instance, reward-free and goal-conditioned RL methods often presume that the successor measure admits a low-rank representation. In this work, we challenge this assumption by first remarking that the successor measure itself is not low-rank. Instead, we demonstrate that a low-rank structure naturally emerges in the shifted successor measure, which captures the system dynamics after bypassing a few initial transitions. We provide finite-sample performance guarantees for the entry-wise estimation of a low-rank approximation of the shifted successor measure from sampled entries. Our analysis reveals that both the approximation and estimation errors are primarily governed by the so-called spectral recoverability of the corresponding matrix. To bound this parameter, we derive a new class of functional inequalities for Markov chains that we call Type II Poincaré inequalities, from which we can quantify the amount of shift needed for effective low-rank approximation and estimation. This analysis shows in particular that the required shift depends on the decay of the high-order singular values of the shifted successor measure and is hence typically small in practice. Additionally, we establish a connection between the necessary shift and the local mixing properties of the underlying dynamical system, which provides a natural way of selecting the shift. Finally, we validate our theoretical findings with experiments, and demonstrate that shifting the successor measure indeed leads to improved performance in goal-conditioned RL.
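The abstract's central claim, that the successor measure is not low-rank while its shifted version is, can be checked numerically on a toy chain. The sketch below is a minimal illustration under stated assumptions, not the authors' method or experiments. It assumes a tabular Markov chain with transition matrix P, takes the (unnormalized) successor measure as the discounted sum of P^t over t >= 0, which equals (I - gamma P)^{-1}, and assumes the shift-k version simply drops the first k terms, giving gamma^k P^k (I - gamma P)^{-1}. The example chain (a lazy random walk on a 50-state cycle), gamma = 0.9, rank r = 5 and all function names are hypothetical choices. It also reports the relative Frobenius error of the best rank-r approximation (Eckart-Young) rather than the entry-wise error the paper analyzes.

```python
import numpy as np


def lazy_cycle_walk(n: int) -> np.ndarray:
    """Hypothetical example chain: lazy random walk on a cycle of n states."""
    eye = np.eye(n)
    # Stay put with probability 1/2, move to each neighbour with probability 1/4.
    return 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1) + 0.25 * np.roll(eye, -1, axis=1)


def successor_measure(P: np.ndarray, gamma: float) -> np.ndarray:
    """Unnormalized successor measure: sum_{t>=0} gamma^t P^t = (I - gamma P)^{-1}."""
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)


def shifted_successor_measure(P: np.ndarray, gamma: float, k: int) -> np.ndarray:
    """Assumed shift-k form: sum_{t>=k} gamma^t P^t = gamma^k P^k (I - gamma P)^{-1}."""
    return (gamma ** k) * np.linalg.matrix_power(P, k) @ successor_measure(P, gamma)


def relative_rank_r_error(A: np.ndarray, r: int) -> float:
    """Relative Frobenius error of the best rank-r approximation, read off the singular values."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, sorted in descending order
    return float(np.sqrt(np.sum(s[r:] ** 2) / np.sum(s ** 2)))


if __name__ == "__main__":
    P = lazy_cycle_walk(50)
    gamma, r = 0.9, 5
    for k in (0, 5, 20):
        err = relative_rank_r_error(shifted_successor_measure(P, gamma, k), r)
        print(f"shift k={k:2d}: relative rank-{r} error = {err:.3f}")
```

On this particular chain every eigenvalue of (I - gamma P)^{-1} is at least 1, so the unshifted measure keeps a heavy spectral tail and its rank-5 error stays large; multiplying by P^k damps the fast-mixing modes, so the printed error decreases as k grows. Rescaling by (1 - gamma) to make rows sum to one would not change these relative errors.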

Related papers

Minimax Optimal Two-Stage Algorithm For Moment Estimation Under Covariate Shift [10.35788775775647]
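We study the minimax bound of the problem when the source and target distributions are known. Specifically, we first train an optimal estimator of the function under the source distribution, and then use a likelihood-ratio reweighting procedure to calibrate the moment estimator. To address this problem, we propose a truncated version of the estimator that ensures double robustness, and provide the corresponding upper bound.
Reference translation (metadata) (2025-06-30T01:32:36Z)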
Beyond Progress Measures: Theoretical Insights into the Mechanism of Grokking [50.465604300990904]
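Grokking refers to a sudden improvement in test accuracy after an extended period of overfitting. In this work, we investigate the grokking mechanism underlying the Transformer on prime-number arithmetic tasks.
Reference translation (metadata) (2025-04-04T04:42:38Z)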
TransFusion: Covariate-Shift Robust Transfer Learning for High-Dimensional Regression [11.040033344386366]
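To improve learning performance on a target task with limited samples, we propose a two-step method that uses a novel fusion regularizer. A bound is provided on the estimation error of the target model. We extend the proposed method to the distributed setting, enabling a pretraining-finetuning strategy.
Reference translation (metadata) (2024-04-01T14:58:16Z)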
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
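We develop a simple logits retargeting approach (LORT) that requires no prior knowledge of the number of samples per class. The proposed method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
Reference translation (metadata) (2024-03-01T03:27:08Z)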
When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
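Cascades are a classical strategy for adaptively varying the inference cost per sample. A deferral rule decides whether to invoke the next classifier in the sequence or to terminate prediction. Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
Reference translation (metadata) (2023-07-06T04:13:57Z)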
Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
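Gradient Descent (GD) is a powerful workhorse of modern machine learning. Its ability to find local minima is guaranteed only for losses with Lipschitz gradients. This work focuses on simple yet representative learning problems through an analysis of two-step gradient updates.
Reference translation (metadata) (2022-06-08T21:32:50Z)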
Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
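We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research. We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor. Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
Reference translation (metadata) (2022-02-14T16:42:16Z)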
Forward and inverse reinforcement learning sharing network weights and hyperparameters [3.705785916791345]
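ERIL combines forward and inverse reinforcement learning (RL) within the framework of entropy-regularized Markov decision processes. The forward RL step minimizes the reverse KL divergence estimated by the inverse RL step. We show that minimizing the reverse KL divergence is equivalent to finding the optimal policy.
Reference translation (metadata) (2020-08-17T13:12:44Z)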

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

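This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).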