Fugu-MT paper translation (summary): Spectral Bellman Method: Unifying Representation and Exploration in RL

Paper summary: Spectral Bellman Method: Unifying Representation and Exploration in RL

arxiv url: http://arxiv.org/abs/2507.13181v1
Date: Thu, 17 Jul 2025 14:50:52 GMT
Status: Translation complete
Last updated in system: 2025-07-18 20:10:24.543389
Title: Spectral Bellman Method: Unifying Representation and Exploration in RL
Authors: Ofir Nabati, Bo Dai, Shie Mannor, Guy Tennenholtz
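Abstract summary: This work introduces Spectral Bellman Representation, a new framework for learning representations for value-based reinforcement learning. It shows that the learned representations enable structured exploration by aligning feature covariance with Bellman dynamics. The framework extends naturally to powerful multi-step Bellman operators, further broadening its impact.
Reference score (attention level, computed independently by this site): 54.71169912483302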
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The effect of representation has been demonstrated in reinforcement learning, through both theoretical and empirical successes. However, existing representation learning is mainly induced from model-learning aspects, which misaligns it with RL tasks. This work introduces Spectral Bellman Representation, a novel framework derived from the Inherent Bellman Error (IBE) condition, which aligns with the fundamental structure of Bellman updates across a space of possible value functions and is therefore directed towards value-based RL. Our key insight is the discovery of a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This spectral connection yields a new, theoretically-grounded objective for learning state-action features that inherently capture this Bellman-aligned covariance. Our method requires a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration, by aligning feature covariance with Bellman dynamics, and improve overall performance, particularly in challenging hard-exploration and long-horizon credit assignment tasks. Our framework naturally extends to powerful multi-step Bellman operators, further broadening its impact. Spectral Bellman Representation offers a principled and effective path toward learning more powerful and structurally sound representations for value-based reinforcement learning.
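The zero-IBE condition mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' method or objective. It assumes the usual reading of the condition (the Bellman backup of any linear value function stays inside the span of the features), uses the Bellman optimality operator, and works on a hypothetical random tabular MDP. All names (`bellman_backup`, `sampled_closure_residual`, `Phi_tabular`, `Phi_small`) are illustrative.

```python
import numpy as np

def bellman_backup(P, R, Q, gamma):
    # (T Q)(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')
    return R + gamma * (P @ Q.max(axis=1))

def sampled_closure_residual(Phi, P, R, gamma=0.9, n_samples=20, seed=0):
    """Largest max-norm gap, over sampled theta, between T(Phi theta) and
    its least-squares fit in the feature span.

    This is a sampled diagnostic, not the exact IBE: the exact quantity
    takes a supremum over all theta and an infimum in the max norm.
    The gap is zero exactly when the backup lies in the span.
    """
    rng = np.random.default_rng(seed)
    S, A, d = Phi.shape
    X = Phi.reshape(S * A, d)                 # one feature row per (s, a)
    worst = 0.0
    for _ in range(n_samples):
        theta = rng.normal(size=d)            # a random linear value function
        target = bellman_backup(P, R, Phi @ theta, gamma).reshape(S * A)
        theta_next, *_ = np.linalg.lstsq(X, target, rcond=None)
        worst = max(worst, float(np.abs(X @ theta_next - target).max()))
    return worst

# Hypothetical random MDP: P has shape (S, A, S), R has shape (S, A).
rng = np.random.default_rng(1)
S, A = 4, 2
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)            # normalize rows to probabilities
R = rng.random((S, A))

# One-hot (tabular) features span every Q-function, so the gap is zero.
Phi_tabular = np.eye(S * A).reshape(S, A, S * A)
# Two random features cannot represent most backups, so the gap is positive.
Phi_small = rng.random((S, A, 2))

res_tabular = sampled_closure_residual(Phi_tabular, P, R)
res_small = sampled_closure_residual(Phi_small, P, R)
print(res_tabular, res_small)
```

One-hot features are the trivial case where the condition holds. The interest of a representation-learning method lies in finding low-dimensional features for which the gap is small, which this sketch only measures and does not attempt to learn.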

Related papers

AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation [113.75682363364004]
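AURORA is a framework designed to strengthen genuine reasoning and language understanding in reference audio-visual segmentation. AURORA achieves state-of-the-art performance on the Ref-AVS benchmark and generalizes effectively to unreferenced segmentation.
Paper reference translation (metadata) (2025-08-04T07:47:38Z)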
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning [47.57615889991631]
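For continuous action spaces, actor-critic methods are widely used in online reinforcement learning (RL). This study examines the effectiveness of incorporating the Bellman optimality operator into actor-critic frameworks.
Paper reference translation (metadata) (2025-06-06T10:46:20Z)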
Universal Approximation Theorem for Deep Q-Learning via FBSDE System [2.1756081703276]
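This paper establishes a universal approximation theorem for a class of Deep Q-Networks (DQNs). It shows that the layers of a deep residual network, viewed as neural operators acting on function spaces, can approximate the action of the Bellman operator.
Paper reference translation (metadata) (2025-05-09T13:11:55Z)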
When is Realizability Sufficient for Off-Policy Reinforcement Learning? [17.317841035807696]
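We analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class. We establish finite-sample guarantees for off-policy reinforcement learning that are free of the approximation error term known as the Bellman error.
Paper reference translation (metadata) (2022-11-10T03:15:31Z)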
Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
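This paper proposes Spectral Decomposition Representation (SPEDER), which extracts state-action abstractions from the dynamics without inducing spurious dependence on the data collection policy. Theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings. Experiments show better performance than current state-of-the-art algorithms on several benchmarks.
Paper reference translation (metadata) (2022-08-19T19:01:30Z)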
Learning Bellman Complete Representations for Offline Policy Evaluation [51.96704525783913]
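Two conditions for sample-efficient OPE are Bellman completeness and coverage. We show that our representation enables better OPE than previous representation learning methods developed for off-policy RL.
Paper reference translation (metadata) (2022-07-12T21:02:02Z)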
A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning [55.048010996144036]
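We show that, under a certain noise assumption, the linear spectral feature of the corresponding Markov transition operator can be obtained in closed form for free. We propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and completes optimistic exploration for representation learning by exploiting the structure of the noise.
Paper reference translation (metadata) (2021-11-22T19:24:57Z)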
Bayesian Bellman Operators [55.959376449737405]
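We introduce a new perspective on Bayesian reinforcement learning (RL). Our framework is motivated by the insight that, when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators rather than over value functions.
Paper reference translation (metadata) (2021-06-09T12:20:46Z)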
Neurally Augmented ALISTA [15.021419552695066]
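This paper proposes Neurally Augmented ALISTA, in which an LSTM network computes step sizes and thresholds individually for each target vector during reconstruction. We show that the approach further improves empirical performance in sparse reconstruction, in particular widening its margin over existing algorithms as the compression ratio becomes more challenging.
Paper reference translation (metadata) (2020-10-05T11:39:49Z)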

The related papers list is generated automatically from the titles and abstracts of papers on this site.

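This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.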