Fugu-MT Paper Translation (Abstract): R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning

Paper overview: R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning

arxiv url: http://arxiv.org/abs/2605.14026v1
Date: Wed, 13 May 2026 18:38:32 GMT
Status: Translation complete
System update date: 2026-05-15 21:45:34.45961
Title: R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
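Title (reference translation): R2R2: Robust Representation for Intensive Experience Reuse via Redundancy Reduction in Self-Predictive Learning
Authors: Sanghyeob Song, Donghyeok Lee, Jinsik Kim, Sungroh Yoon
Abstract summary: We propose Robust Representation via Redundancy Reduction (R2R2), a regularization method within Self-Predictive Learning (SPL). We verify R2R2 on SPL-native algorithms such as TD7. Experiments across 11 continuous control tasks confirm that R2R2 effectively mitigates overfitting.
Reference score (independently computed attention score): 40.03346193264488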
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For reinforcement learning in data-scarce domains like real-world robotics, intensive data reuse enhances efficiency but induces overfitting. While prior works focus on critic bias, representation-level instability in Self-Predictive Learning (SPL) under high Update-to-Data (UTD) regimes remains underexplored. To bridge this gap, we propose Robust Representation via Redundancy Reduction (R2R2), a regularization method within SPL. We theoretically identify that standard zero-centering conflicts with SPL's spectral properties and design a non-centered objective accordingly. We verify R2R2 on SPL-native algorithms like TD7. Furthermore, to demonstrate its orthogonality to prior advancements, we extend the state-of-the-art SimbaV2, which originally lacks SPL, by integrating a tailored SPL module, termed SimbaV2-SPL. Experiments across 11 continuous control tasks confirm that R2R2 effectively mitigates overfitting; specifically, at a UTD ratio of 20, it improves TD7 by ~22% and provides additional gains on top of SimbaV2-SPL, which itself establishes a new state-of-the-art. The code can be found at: https://github.com/songsang7/R2R2
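The abstract contrasts a non-centered objective with standard zero-centering but does not give the loss itself. The following is a minimal sketch only, assuming the regularizer resembles a Barlow Twins-style cross-correlation loss in which the mean subtraction can be switched off. The function name `redundancy_reduction_loss` and the parameters `center`, `off_diag_weight`, and `eps` are hypothetical and are not taken from the paper or its repository; the actual R2R2 objective may differ.

```python
import numpy as np


def redundancy_reduction_loss(z_pred, z_target, center=False,
                              off_diag_weight=5e-3, eps=1e-8):
    """Barlow Twins-style loss between two (batch, dim) embedding matrices.

    Hypothetical illustration: center=False skips the mean subtraction,
    which is the assumed "non-centered" variant.
    """
    z_pred = np.asarray(z_pred, dtype=np.float64)
    z_target = np.asarray(z_target, dtype=np.float64)

    if center:
        # Standard zero-centering of every feature over the batch.
        z_pred = z_pred - z_pred.mean(axis=0, keepdims=True)
        z_target = z_target - z_target.mean(axis=0, keepdims=True)

    # Scale every feature to unit root-mean-square over the batch.
    z_pred = z_pred / (np.sqrt((z_pred ** 2).mean(axis=0, keepdims=True)) + eps)
    z_target = z_target / (np.sqrt((z_target ** 2).mean(axis=0, keepdims=True)) + eps)

    # (dim, dim) cross-correlation matrix between the two embeddings.
    batch_size = z_pred.shape[0]
    corr = z_pred.T @ z_target / batch_size

    diag = np.diag(corr)
    on_diag = np.sum((diag - 1.0) ** 2)            # pull matching features together
    off_diag = np.sum(corr ** 2) - np.sum(diag ** 2)  # penalize redundant feature pairs
    return on_diag + off_diag_weight * off_diag


# Usage: the same embeddings scored with and without centering.
z = 2.0 * np.eye(4)
loss_non_centered = redundancy_reduction_loss(z, z, center=False)
loss_centered = redundancy_reduction_loss(z, z, center=True)
```

In this toy input the features are already decorrelated around the origin, so the non-centered loss is essentially zero, while centering introduces negative correlations between features and yields a positive loss. This only illustrates that the two choices can disagree; it is not a reproduction of the paper's spectral argument.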
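Abstract (reference translation): In reinforcement learning for data-scarce domains such as real-world robotics, intensive data reuse improves efficiency but induces overfitting. Prior work has focused on critic bias, while representation-level instability in Self-Predictive Learning (SPL) under high Update-to-Data (UTD) regimes remains underexplored. To bridge this gap, we propose Robust Representation via Redundancy Reduction (R2R2), a regularization method within SPL. We show theoretically that standard zero-centering conflicts with the spectral properties of SPL, and we design a non-centered objective accordingly. We verify R2R2 on SPL-native algorithms such as TD7. In addition, we extend the state-of-the-art SimbaV2, which originally lacks SPL, by integrating a tailored SPL module; the result is called SimbaV2-SPL. At a UTD ratio of 20, R2R2 improves TD7 by about 22% and provides additional gains on top of SimbaV2-SPL, which itself establishes a new state of the art. The code is available at https://github.com/songsang7/R2R2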
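For readers unfamiliar with the term, the Update-to-Data (UTD) ratio is commonly understood as the number of gradient updates performed per environment step. The sketch below illustrates that schedule with a placeholder replay buffer; `run_with_utd` and its parameters are hypothetical names, no learning happens, and it does not reflect the training loop of TD7, SimbaV2, or the R2R2 repository.

```python
import random
from collections import deque


def run_with_utd(num_env_steps, utd_ratio, batch_size=4, seed=0):
    """Count the updates performed when each env step triggers utd_ratio updates."""
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)  # placeholder replay buffer
    num_updates = 0

    for step in range(num_env_steps):
        # Placeholder transition; a real agent would store (s, a, r, s', done).
        buffer.append((step, rng.random()))

        # Start updating once a full batch can be sampled.
        if len(buffer) >= batch_size:
            for _ in range(utd_ratio):
                batch = rng.sample(list(buffer), batch_size)
                # A real agent would take one gradient step on `batch` here.
                num_updates += 1

    return num_updates


# Usage: 10 environment steps at a UTD ratio of 20.
updates = run_with_utd(num_env_steps=10, utd_ratio=20)
```

With a batch size of 4, updates begin at the fourth environment step, so the same small buffer is resampled many times. That repeated reuse is the setting in which the abstract reports overfitting.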
