Fugu-MT Paper Translation (Abstract): Horizon Reduction as Information Loss in Offline Reinforcement Learning

Paper summary: Horizon Reduction as Information Loss in Offline Reinforcement Learning

arxiv url: http://arxiv.org/abs/2601.00831v1
Date: Thu, 25 Dec 2025 07:41:48 GMT
Status: Translation complete
Last updated in system: 2026-01-11 18:48:17.534725
Title: Horizon Reduction as Information Loss in Offline Reinforcement Learning
Authors: Uday Kumar Nidadala, Venkata Bhumika Guthi
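Abstract summary: We show that horizon reduction can induce fundamental and irrecoverable information loss in offline reinforcement learning. We formalize horizon reduction as learning from fixed-length trajectory segments and prove that, under this paradigm, optimal policies may be statistically indistinguishable from suboptimal ones. Our results establish necessary conditions under which horizon reduction can be safe and highlight intrinsic limitations that cannot be overcome by algorithmic improvements alone.
Reference score (independently computed attention measure): 0.0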
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through truncated rollouts, windowed training, or hierarchical decomposition (Levine et al., 2020; Prudencio et al., 2023; Park et al., 2025). Despite recent empirical evidence that horizon reduction can improve scaling on challenging offline RL benchmarks, its theoretical implications remain underdeveloped (Park et al., 2025). In this paper, we show that horizon reduction can induce fundamental and irrecoverable information loss in offline RL. We formalize horizon reduction as learning from fixed-length trajectory segments and prove that, under this paradigm and any learning interface restricted to fixed-length trajectory segments, optimal policies may be statistically indistinguishable from suboptimal ones even with infinite data and perfect function approximation. Through a set of minimal counterexample Markov decision processes (MDPs), we identify three distinct structural failure modes: (i) prefix indistinguishability leading to identifiability failure, (ii) objective misspecification induced by truncated returns, and (iii) offline dataset support and representation aliasing. Our results establish necessary conditions under which horizon reduction can be safe and highlight intrinsic limitations that cannot be overcome by algorithmic improvements alone, complementing algorithmic work on conservative objectives and distribution shift that addresses a different axis of offline RL difficulty (Fujimoto et al., 2019; Kumar et al., 2020; Gulcehre et al., 2020).
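Illustration: the indistinguishability claim and the truncated-return failure mode can be made concrete with a toy example. The Python sketch below is not one of the paper's counterexample MDPs. It is a hypothetical two-action deterministic environment, and it assumes the learner only ever sees the first H steps of each episode. All names (`rollout`, `prefix_dataset`, `greedy_first_action`) and all numbers (rewards 1 and 10, episode length 6, H = 3) are assumptions chosen for illustration.

```python
from typing import List, Tuple

# One logged step: (time index, first action of the episode, reward).
Step = Tuple[int, str, float]

EPISODE_LENGTH = 6  # hypothetical full horizon


def rollout(first_action: str, late_reward: float) -> List[Step]:
    """Deterministic toy episode: the first action fixes the reward sequence.

    "left" pays 1 immediately; "right" pays `late_reward` only at the last step.
    """
    rewards = [0.0] * EPISODE_LENGTH
    if first_action == "left":
        rewards[0] = 1.0
    else:
        rewards[-1] = late_reward
    return [(t, first_action, r) for t, r in enumerate(rewards)]


def prefix_dataset(late_reward: float, h: int) -> List[List[Step]]:
    """Offline data under horizon reduction: only the first h steps are kept."""
    return [rollout(a, late_reward)[:h] for a in ("left", "right")]


def greedy_first_action(late_reward: float, h: int) -> str:
    """Pick the first action with the larger return over the first h steps."""
    def truncated_return(action: str) -> float:
        return sum(r for _, _, r in rollout(action, late_reward)[:h])
    return max(("left", "right"), key=truncated_return)


if __name__ == "__main__":
    H = 3
    # Two environments that differ only in the reward at the final step.
    same_data = prefix_dataset(10.0, H) == prefix_dataset(0.0, H)
    print("length-H datasets identical:", same_data)
    print("full-horizon optimum, late reward 10:", greedy_first_action(10.0, EPISODE_LENGTH))
    print("full-horizon optimum, late reward 0:", greedy_first_action(0.0, EPISODE_LENGTH))
    print("truncated-return choice, late reward 10:", greedy_first_action(10.0, H))
```

Under these assumptions the two environments yield identical length-H datasets, so any learner restricted to that data must return the same policy in both, although the full-horizon optimal first action differs between them. Adding more data does not help, because the distinguishing reward never appears in any logged prefix. If segments were instead sliding windows over whole episodes, the final window would expose the late reward, so this sketch covers only the prefix-only setting.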


Related papers