Fugu-MT Paper Translation (Abstract): Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

Paper overview: Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

arxiv url: http://arxiv.org/abs/2605.20272v1
Date: Tue, 19 May 2026 02:26:28 GMT
Status: Translation complete
System update date: 2026-05-21 19:19:56.256948
Title: Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning
Authors: Nasehatul Mustakim, Lucas Lehnert
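Abstract summary: This paper presents a theoretical model of how out-of-distribution (OOD) generalization can be achieved in RL agents. The authors derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. Their analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks.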
Reference score (independently computed attention score): 1.8354627928712421
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While humans readily generalize abstract concepts to more complex or larger tasks, building Reinforcement Learning (RL) systems with this ability remains elusive. Here, we present the first theoretical model of how such Out-of-Distribution (OOD) generalization can be achieved in RL agents. Our approach considers Partially Observable Markov Decision Processes (POMDPs) and assumes that an intelligent agent uses an abstraction function to determine which experiences can be treated as equivalent and which must be distinguished. First, we extend the existing state abstraction framework and proof techniques to POMDPs. Then, we define a successor-weighted model reduction, a model reduction variant that enables compression into smaller abstract spaces than prior definitions allow. We derive a bound on the agent's OOD test performance, thereby defining the conditions under which OOD generalization is achievable. This bound decomposes an agent's performance loss into approximation and estimation errors, revealing how reducing an agent's abstract state space size improves test performance and OOD generalization. Our analysis suggests that constraining an agent to operate over a small, finite set of abstract states is necessary for achieving generalization to more complex tasks. Our results motivate further research into learning RL architectures that scale across tasks of varying complexity levels.


Related papers