Fugu-MT Paper Translation (Abstract): Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors

Paper summary: Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors

arxiv url: http://arxiv.org/abs/2508.07400v1
Date: Sun, 10 Aug 2025 16:01:48 GMT
Status: Translation complete
Last updated in system: 2025-08-12 21:23:28.84265
Title: Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors
Title (reference translation): Efficient Reward Identification in Maximum Entropy Reinforcement Learning with Sparsity and Rank Priors
Authors: Mohamad Louai Shehab, Alperen Tercan, Necmiye Ozay
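Abstract summary: This paper considers the problem of recovering time-varying reward functions from optimal policies or demonstrations generated by a maximum entropy reinforcement learning problem. Without additional assumptions on the underlying rewards, this problem is highly ill-posed. Under each of the two priors considered (rewards that change infrequently, and rewards expressible with a small number of features), the resulting reformulation leads to an efficient optimization-based reward identification algorithm.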
Reference score (independently computed attention score): 0.40964539027092917
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we consider the problem of recovering time-varying reward functions from either optimal policies or demonstrations coming from a max entropy reinforcement learning problem. This problem is highly ill-posed without additional assumptions on the underlying rewards. However, in many applications, the rewards are indeed parsimonious, and some prior information is available. We consider two such priors on the rewards: 1) rewards are mostly constant and they change infrequently, 2) rewards can be represented by a linear combination of a small number of feature functions. We first show that the reward identification problem with the former prior can be recast as a sparsification problem subject to linear constraints. Moreover, we give a polynomial-time algorithm that solves this sparsification problem exactly. Then, we show that identifying rewards representable with the minimum number of features can be recast as a rank minimization problem subject to linear constraints, for which convex relaxations of rank can be invoked. In both cases, these observations lead to efficient optimization-based reward identification algorithms. Several examples are given to demonstrate the accuracy of the recovered rewards as well as their generalizability.
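Abstract (reference translation): This paper considers the problem of recovering time-varying reward functions from optimal policies or demonstrations generated by a maximum entropy reinforcement learning problem. Without additional assumptions on the underlying rewards, this problem is highly ill-posed. In many applications, however, the rewards are in fact parsimonious and some prior information is available. We consider two priors on the rewards: 1) the rewards are mostly constant and change infrequently; 2) the rewards can be expressed as a linear combination of a small number of feature functions. We first show that the reward identification problem under the former prior can be recast as a sparsification problem subject to linear constraints, and we give a polynomial-time algorithm that solves this sparsification problem exactly. We then show that identifying rewards representable with the minimum number of features can be recast as a rank minimization problem subject to linear constraints, to which convex relaxations of rank can be applied. In both cases, these observations lead to efficient optimization-based reward identification algorithms. Several examples demonstrate the accuracy of the recovered rewards and their generalizability.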
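
The two priors named in the abstract can be illustrated with a small numerical sketch. This is not the authors' algorithm: it only builds toy reward matrices (rows indexed by time, columns by state-action pair) and evaluates the quantities the abstract says are minimized, namely the number of reward changes over time and the rank, with the nuclear norm as one standard convex surrogate for rank. The matrix layout, the sizes, and the helper names `num_reward_switches` and `nuclear_norm` are assumptions made for illustration.

```python
import numpy as np

def num_reward_switches(R, tol=1e-9):
    """Count time steps at which the reward vector changes (prior 1)."""
    diffs = np.diff(R, axis=0)  # row t holds r_{t+1} - r_t
    return int(np.sum(np.abs(diffs).max(axis=1) > tol))

def nuclear_norm(R):
    """Sum of singular values, a standard convex surrogate for rank (prior 2)."""
    return float(np.linalg.svd(R, compute_uv=False).sum())

rng = np.random.default_rng(0)
# Illustrative sizes: horizon, number of state-action pairs, number of features.
T, n, k = 12, 6, 2

# Prior 1: piecewise-constant rewards with a single switch after 5 steps.
r_a, r_b = rng.normal(size=n), rng.normal(size=n)
R_switch = np.vstack([r_a] * 5 + [r_b] * (T - 5))

# Prior 2: rewards are time-varying combinations of k fixed feature functions.
Phi = rng.normal(size=(k, n))  # features over state-action pairs
W = rng.normal(size=(T, k))    # time-varying weights
R_lowrank = W @ Phi

print("switches:", num_reward_switches(R_switch))
print("rank:", np.linalg.matrix_rank(R_lowrank))
print("nuclear norm:", nuclear_norm(R_lowrank))
```

In a full method these quantities would be minimized subject to the linear constraints the abstract mentions; those constraints are not specified on this page, so they are left out of the sketch.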

Related papers