Fugu-MT paper translation (abstract): ProgAgent:A Continual RL Agent with Progress-Aware Rewards

Paper overview: ProgAgent:A Continual RL Agent with Progress-Aware Rewards

arxiv url: http://arxiv.org/abs/2603.07784v1
Date: Sun, 08 Mar 2026 19:58:07 GMT
Status: translation complete
Last updated in system: 2026-03-10 15:13:15.213391
Title: ProgAgent:A Continual RL Agent with Progress-Aware Rewards
Title (reference translation): ProgAgent:A Continual RL Agent with Progress-Aware Rewards
Authors: Jinzhou Tan, Gabriel Adineera, Jinoh Kim
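Abstract summary: ProgAgent is a continual reinforcement learning agent that integrates progress-aware reward learning with a JAX-native system architecture. It derives dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress across initial, current, and goal observations. ProgAgent supports massively parallel rollouts and fully differentiable updates, which makes a sophisticated unified objective feasible.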
Reference score (independently computed attention score): 0.07646713951724009
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present ProgAgent, a continual reinforcement learning (CRL) agent that unifies progress-aware reward learning with a high-throughput, JAX-native system architecture. Lifelong robotic learning grapples with catastrophic forgetting and the high cost of reward specification. ProgAgent tackles these by deriving dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress across initial, current, and goal observations. We theoretically interpret this as a learned state-potential function, delivering robust guidance in line with expert behaviors. To maintain stability amid online exploration - where novel, out-of-distribution states arise - we incorporate an adversarial push-back refinement that regularizes the reward model, curbing overconfident predictions on non-expert trajectories and countering distribution shift. By embedding this reward mechanism into a JIT-compiled loop, ProgAgent supports massively parallel rollouts and fully differentiable updates, rendering a sophisticated unified objective feasible: it merges PPO with coreset replay and synaptic intelligence for an enhanced stability-plasticity balance. Evaluations on ContinualBench and Meta-World benchmarks highlight ProgAgent's advantages: it markedly reduces forgetting, boosts learning speed, and outperforms key baselines in visual reward learning (e.g., Rank2Reward, TCN) and continual learning (e.g., Coreset, SI) - surpassing even an idealized perfect memory agent. Real-robot trials further validate its ability to acquire complex manipulation skills from noisy, few-shot human demonstrations.
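Abstract (reference translation): ProgAgent is a continual reinforcement learning (CRL) agent that unifies progress-aware reward learning with a high-throughput, JAX-native system architecture. Lifelong robotic learning struggles with catastrophic forgetting and the high cost of reward specification. ProgAgent addresses both by deriving dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress across initial, current, and goal observations. Theoretically, this is interpreted as a learned state-potential function that provides robust guidance consistent with expert behavior. To maintain stability during online exploration, an adversarial push-back refinement regularizes the reward model, curbing overconfident predictions on non-expert trajectories and countering distribution shift. Embedding this reward mechanism in a JIT-compiled loop lets ProgAgent support massively parallel rollouts and fully differentiable updates, making a sophisticated unified objective feasible. Evaluations on the ContinualBench and Meta-World benchmarks highlight ProgAgent's advantages: it markedly reduces forgetting, speeds up learning, and outperforms key baselines in visual reward learning (e.g., Rank2Reward, TCN) and continual learning (e.g., Coreset, SI), surpassing even an idealized perfect-memory agent. Real-robot trials further validate its ability to acquire complex manipulation skills from noisy, few-shot human demonstrations.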
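The abstract describes the progress estimate as a learned state-potential function. As a rough illustration of what that interpretation usually implies, the sketch below applies standard potential-based reward shaping, r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t), to a list of per-state progress values. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `shaped_rewards`, the precomputed `progress` list (standing in for the output of the perceptual model), and the exact shaping formula are all hypothetical choices that the abstract does not confirm.

```python
from typing import List, Sequence


def shaped_rewards(
    progress: Sequence[float],
    env_rewards: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Potential-based shaping with progress values used as the potential.

    progress[t] is an assumed progress estimate phi(s_t) in [0, 1] for each
    state s_0..s_T (hypothetical stand-in for a learned perceptual model).
    env_rewards[t] is the environment reward for the transition s_t -> s_{t+1}.
    """
    if len(progress) != len(env_rewards) + 1:
        raise ValueError("need one progress value per state: len(env_rewards) + 1")
    # r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t)
    return [
        r + gamma * progress[t + 1] - progress[t]
        for t, r in enumerate(env_rewards)
    ]


# Example: a sparse task (all environment rewards zero) with rising progress.
example = shaped_rewards([0.0, 0.25, 0.5, 1.0], [0.0, 0.0, 0.0], gamma=1.0)
```

With `gamma=1.0` the shaped rewards telescope, so their sum equals the progress gained between the first and last state. That is the sense in which a potential function gives dense guidance without changing which trajectories are ultimately preferred.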


Related papers