Fugu-MT Paper Translation (Summary): Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback

Paper summary: Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback

arxiv url: http://arxiv.org/abs/2509.03206v1
Date: Wed, 03 Sep 2025 10:50:48 GMT
Status: Translation complete
Last updated in system: 2025-09-04 21:40:46.489517
Title: Autonomous Learning From Success and Failure: Goal-Conditioned Supervised Learning with Negative Feedback
Authors: Zeqiang Zhang, Fabian Wurzberger, Gerrit Schmid, Sebastian Gottwald, Daniel A. Braun
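Abstract summary: Goal-Conditioned Supervised Learning (GCSL) has emerged as a potential solution by enabling self-imitation learning for autonomous systems. This paper proposes a novel model that integrates contrastive learning principles into the GCSL framework to learn from both success and failure.
Reference score (independently calculated attention level): 2.36462256498849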
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning faces significant challenges when applied to tasks characterized by sparse reward structures. Although imitation learning, within the domain of supervised learning, offers faster convergence, it relies heavily on human-generated demonstrations. Recently, Goal-Conditioned Supervised Learning (GCSL) has emerged as a potential solution by enabling self-imitation learning for autonomous systems. By strategically relabelling goals, agents can derive policy insights from their own experiences. Despite the successes of this framework, it presents two notable limitations: (1) Learning exclusively from self-generated experiences can exacerbate the agents' inherent biases; (2) The relabelling strategy allows agents to focus solely on successful outcomes, precluding them from learning from their mistakes. To address these issues, we propose a novel model that integrates contrastive learning principles into the GCSL framework to learn from both success and failure. Through empirical evaluations, we demonstrate that our algorithm overcomes limitations imposed by agents' initial biases and thereby enables more exploratory behavior. This facilitates the identification and adoption of effective policies, leading to superior performance across a variety of challenging environments.
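The goal relabelling step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the common hindsight formulation, in which a state the agent actually reached later in a trajectory is treated as the goal for an earlier step, yielding supervised (state, goal, action) examples. The function name `relabel_trajectory`, the tuple layout, and the uniform sampling of the future index are illustrative assumptions, not details taken from the paper. The paper's negative-feedback (contrastive) component is not described in the abstract and is not shown here.

```python
import random
from typing import List, Tuple

State = Tuple[int, int]   # hypothetical grid-world state
Action = int              # hypothetical discrete action id
Example = Tuple[State, State, Action, int]  # (state, relabelled goal, action, horizon)


def relabel_trajectory(
    states: List[State], actions: List[Action], rng: random.Random
) -> List[Example]:
    """Turn one trajectory into goal-conditioned supervised examples.

    `states` holds one more entry than `actions`: states[t + 1] is the
    state reached after taking actions[t] in states[t].
    """
    examples: List[Example] = []
    for t in range(len(actions)):
        # Pick a state reached strictly later in the same trajectory and
        # pretend it was the goal all along (assumed uniform sampling).
        k = rng.randint(t + 1, len(states) - 1)
        examples.append((states[t], states[k], actions[t], k - t))
    return examples


if __name__ == "__main__":
    states = [(0, 0), (0, 1), (1, 1), (2, 1)]
    actions = [0, 1, 1]
    for example in relabel_trajectory(states, actions, random.Random(0)):
        print(example)
```

Every relabelled example counts as a success by construction, since the goal is a state that was actually reached. That is the second limitation the abstract names: a purely relabelled dataset carries no signal about failed attempts.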


Related papers