Fugu-MT Paper Translation (Abstract): Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

Paper overview: Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

arxiv url: http://arxiv.org/abs/2606.04815v1
Date: Wed, 03 Jun 2026 12:38:57 GMT
Status: Translation complete
Last updated in system: 2026-06-04 20:44:18.752399
Title: Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents
Authors: Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He
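Abstract summary: Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. The paper proposes Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents.
Reference score (independently computed attention measure): 32.49699221723716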
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically rely on retrieving discrete skills or past experiences while keeping parameters static during inference, which prevents them from continuously internalizing test-time feedback as human learners do. To bridge this gap, we propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents. Specifically, we design Verifier-Guided Skill Learning, which addresses the lack of direct supervision for skill extraction by rewarding candidate skills according to the average verifier success of multiple skill-conditioned policy rollouts, encouraging the model to generate skills that are useful for solving tasks rather than merely plausible in text. Furthermore, we introduce Online Skill Internalization, which continuously improves the policy model during test-time interaction by transforming skill-conditioned trajectories into reward signals. This enables the agent to internalize reasoning capabilities directly into its parameters, avoiding the context bloat of experience retrieval. Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points compared with existing lifelong agent baselines.
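
The skill reward described in the abstract (the average verifier success over multiple skill-conditioned policy rollouts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `skill_reward`, `scripted_rollout`, and `toy_verifier`, the string-based tasks and trajectories, and the choice of four rollouts are all assumptions made purely for illustration.

```python
from statistics import mean
from typing import Callable, Iterable


def skill_reward(
    skill: str,
    task: str,
    rollout: Callable[[str, str], str],
    verifier: Callable[[str, str], bool],
    num_rollouts: int = 4,  # hypothetical default, not taken from the paper
) -> float:
    """Return the fraction of skill-conditioned rollouts the verifier accepts."""
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    outcomes = [
        1.0 if verifier(task, rollout(task, skill)) else 0.0
        for _ in range(num_rollouts)
    ]
    return mean(outcomes)


def scripted_rollout(results: Iterable[str]) -> Callable[[str, str], str]:
    """Toy stand-in for a policy: replays a fixed sequence of trajectories."""
    it = iter(results)
    return lambda task, skill: next(it)


def toy_verifier(task: str, trajectory: str) -> bool:
    """Toy stand-in for a task verifier: accepts only the string 'solved'."""
    return trajectory == "solved"


# Three of four scripted rollouts succeed, so the candidate skill scores 0.75.
reward = skill_reward(
    "check the table schema first",  # hypothetical candidate skill text
    "toy-task",
    scripted_rollout(["solved", "failed", "solved", "solved"]),
    toy_verifier,
    num_rollouts=4,
)
print(reward)  # 0.75
```

Under these assumptions, a skill that reads well but does not help the policy solve the task receives a low reward, which is the incentive the abstract describes.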
