Fugu-MT paper translation (abstract): MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Paper overview: MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

arxiv url: http://arxiv.org/abs/2603.17187v1
Date: Tue, 17 Mar 2026 22:30:30 GMT
Status: translation complete
Last updated in system: 2026-03-19 18:32:57.427051
Title: MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
Title (reference translation, rendered from the Japanese): MetaClaw: Just a Conversation - An Agent That Governs Meta-Learning and Evolution in the Wild
Authors: Peng Xia, Jianwen Chen, Xinyu Yang, Haoqin Tu, Jiaqi Liu, Kaiwen Xiong, Siwei Han, Shi Qiu, Haonian Ji, Yuyin Zhou, Zeyu Zheng, Cihang Xie, Huaxiu Yao,
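Abstract summary: Large language model (LLM) agents are increasingly used for complex tasks. Existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. This paper introduces MetaClaw, a meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills.
Reference score (independently computed attention score): 74.7263562191605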
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
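Abstract (reference translation): Large language model (LLM) agents are increasingly used for complex tasks, but deployed agents often remain static and do not adapt as user needs evolve. This creates a tension between the need for continuous service and the need to update capabilities to match shifting task distributions. On platforms such as OpenClaw, which handle diverse workloads across more than 20 channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. This paper introduces MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw has two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories with an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates through cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). It is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. The two mechanisms reinforce each other: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support data from query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline raises Kimi-K2.5 accuracy from 21.4% to 40.6% and improves composite robustness by 18.3%. The code is available at https://github.com/aiming-lab/MetaClaw.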
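
Note on the reported numbers: the abstract gives one gain as a relative figure ("up to 32% relative") and another as a before/after pair (21.4% to 40.6%). The short Python sketch below only works through the arithmetic for the before/after pair, to show how far apart the absolute and relative readings are. The function name `accuracy_gain` is hypothetical and is not part of the MetaClaw code; the baseline behind the 32% figure is not stated in the abstract, so that figure is not recomputed here.

```python
def accuracy_gain(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    absolute_points = after_pct - before_pct
    relative_percent = 100.0 * absolute_points / before_pct
    return absolute_points, relative_percent


# Kimi-K2.5 accuracy before and after the full pipeline, as quoted in the abstract.
absolute_points, relative_percent = accuracy_gain(21.4, 40.6)
print(f"absolute gain: {absolute_points:.1f} percentage points")
print(f"relative gain: {relative_percent:.1f}%")
```

Under this reading, the full pipeline adds 19.2 percentage points, which is roughly a 90% relative improvement over the 21.4% baseline. The abstract does not say whether the 18.3% composite robustness figure is absolute or relative.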

Related papers