Fugu-MT Paper Translation (Abstract): Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

Paper overview: Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

arxiv url: http://arxiv.org/abs/2606.09138v1
Date: Mon, 08 Jun 2026 07:35:18 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:06.803621
Title: Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
Authors: Daoyu Wang, Mingyue Cheng, Qingchuan Li, Shuo Yu, Jie Ouyang, Qi Liu
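Abstract summary: This paper presents Claw-R1, an interactive step-level data system for agentic RL. Claw-R1 connects heterogeneous agent runtimes with RL training backends through two core components. In the demo, users can interactively inspect live trajectories, examine the state, action, and reward of each step, curate data by quality and readiness, and configure training-ready batches.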
Reference score (independently computed attention score): 9.576383475538606
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from static chatbots into interactive agents, giving rise to representative applications such as OpenClaw. Existing work mainly focuses on policy optimization algorithms and training frameworks, but pays less attention to the full data lifecycle of agent-environment interactions, from data production to training consumption. To bridge this gap, we present Claw-R1, an interactive step-level data middleware system for agentic RL. Claw-R1 connects heterogeneous agent runtimes with RL training backends through two core components: a Gateway Server and a Data Pool. The Gateway Server captures multi-turn interaction steps through a unified LLM API entry point, while the Data Pool organizes them into step-level records consisting of prompt IDs, response IDs, rewards and other metadata. In our demo, users can interactively inspect live trajectories, examine the state, action, and reward of each step, curate data by quality and readiness, and configure training-ready batches for different downstream RL algorithms. Overall, Claw-R1 treats agent interaction traces as managed data assets rather than temporary runtime logs. Through this demonstration, we hope to encourage the community to recognize the importance of data management in agentic RL. Our code is available at https://github.com/AgentR1/Claw-R1 and the demonstration video can be found at https://youtu.be/Pw47dAOw6B0.
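Illustrative sketch: the abstract describes step-level records made of prompt IDs, response IDs, rewards and other metadata, which users curate by quality and readiness and then group into training-ready batches. The Python below is one possible way to picture that flow. It is not the Claw-R1 API: `StepRecord`, `DataPool`, and every field and method name are hypothetical, and two things are assumptions rather than claims from the paper: that the prompt and response IDs are token IDs, and that a step is "ready" once it has a reward.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StepRecord:
    """One agent-environment interaction step (hypothetical schema)."""
    trajectory_id: str
    step_index: int
    prompt_ids: List[int]            # assumed: token IDs of the prompt (state)
    response_ids: List[int]          # assumed: token IDs of the response (action)
    reward: Optional[float] = None   # None until a reward has been assigned
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def ready(self) -> bool:
        """Assumed readiness rule: a step is ready once it has a reward."""
        return self.reward is not None


class DataPool:
    """In-memory stand-in for a step-level data pool (illustrative only)."""

    def __init__(self) -> None:
        self._steps: List[StepRecord] = []

    def add(self, step: StepRecord) -> None:
        self._steps.append(step)

    def trajectory(self, trajectory_id: str) -> List[StepRecord]:
        """Return the steps of one trajectory in step order."""
        steps = [s for s in self._steps if s.trajectory_id == trajectory_id]
        return sorted(steps, key=lambda s: s.step_index)

    def curate(self, min_reward: float) -> List[StepRecord]:
        """Keep only ready steps whose reward meets the quality threshold."""
        return [s for s in self._steps if s.ready and s.reward >= min_reward]

    def batches(self, min_reward: float, batch_size: int) -> List[List[StepRecord]]:
        """Split curated steps into fixed-size batches (the last may be short)."""
        kept = self.curate(min_reward)
        return [kept[i:i + batch_size] for i in range(0, len(kept), batch_size)]


# Usage: two rewarded steps from one trajectory, one unrewarded step from another.
pool = DataPool()
pool.add(StepRecord("traj-0", 1, [1, 2, 3, 4, 5], [6], reward=0.9))
pool.add(StepRecord("traj-0", 0, [1, 2, 3], [4, 5], reward=0.2))
pool.add(StepRecord("traj-1", 0, [7, 8], [9]))  # no reward yet, so not ready
batches = pool.batches(min_reward=0.5, batch_size=2)
```

Keeping the reward optional lets a record exist as soon as the step is captured and become trainable later, which is one simple reading of "readiness"; a real system could use a richer status field.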
