Fugu-MT Paper Translation (Abstract): StreamingClaw Technical Report

Paper overview: StreamingClaw Technical Report

arxiv url: http://arxiv.org/abs/2603.22120v2
Date: Thu, 26 Mar 2026 11:06:17 GMT
Status: Translation complete
Last updated in system: 2026-03-27 13:32:29.876152
Title: StreamingClaw Technical Report
Title (reference translation): StreamingClaw Technical Report
Authors: Jiawei Chen, Zhe Chen, Chaoqun Du, Maokui He, Wei He, Hengtao Li, Qizhen Li, Zide Liu, Hao Ma, Xuhao Pan, Chang Ren, Xudong Rao, Xintian Shen, Chenfeng Wang, Tao Wei, Chengjun Yu, Pengfei Yu, Shengyu Yao, Chunpeng Zhou, Kun Zhan, Lihao Zheng, Pan Zhou, Xuhan Zhu, Yufei Zheng
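Abstract summary: StreamingClaw is a framework for streaming video understanding and embodied intelligence. It supports real-time streaming reasoning, reasoning about future events, and proactive interaction. It also provides streaming tools and action-centric skills tailored to real-world physical environments.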
Reference score (independently computed attention measure): 34.71973506764889
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Emerging applications such as embodied intelligence, AI hardware, autonomous driving, and intelligent cockpits rely on a real-time perception-decision-action closed loop, posing stringent challenges for streaming video understanding. However, current agents mostly suffer from fragmented capabilities, such as supporting only offline video understanding, lacking long-term multimodal memory mechanisms, or struggling to achieve real-time reasoning and proactive interaction under streaming input. These shortcomings have become a key bottleneck that prevents agents from sustaining perception, making real-time decisions, and executing closed-loop actions in complex real-world environments, constraining their deployment and potential in dynamic, open physical worlds. To alleviate these issues, we propose StreamingClaw, a unified agent framework for streaming video understanding and embodied intelligence. Beyond maintaining full compatibility with the OpenClaw framework, it natively supports real-time, multimodal streaming interactions. StreamingClaw integrates five core capabilities: (1) It supports real-time streaming reasoning. (2) It supports reasoning about future events and proactive interaction under the online evolution of interaction objectives. (3) It supports multimodal long-term memory storage, hierarchical memory evolution, efficient memory retrieval, and memory sharing across multiple agents. (4) It supports a closed loop of perception-decision-action. In addition to conventional tools and skills, it also provides streaming tools and action-centric skills tailored for real-world physical environments. (5) It is compatible with the OpenClaw framework, allowing it to leverage the resources and support of the open-source community.
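Abstract (reference translation): Emerging applications such as embodied intelligence, AI hardware, autonomous driving, and intelligent cockpits depend on a real-time perception-decision-action closed loop, which poses demanding challenges for streaming video understanding. Current agents, however, mostly have fragmented capabilities: they support only offline video understanding, lack long-term multimodal memory mechanisms, or struggle with real-time reasoning and proactive interaction under streaming input. These shortcomings are a key bottleneck that keeps agents from sustaining perception, making real-time decisions, and executing closed-loop actions in complex real-world environments, and they limit agent deployment and potential in dynamic, open physical worlds. To alleviate these problems, we propose StreamingClaw, a unified agent framework for streaming video understanding and embodied intelligence. In addition to maintaining full compatibility with the OpenClaw framework, it natively supports real-time, multimodal streaming interaction. StreamingClaw integrates five core capabilities. (1) It supports real-time streaming reasoning. (2) It supports reasoning about future events and proactive interaction as interaction objectives evolve online. (3) It supports multimodal long-term memory storage, hierarchical memory evolution, efficient memory retrieval, and memory sharing across multiple agents. (4) It supports a perception-decision-action closed loop; in addition to conventional tools and skills, it provides streaming tools and action-centric skills suited to real-world physical environments. (5) It is compatible with the OpenClaw framework and can therefore draw on the resources and support of the open-source community.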

Related papers