Fugu-MT Paper Translation (Summary): World Model Implanting for Test-time Adaptation of Embodied Agents

Paper summary: World Model Implanting for Test-time Adaptation of Embodied Agents

arxiv url: http://arxiv.org/abs/2509.03956v1
Date: Thu, 04 Sep 2025 07:32:16 GMT
Status: Translation complete
Last updated in system: 2025-09-05 20:21:10.088838
Title: World Model Implanting for Test-time Adaptation of Embodied Agents
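Title (reference translation): World Model Implanting for Test-time Adaptation of Embodied Agents
Authors: Minjong Yoo, Jinwoo Jang, Sihyung Yoon, Honguk Woo
Abstract summary: In embodied AI, a persistent challenge is enabling agents to adapt robustly to new domains without extensive data collection or retraining. This paper proposes a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models with independently learned, domain-specific world models. WorMI is evaluated on the VirtualHome and ALFWorld benchmarks and shows superior zero-shot and few-shot performance compared with several LLM-based approaches.
Reference score (independently computed attention score): 29.514831254621438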
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. To address this, we present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models (LLMs) with independently learned, domain-specific world models through test-time composition. By allowing seamless implantation and removal of the world models, the embodied agent's policy achieves and maintains cross-domain adaptability. In the WorMI framework, we employ a prototype-based world model retrieval approach, utilizing efficient trajectory-based abstract representation matching, to incorporate relevant models into test-time composition. We also develop a world-wise compound attention method that not only integrates the knowledge from the retrieved world models but also aligns their intermediate representations with the reasoning model's representation within the agent's policy. This framework design effectively fuses domain-specific knowledge from multiple world models, ensuring robust adaptation to unseen domains. We evaluate WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches across a range of unseen domains. These results highlight the framework's potential for scalable, real-world deployment in embodied agent scenarios where adaptability and data efficiency are essential.
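Abstract (reference translation): In embodied AI, a persistent challenge is enabling agents to adapt robustly to new domains without extensive data collection or retraining. To address this, the paper proposes a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models (LLMs) with independently learned, domain-specific world models through test-time composition. Because world models can be implanted and removed seamlessly, the embodied agent's policy achieves and maintains cross-domain adaptability. The WorMI framework uses a prototype-based world model retrieval approach, built on efficient trajectory-based abstract representation matching, to incorporate relevant models into the test-time composition. The authors also develop a world-wise compound attention method that integrates knowledge from the retrieved world models and aligns their intermediate representations with the reasoning model's representation inside the agent's policy. This design fuses domain-specific knowledge from multiple world models and supports robust adaptation to unseen domains. WorMI is evaluated on the VirtualHome and ALFWorld benchmarks and shows superior zero-shot and few-shot performance compared with several LLM-based approaches across a range of unseen domains. These results highlight the framework's potential for scalable, real-world deployment in embodied agent scenarios where adaptability and data efficiency are essential.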
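
The abstract mentions prototype-based world model retrieval through trajectory-based representation matching, but it does not say how prototypes are built or compared. The following is only a minimal sketch of the general idea, assuming each world model is summarized by a single prototype vector, a trajectory is summarized by mean-pooling its per-step embeddings, and matching uses cosine similarity. All function names, the pooling choice, and the similarity measure are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(trajectory: Sequence[Vector]) -> List[float]:
    """Assumed abstraction step: average the per-step embeddings of a trajectory."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one step")
    dim = len(trajectory[0])
    return [sum(step[i] for step in trajectory) / len(trajectory) for i in range(dim)]


def retrieve_world_models(
    trajectory: Sequence[Vector],
    prototypes: Dict[str, Vector],
    top_k: int = 2,
) -> List[str]:
    """Return the names of the top_k world models whose (hypothetical)
    prototype vectors are most similar to the pooled trajectory embedding."""
    query = mean_embedding(trajectory)
    ranked = sorted(
        prototypes.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy data: three hypothetical domain-specific world models.
    prototypes = {
        "kitchen": [1.0, 0.0, 0.0],
        "bathroom": [0.0, 1.0, 0.0],
        "garage": [0.0, 0.0, 1.0],
    }
    trajectory = [[0.9, 0.2, 0.0], [0.7, 0.4, 0.1]]
    print(retrieve_world_models(trajectory, prototypes, top_k=2))
```

In this toy setup the retrieved models would then be the ones composed with the reasoning model at test time; how that composition and the compound attention are actually realized is not covered by the abstract and is left out of the sketch.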

Related papers