Fugu-MT Paper Translation (Abstract): Benchmarking the Limits of In-Context Reinforcement Learning for Ad-Hoc Teamwork

Paper overview: Benchmarking the Limits of In-Context Reinforcement Learning for Ad-Hoc Teamwork

arxiv url: http://arxiv.org/abs/2605.24423v1
Date: Sat, 23 May 2026 06:39:21 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.049144
Title: Benchmarking the Limits of In-Context Reinforcement Learning for Ad-Hoc Teamwork
Title (reference translation): Benchmarking the Limits of In-Context Reinforcement Learning in Ad-Hoc Teamwork
Authors: Yuheng Jing, Kai Li, Ziwen Zhang, Jiajun Zhang, Zeyao Ma, Jiaxi Yang, Lei Zhang, Zhe Wu, Jinmin He, Junliang Xing, Jian Cheng
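Abstract summary: In-Context Reinforcement Learning (ICRL) enables foundation agents to adapt instantly to new tasks, but its effectiveness in Ad-Hoc Teamwork (AHT), where coordination with unknown partners is required, remains unexplored. This paper introduces ICRL4AHT, a large-scale benchmark built on a high-throughput JAX implementation of Overcooked-V2. We evaluate representative history-conditioned ICRL algorithms, including Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions.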
Reference score (independently computed attention measure): 45.63941874462679
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-Context Reinforcement Learning (ICRL) has enabled foundation agents to adapt instantaneously to novel tasks, yet its efficacy in Ad-Hoc Teamwork (AHT), where coordination with unknown partners is required, remains unexplored. To rigorously evaluate this, we introduce a large-scale benchmark ICRL4AHT, built upon a high-throughput JAX implementation of Overcooked-V2. Our benchmark includes a large, diverse teammate suite spanning both RL and heuristic policies, enabling controlled train-test shifts, and provides a reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation. We evaluate representative history-conditioned ICRL algorithms, including Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions. Results reveal notable limitations: contrary to their success in single-agent domains, these baselines fail to exhibit robust test-time adaptation in multi-agent settings. Specifically, these methods frequently underperform random baselines across both unseen teammate and unseen layout tracks, with no clear in-context improvement over long horizons. These findings highlight the challenges of strategic inference under partial observability within the Overcooked-V2 AHT protocol, establishing our benchmark as a critical testbed for next-generation coordination algorithms.
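Abstract (reference translation): In-Context Reinforcement Learning (ICRL) enables foundation agents to adapt instantly to new tasks, but its effectiveness in Ad-Hoc Teamwork (AHT), where coordination with unknown partners is required, remains unexplored. To evaluate this rigorously, we introduce ICRL4AHT, a large-scale benchmark built on a high-throughput JAX implementation of Overcooked-V2. Our benchmark includes a large and diverse teammate suite spanning both RL and heuristic policies, enables controlled train-test shifts, and provides a reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation. We evaluate representative history-conditioned ICRL algorithms, including Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions. In contrast to their success in single-agent domains, these baselines fail to show robust test-time adaptation in multi-agent settings. Specifically, these methods frequently underperform random baselines on both the unseen-teammate and unseen-layout tracks, with no clear in-context improvement over long horizons. These results highlight the challenges of strategic inference under partial observability within the Overcooked-V2 AHT protocol and establish our benchmark as a critical testbed for next-generation coordination algorithms.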

Related papers