Fugu-MT Paper Translation (Summary): Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

Paper Overview: Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner

arxiv url: http://arxiv.org/abs/2604.05112v1
Date: Mon, 06 Apr 2026 19:18:12 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.458495
Title: Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
Title (reference translation): Vintix II: A Decision Pre-Trained Transformer That Is a Scalable In-Context Reinforcement Learner
Authors: Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Artyom Grishin, Igor Saprygin, Aleksandr Serkov, Mark Averchenko, Daniil Tikhonov, Maksim Zhdanov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Alexey Zemtsov, Vladislav Kurenkov,
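Abstract summary: In-context reinforcement learning can be used to train agents that acquire new tasks directly at inference time. The Decision Pre-Trained Transformer (DPT) was introduced as an alternative to Algorithm Distillation. We extend DPT to diverse multi-domain environments and apply Flow Matching as a natural training choice.
Reference score (independently calculated attention score): 91.12249411043723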
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in in-context reinforcement learning (ICRL) has demonstrated its potential for training generalist agents that can acquire new tasks directly at inference. Algorithm Distillation (AD) pioneered this paradigm and was subsequently scaled to multi-domain settings, although its ability to generalize to unseen tasks remained limited. The Decision Pre-Trained Transformer (DPT) was introduced as an alternative, showing stronger in-context reinforcement learning abilities in simplified domains, but its scalability had not been established. In this work, we extend DPT to diverse multi-domain environments, applying Flow Matching as a natural training choice that preserves its interpretation as Bayesian posterior sampling. As a result, we obtain an agent trained across hundreds of diverse tasks that achieves clear gains in generalization to the held-out test set. This agent improves upon prior AD scaling and demonstrates stronger performance in both online and offline inference, reinforcing ICRL as a viable alternative to expert distillation for training generalist agents.
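Abstract (reference translation): Recent progress in in-context reinforcement learning (ICRL) has shown the potential for training generalist agents that can acquire new tasks directly at inference time. Algorithm Distillation (AD) pioneered this paradigm and was later extended to multi-domain settings, but its ability to generalize to unseen tasks was limited. The Decision Pre-Trained Transformer (DPT) was introduced, but its scalability had not been established. In this work, we extend DPT to diverse multi-domain environments, applying Flow Matching as a natural training choice that preserves its interpretation as Bayesian posterior sampling. As a result, an agent trained across hundreds of diverse tasks achieved clear gains in generalization to a held-out test set. This agent improves on prior AD scaling, shows stronger performance in both online and offline inference, and strengthens the case for ICRL as an alternative to expert distillation for training generalist agents.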
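The abstract names Flow Matching as the training objective for DPT but gives no details. The following is a minimal sketch, assuming continuous actions and the common linear-path (rectified-flow) variant of conditional flow matching; it is not the authors' implementation. `velocity_fn` is a hypothetical stand-in for the transformer, and `context` stands in for its conditioning (in DPT, the query state together with the in-context dataset). The loss regresses the predicted velocity onto `actions - noise` at a random point on the straight path from noise to the target action; `sample_action` draws an action by Euler-integrating the velocity field from t = 0 to t = 1.

```python
import numpy as np


def flow_matching_loss(velocity_fn, actions, context, rng):
    """Conditional flow-matching loss on a linear noise-to-action path (sketch).

    actions: (B, D) array of target actions (x1).
    context: conditioning passed unchanged to the hypothetical velocity_fn.
    """
    noise = rng.standard_normal(actions.shape)              # x0 ~ N(0, I)
    t = rng.uniform(0.0, 1.0, size=(actions.shape[0], 1))   # one time per sample, in [0, 1)
    x_t = (1.0 - t) * noise + t * actions                   # point on the straight path
    target_velocity = actions - noise                       # dx_t/dt along that path
    pred = velocity_fn(x_t, t, context)
    # Mean over the batch of the squared error summed over action dimensions.
    return np.mean(np.sum((pred - target_velocity) ** 2, axis=-1))


def sample_action(velocity_fn, context, dim, rng, num_steps=10):
    """Draw one action by Euler-integrating the velocity field from t=0 to t=1."""
    x = rng.standard_normal((1, dim))                        # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((1, 1), k * dt)
        x = x + dt * velocity_fn(x, t, context)
    return x[0]


# Toy check with a hypothetical oracle: for a single deterministic target
# action a, the exact velocity field is (a - x) / (1 - t).
rng = np.random.default_rng(0)
a = np.array([0.5, -1.0, 2.0])
oracle = lambda x, t, ctx: (ctx - x) / (1.0 - t)

loss = flow_matching_loss(oracle, np.tile(a, (64, 1)), a, rng)
sampled = sample_action(oracle, a, dim=3, rng=rng)
print(loss, sampled)
```

The oracle makes the check exact: it reproduces `actions - noise` at every sampled point, so the loss is zero up to rounding, and the final Euler step lands on `a`. A trained network would replace the oracle, and sampling from it is what lets the policy represent a distribution over actions rather than a single regression target.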

Related Papers