Fugu-MT Paper Translation (Abstract): Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

Paper abstract: Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

arxiv url: http://arxiv.org/abs/2603.22713v1
Date: Tue, 24 Mar 2026 02:06:31 GMT
Status: Translation complete
System update date: 2026-03-25 19:53:37.249996
Title: Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
Title (reference translation): Non-Adversarial Imitation Learning Free of Compounding Errors: The Role of Bellman Constraints
Authors: Tian Xu, Chenyang Wang, Xiaochen Zhai, Ziniu Li, Yi-Chen Li, Yang Yu,
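Abstract summary: Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC). This paper revisits IQ-Learn and shows that it provably reduces to BC and suffers from an imitation gap lower bound with quadratic dependence on horizon. The paper therefore proposes Dual Q-DM, a new Q-based IL method.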
Reference score (independently computed attention score): 19.446845699075784
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC and suffers from an imitation gap lower bound with quadratic dependence on horizon, therefore still suffering from compounding errors. Theoretical analysis reveals that, despite using online interactions, IQ-Learn uniformly suppresses the Q-values for all actions on states uncovered by demonstrations, thereby failing to generalize. To address this limitation, we introduce a primal-dual framework for distribution matching, yielding a new Q-based IL method, Dual Q-DM. The key mechanism in Dual Q-DM is incorporating Bellman constraints to propagate high Q-values from visited states to unvisited ones, thereby achieving generalization beyond demonstrations. We prove that Dual Q-DM is equivalent to AIL and can recover expert actions beyond demonstrations, thereby mitigating compounding errors. To the best of our knowledge, Dual Q-DM is the first non-adversarial IL method that is theoretically guaranteed to eliminate compounding errors. Experimental results further corroborate our theoretical results.
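Abstract (reference translation): Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC), but its adversarial optimization often makes training unstable. To avoid this problem, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by exploiting online interaction with the environment. This paper revisits IQ-Learn and shows that it provably reduces to BC and suffers from an imitation gap lower bound that depends quadratically on the horizon, so it still suffers from compounding errors. The theoretical analysis reveals that, even with online interaction, IQ-Learn uniformly suppresses the Q-values of all actions on states not covered by the demonstrations, and therefore fails to generalize. To address this limitation, the authors introduce a primal-dual framework for distribution matching and propose Dual Q-DM, a new Q-based IL method. The key mechanism in Dual Q-DM is the incorporation of Bellman constraints, which propagate high Q-values from visited states to unvisited ones and thereby achieve generalization beyond the demonstrations. The authors prove that Dual Q-DM is equivalent to AIL and can recover expert actions beyond the demonstrations, thereby mitigating compounding errors. To the best of the authors' knowledge, Dual Q-DM is the first non-adversarial IL method that is theoretically guaranteed to eliminate compounding errors. Experimental results further corroborate the theoretical results.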
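
The abstract's central mechanism, Bellman constraints carrying high Q-values from visited states to unvisited ones, can be illustrated with a small sketch. This is not the paper's Dual Q-DM algorithm: it is plain Q-value iteration on a hypothetical five-state chain, and the function name `bellman_backup_chain`, the reward layout, and the discount factor are all assumptions chosen for illustration. Only the last state is treated as "visited" and carries reward.

```python
import numpy as np


def bellman_backup_chain(n_states=5, gamma=0.9, n_iters=200):
    """Q-value iteration on a toy deterministic chain (hypothetical example).

    Actions: 0 = move left, 1 = move right (moves are clipped at both ends).
    Only the last state is "visited": it is the only state with reward.
    """
    n_actions = 2
    visited = n_states - 1

    # Reward is 1 for any action taken in the visited state, 0 elsewhere.
    reward = np.zeros((n_states, n_actions))
    reward[visited, :] = 1.0

    # Deterministic transition table: next_state[s, a].
    next_state = np.zeros((n_states, n_actions), dtype=int)
    for s in range(n_states):
        next_state[s, 0] = max(s - 1, 0)
        next_state[s, 1] = min(s + 1, n_states - 1)

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = q.max(axis=1)                  # greedy state values
        q = reward + gamma * v[next_state]  # Bellman optimality backup
    return q


# One sweep from zero is just the reward table: unvisited states stay flat.
q_no_backup = bellman_backup_chain(n_iters=1)

# Repeated backups propagate value from the visited state down the chain.
q_backup = bellman_backup_chain()
greedy = q_backup.argmax(axis=1)

print("Q without propagation:\n", q_no_backup)
print("Q with Bellman backups:\n", q_backup)
print("Greedy action per state (1 = right):", greedy)
```

With a single sweep, every unvisited state has identical Q-values for both actions, so nothing distinguishes a good action from a bad one there. After repeated backups, the action that leads toward the visited state has the larger Q-value in every state of this toy chain.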

Related papers