Fugu-MT paper translation (abstract): Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning

Paper overview: Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning

arxiv url: http://arxiv.org/abs/2604.16683v1
Date: Fri, 17 Apr 2026 20:41:14 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.127744
Title: Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning
Authors: Gehan Zheng, Sanjay Seenivasan, Matthew Johnson-Roberson, Weiming Zhi
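Abstract summary: Rewind-IL is a training-free online safeguard framework for generative action-chunked imitation policies. It combines a zero-shot failure detector based on Temporal Inter-chunk Discrepancy Estimate (TIDE) with a state-respawning mechanism. Online, Rewind-IL monitors self-consistency in overlapping action chunks, tracks similarity to the checkpoint library, and, upon failure, rewinds execution to the latest verified safe state.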
Reference score (independently computed attention score): 7.445072780282545
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Imitation learning has enabled robots to acquire complex visuomotor manipulation skills from demonstrations, but deployment failures remain a major obstacle, especially for long-horizon action-chunked policies. Once execution drifts off the demonstration manifold, these policies often continue producing locally plausible actions without recovering from the failure. Existing runtime monitors either require failure data, over-trigger under benign feature drift, or stop at failure detection without providing a recovery mechanism. We present Rewind-IL, a training-free online safeguard framework for generative action-chunked imitation policies. Rewind-IL combines a zero-shot failure detector based on Temporal Inter-chunk Discrepancy Estimate (TIDE), calibrated with split conformal prediction, with a state-respawning mechanism that returns the robot to a semantically verified safe intermediate state. Offline, a vision-language model identifies recovery checkpoints in demonstrations, and the frozen policy encoder is used to construct a compact checkpoint feature database. Online, Rewind-IL monitors self-consistency in overlapping action chunks, tracks similarity to the checkpoint library, and, upon failure, rewinds execution to the latest verified safe state before restarting inference from a clean policy state. Experiments on real-world and simulated long-horizon manipulation tasks, including transfer to flow-matching action-chunked policies, demonstrate that policy-internal consistency coupled with semantically grounded respawning offers a practical route to improved reliability in imitation learning. Supplemental materials are available at https://sjay05.github.io/rewind-il
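The abstract names two ingredients of the detector, a discrepancy measured on overlapping action chunks and a threshold calibrated with split conformal prediction, but it does not give formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the discrepancy is the mean Euclidean distance between two consecutive chunks on the timesteps they share, and that calibration scores come from rollouts considered nominal. The names `chunk_discrepancy`, `conformal_threshold`, `stride`, and `alpha` are hypothetical.

```python
import math
from typing import Sequence

# A chunk is a list of H actions; each action is a vector of floats.
Chunk = Sequence[Sequence[float]]


def chunk_discrepancy(prev_chunk: Chunk, next_chunk: Chunk, stride: int) -> float:
    """Mean L2 distance between two chunks on the timesteps they share.

    Assumption: next_chunk was predicted `stride` steps after prev_chunk,
    so prev_chunk[stride:] and next_chunk[:H - stride] cover the same steps.
    """
    overlap = len(prev_chunk) - stride
    if overlap <= 0:
        raise ValueError("chunks do not overlap")
    total = 0.0
    for a, b in zip(prev_chunk[stride:], next_chunk[:overlap]):
        total += math.dist(a, b)
    return total / overlap


def conformal_threshold(calibration_scores: Sequence[float], alpha: float) -> float:
    """Split conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration scores to support this alpha: never trigger.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def is_failure(score: float, threshold: float) -> bool:
    """Flag a failure when the online discrepancy exceeds the calibrated threshold."""
    return score > threshold
```

In use, the threshold would be computed once offline from the calibration scores, and `is_failure` would be evaluated each time the policy emits a new chunk. Under the usual exchangeability assumption of split conformal prediction, a nominal score exceeds the threshold with probability at most `alpha`.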
