Fugu-MT Paper Translation (Abstract): From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Paper overview: From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

arxiv url: http://arxiv.org/abs/2606.12603v1
Date: Wed, 10 Jun 2026 19:01:31 GMT
Status: Translation complete
Last updated in system: 2026-06-12 15:55:27.416373
Title: From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation
Authors: Honglin He, Zhizheng Liu, Yukai Ma, Bolei Zhou
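Abstract summary: Long-horizon sidewalk navigation is essential for micro-mobility applications such as robotic food delivery and assistive electric wheelchairs. The paper introduces FlowPilot, a mapless navigation policy that achieves robust and efficient long-horizon navigation performance using only a monocular RGB camera. FlowPilot is evaluated through extensive simulation and real-world experiments in diverse sidewalk environments.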
Reference score (independently computed attention score): 34.965299382808126
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous long-horizon sidewalk navigation is essential for micro-mobility applications such as robotic food delivery and assistive electronic wheelchairs. Unlike autonomous driving on the road, long-horizon sidewalk navigation requires precise maneuvering through unpredictable sidewalk terrains and pedestrians, with a lightweight perception stack as minimal as a single monocular RGB camera. While imitation learning (IL) from demonstrations offers a practical solution, the resulting autopilot policy often suffers from compounding errors, a lack of social compliance on sidewalks, and deficiencies in counterfactual reasoning to handle complex situations. To address these challenges, we introduce FlowPilot, a mapless navigation policy that achieves robust and efficient long-horizon navigation performance using only a monocular RGB camera. We first propose to use anchored flow matching as an action representation for policy pre-training on large-scale robot fleet data and to capture the diverse, complex, multimodal distribution of sidewalk navigation behaviors. To bridge the gap between imitation and alignment, we further design a human-in-the-loop preference learning scheme to tune the policy on a small amount of human intervention data. It strengthens the model's counterfactual reasoning and social compliance on sidewalks. We evaluate FlowPilot through extensive simulation and real-world experiments in diverse sidewalk environments. FlowPilot achieves 42% success rate and 66% route completion in simulation, while FlowPilot-HP further improves real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% relative to the base model.

Related papers