Fugu-MT paper translation (abstract): LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation

Paper summary: LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation

arxiv url: http://arxiv.org/abs/2604.17190v1
Date: Sun, 19 Apr 2026 01:36:53 GMT
Status: translation complete
Last updated in system: 2026-04-21 21:52:52.382248
Title: LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
Title (reference translation): LookasideVLN: Direction-Aware Aerial Vision-and-Language Navigation
Authors: Yuwei Ning, Ganlong Zhao, Yipeng Qin, Si Liu, Yang Liu, Liang Lin, Guanbin Li
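Abstract summary: LookasideVLN exploits directional cues in natural language to achieve more accurate spatial reasoning and greater computational efficiency. LookasideVLN significantly outperforms the state-of-the-art CityNavAgent, even with a single-level lookahead.
Reference score (independently computed attention score): 96.09246387639006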
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Aerial Vision-and-Language Navigation (Aerial VLN) enables unmanned aerial vehicles (UAVs) to follow natural language instructions and navigate complex urban environments. While recent advances have achieved progress through large-scale memory graphs and lookahead path planning, they remain limited by shallow instruction understanding and high computational cost. In particular, existing methods rely primarily on landmark descriptions, overlooking directional cues, a key source of spatial context in human navigation. In this work, we propose LookasideVLN, a new paradigm that exploits directional cues in natural language to achieve both more accurate spatial reasoning and greater computational efficiency. LookasideVLN comprises three core components: (1) an Egocentric Lookaside Graph (ELG) that dynamically encodes instruction-relevant landmarks and their directional relationships, (2) a Spatial Landmark Knowledge Base (SLKB) that provides lightweight memory retrieval from prior navigation experiences, and (3) a Lookaside MLLM Navigation Agent that aligns multimodal information from user instructions, visual observations, and landmark-direction information from ELG for path planning. Extensive experiments show that LookasideVLN significantly outperforms the state-of-the-art CityNavAgent, even with a single-level lookahead, demonstrating that leveraging directional cues is a powerful yet efficient strategy for Aerial VLN.
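Abstract (reference translation): Aerial Vision-and-Language Navigation (Aerial VLN) allows unmanned aerial vehicles (UAVs) to follow natural language instructions and navigate complex urban environments. Recent work has made progress through large-scale memory graphs and lookahead path planning, but it is limited by shallow instruction understanding and high computational cost. In particular, existing methods depend mainly on landmark descriptions and overlook directional cues, which are a key source of spatial context in human navigation. This work proposes LookasideVLN, a new paradigm that uses the directional cues in natural language to achieve both more accurate spatial reasoning and better computational efficiency. LookasideVLN consists of three core components: (1) an Egocentric Lookaside Graph (ELG) that dynamically encodes instruction-relevant landmarks and their directional relationships, (2) a Spatial Landmark Knowledge Base (SLKB) that provides lightweight memory retrieval from prior navigation experiences, and (3) a Lookaside MLLM Navigation Agent that aligns multimodal information from user instructions, visual observations, and landmark-direction information from the ELG for path planning. Extensive experiments show that LookasideVLN outperforms the state-of-the-art CityNavAgent even with a single-level lookahead, demonstrating that leveraging directional cues is a powerful yet efficient strategy for Aerial VLN.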
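
The abstract describes a graph that encodes landmarks together with their directional relationships to the agent, but gives no implementation details. The following is a minimal illustrative sketch of one way such egocentric direction labels could be computed. It is not the authors' method. The names `Landmark`, `egocentric_direction`, and `build_lookaside_edges`, the flat 2D world frame, the compass-style heading convention, and the four 90-degree direction sectors are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Landmark:
    # Hypothetical landmark record: a name plus a position in a flat
    # world frame (x = east, y = north, in metres).
    name: str
    x: float
    y: float


def egocentric_direction(agent_x, agent_y, heading_deg, lm):
    """Return a coarse label for where `lm` lies relative to the agent.

    Assumed convention: heading_deg is a compass heading,
    0 = north (+y), increasing clockwise.
    """
    # Compass bearing from the agent to the landmark.
    bearing = math.degrees(math.atan2(lm.x - agent_x, lm.y - agent_y))
    # Angle relative to the agent's heading, wrapped into [-180, 180).
    rel = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    if -45.0 <= rel <= 45.0:
        return "front"
    if 45.0 < rel < 135.0:
        return "right"
    if -135.0 < rel < -45.0:
        return "left"
    return "behind"


def build_lookaside_edges(agent_x, agent_y, heading_deg, landmarks):
    """Map each landmark name to its egocentric direction label."""
    return {
        lm.name: egocentric_direction(agent_x, agent_y, heading_deg, lm)
        for lm in landmarks
    }


landmarks = [
    Landmark("tower", 0.0, 10.0),
    Landmark("bridge", 10.0, 0.0),
    Landmark("park", -10.0, 0.0),
    Landmark("station", 0.0, -10.0),
]
edges_north = build_lookaside_edges(0.0, 0.0, 0.0, landmarks)
edges_east = build_lookaside_edges(0.0, 0.0, 90.0, landmarks)
```

With the agent at the origin facing north, the tower is in front and the bridge is to the right. After turning to face east, the same tower falls to the left. Labels like these could then be matched against phrases such as "the building on your left" in an instruction, although how LookasideVLN actually does this is not stated in the abstract.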

Related papers