Fugu-MT paper translation (abstract): Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning

Paper summary: Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning

arxiv url: http://arxiv.org/abs/2603.10463v1
Date: Wed, 11 Mar 2026 06:24:10 GMT
Status: translation complete
Last updated in system: 2026-03-12 16:22:32.80817
Title: Learning to Wander: Improving the Global Image Geolocation Ability of LMMs via Actionable Reasoning
Authors: Yushuo Zheng, Huiyu Duan, Zicheng Zhang, Xiaohong Liu, Xiongkuo Min,
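Abstract summary: WanderBench is the first open-access global geolocation benchmark for actionable geolocation reasoning in embodied scenarios. We propose GeoAoT (Action of Thought), a geolocation framework built on Action of Thought. Experiments on 19 large multimodal models show that GeoAoT achieves superior fine-grained localization and stronger generalization in dynamic environments.
Reference score (independently computed attention score): 72.13218601075958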
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Geolocation, the task of identifying the geographic location of an image, requires abundant world knowledge and complex reasoning abilities. Although advanced large multimodal models (LMMs) excel at both, their performance on the geolocation task remains unexplored. To this end, we introduce WanderBench, the first open-access global geolocation benchmark designed for actionable geolocation reasoning in embodied scenarios. WanderBench contains over 32K panoramas across six continents, organized as navigable graphs that enable physical actions such as rotation and movement, transforming geolocation from static recognition into interactive exploration. Building on this foundation, we propose GeoAoT (Action of Thought), a geolocation framework that couples reasoning with embodied actions. Instead of generating textual reasoning chains, GeoAoT produces actionable plans, such as approaching landmarks or adjusting viewpoints, to actively reduce uncertainty. We further establish an evaluation protocol that jointly measures geolocation accuracy and difficulty-aware geolocation questioning ability. Experiments on 19 large multimodal models show that GeoAoT achieves superior fine-grained localization and stronger generalization in dynamic environments. WanderBench and GeoAoT define a new paradigm for actionable, reasoning-driven geolocation in embodied visual understanding.
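To make the idea of a navigable panorama graph with rotation and movement actions more concrete, here is a minimal Python sketch. It is not the WanderBench data format or the GeoAoT implementation, neither of which is described on this page. The names `PanoNode`, `Agent`, `rotate`, `move`, and `haversine_km`, the 45-degree movement tolerance, and the use of great-circle distance as a localization error are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PanoNode:
    """One panorama in a navigable graph (hypothetical schema)."""
    pano_id: str
    lat: float
    lon: float
    # Maps a compass heading in degrees to the id of the neighbouring panorama.
    neighbors: dict[int, str] = field(default_factory=dict)


@dataclass
class Agent:
    """Tracks which panorama the agent occupies and which way it faces."""
    graph: dict[str, PanoNode]
    pano_id: str
    heading: int = 0  # degrees clockwise from north

    def rotate(self, degrees: int) -> None:
        """Turn in place: the panorama is unchanged, only the view direction moves."""
        self.heading = (self.heading + degrees) % 360

    def move(self, tolerance: int = 45) -> bool:
        """Step to the neighbour best aligned with the current heading.

        Returns False (and stays put) if no neighbour lies within
        `tolerance` degrees of the heading. The tolerance is an assumption.
        """
        node = self.graph[self.pano_id]
        if not node.neighbors:
            return False

        def gap(neighbor_heading: int) -> int:
            # Smallest angular difference between two compass headings.
            diff = abs(neighbor_heading - self.heading) % 360
            return min(diff, 360 - diff)

        best = min(node.neighbors, key=gap)
        if gap(best) > tolerance:
            return False
        self.pano_id = node.neighbors[best]
        return True


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))
```

In a sketch like this, an agent would alternate `rotate` and `move` calls to gather views before committing to a coordinate guess, and `haversine_km` between the guess and the true panorama position would give one possible error measure. The abstract does not state which metric the authors' evaluation protocol actually uses.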

Related papers