Fugu-MT paper translation (abstract): TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval

Paper overview: TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval

arxiv url: http://arxiv.org/abs/2603.24749v1
Date: Wed, 25 Mar 2026 19:20:48 GMT
Status: Translation complete
Last updated in system: 2026-03-27 20:52:47.956718
Title: TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval
Title (reference translation): TIGeR: A Unified Framework for Time, Image, and Geo-location Retrieval
Authors: David G. Shatwell, Sirnam Swetha, Mubarak Shah
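Abstract summary: Real-world applications in digital forensics, urban monitoring, and environmental analysis require joint reasoning about visual appearance, location, and time. We formalize this problem as Geo-Time Aware Image Retrieval and curate a benchmark of 4.5M paired image-location-time triplets for training and 86k high-quality triplets for evaluation. We then propose TIGeR, a multi-modal transformer model that maps image, geolocation, and time into a unified geo-temporal embedding space.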
Reference score (independently computed attention measure): 47.16110829725784
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many real-world applications in digital forensics, urban monitoring, and environmental analysis require jointly reasoning about visual appearance, geolocation, and time. Beyond standard geo-localization and time-of-capture prediction, these applications increasingly demand more complex capabilities, such as retrieving an image captured at the same location as a query image but at a specified target time. We formalize this problem as Geo-Time Aware Image Retrieval and curate a diverse benchmark of 4.5M paired image-location-time triplets for training and 86k high-quality triplets for evaluation. We then propose TIGeR, a multi-modal-transformer-based model that maps image, geolocation, and time into a unified geo-temporal embedding space. TIGeR supports flexible input configurations (single-modality and multi-modality queries) and uses the same representation to perform (i) geo-localization, (ii) time-of-capture prediction, and (iii) geo-time-aware retrieval. By better preserving underlying location identity under large appearance changes, TIGeR enables retrieval based on where and when a scene is, rather than purely on visual similarity. Extensive experiments show that TIGeR consistently outperforms strong baselines and state-of-the-art methods by up to 16% on time-of-year prediction, 8% on time-of-day prediction, and 14% in geo-time-aware retrieval recall, highlighting the benefits of unified geo-temporal modeling.
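Abstract (reference translation): Many real-world applications in digital forensics, urban monitoring, and environmental analysis require joint reasoning about visual appearance, location, and time. Beyond standard geo-localization and time-of-capture prediction, these applications increasingly demand more complex capabilities, such as retrieving an image captured at the same place as a query image but at a specified target time. We formalize this problem as Geo-Time Aware Image Retrieval and curate a diverse benchmark of 4.5M paired image-location-time triplets for training and 86k high-quality triplets for evaluation. We then propose TIGeR, a multi-modal transformer model that maps image, geolocation, and time into a unified geo-temporal embedding space. TIGeR supports flexible input configurations (single-modality and multi-modality queries) and uses the same representation to perform (i) geo-localization, (ii) time-of-capture prediction, and (iii) geo-time-aware retrieval. By better preserving location identity under large appearance changes, TIGeR enables retrieval based on where and when a scene is rather than on visual similarity alone. Extensive experiments show that TIGeR consistently outperforms strong baselines and state-of-the-art methods by up to 16% on time-of-year prediction, 8% on time-of-day prediction, and 14% in geo-time-aware retrieval recall, highlighting the benefits of unified geo-temporal modeling.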
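
The retrieval task described in the abstract (find the item at the same location as the query but at a specified target time) can be illustrated with a toy nearest-neighbor search over a joint location-time vector. The sketch below is not TIGeR: the learned multi-modal transformer and the image encoder are omitted, and hand-crafted features stand in for the learned embedding. All function names (`geo_features`, `time_features`, `embed`, `cosine`, `retrieve`), the gallery entries, and the coordinates are hypothetical and exist only for this illustration.

```python
import math

def geo_features(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) to a unit vector on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return [math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat)]

def time_features(day_of_year, hour):
    """Encode time-of-year and time-of-day cyclically with sin/cos pairs."""
    a = 2 * math.pi * day_of_year / 365.0
    b = 2 * math.pi * hour / 24.0
    return [math.cos(a), math.sin(a), math.cos(b), math.sin(b)]

def embed(lat_deg, lon_deg, day_of_year, hour):
    """Assumed stand-in for a learned joint embedding: plain concatenation."""
    return geo_features(lat_deg, lon_deg) + time_features(day_of_year, hour)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, gallery, k=1):
    """Return the names of the k gallery items most similar to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical gallery: (name, joint location-time vector).
gallery = [
    ("orlando_summer_noon", embed(28.54, -81.38, 180, 12)),
    ("orlando_winter_night", embed(28.54, -81.38, 15, 22)),
    ("paris_winter_night", embed(48.86, 2.35, 15, 22)),
]

# Query: the location of a (hypothetical) query image plus a target time.
query = embed(28.54, -81.38, 15, 22)
print(retrieve(query, gallery, k=1))  # ['orlando_winter_night']
```

In this toy setup the top match shares both the query's location and the target time, whereas a purely location-based or purely time-based search could not separate the three entries. A learned model would replace `embed` with encoders trained so that images, locations, and times land in the same space.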


Related papers