Learning a Dynamic Map of Visual Appearance
- URL: http://arxiv.org/abs/2012.14885v1
- Date: Tue, 29 Dec 2020 18:23:56 GMT
- Title: Learning a Dynamic Map of Visual Appearance
- Authors: Tawfiq Salem, Scott Workman, Nathan Jacobs
- Abstract summary: We propose to use billions of images to construct a global-scale, dynamic map of visual appearance attributes.
Our approach integrates dense overhead imagery with location and time metadata into a general framework capable of mapping a wide variety of visual attributes.
We demonstrate how this approach can support various applications, including image-driven mapping, image geolocalization, and metadata verification.
- Score: 33.428135914984445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The appearance of the world varies dramatically not only from place to place
but also from hour to hour and month to month. Every day billions of images
capture this complex relationship, many of which are associated with precise
time and location metadata. We propose to use these images to construct a
global-scale, dynamic map of visual appearance attributes. Such a map enables
fine-grained understanding of the expected appearance at any geographic
location and time. Our approach integrates dense overhead imagery with location
and time metadata into a general framework capable of mapping a wide variety of
visual attributes. A key feature of our approach is that it requires no manual
data annotation. We demonstrate how this approach can support various
applications, including image-driven mapping, image geolocalization, and
metadata verification.
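The abstract describes a framework that fuses dense overhead imagery with time metadata to predict the expected distribution of a visual appearance attribute at any location and moment (location enters implicitly through which overhead patch is used). As a rough illustration only, since the abstract does not specify an architecture, the following PyTorch-style sketch shows one plausible wiring of such a fusion model; all module names, feature sizes, and the cyclic time encoding are assumptions, not the authors' published design.

```python
import math
import torch
import torch.nn as nn

class DynamicAppearanceMap(nn.Module):
    """Hypothetical sketch: overhead image + time -> attribute distribution.

    NOT the authors' published architecture; it only illustrates the kind
    of imagery/metadata fusion the abstract describes.
    """

    def __init__(self, num_attribute_bins: int = 10):
        super().__init__()
        # Small CNN encoder for the overhead (satellite) image patch.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
        )
        # MLP over a cyclic encoding of (hour of day, month of year).
        self.time_encoder = nn.Sequential(
            nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 32),
        )
        # Fused features -> distribution over attribute bins.
        self.head = nn.Linear(64 + 32, num_attribute_bins)

    def forward(self, overhead, hour, month):
        # Cyclic encoding keeps 23:00 adjacent to midnight, December to January.
        t = torch.stack([
            torch.sin(2 * math.pi * hour / 24), torch.cos(2 * math.pi * hour / 24),
            torch.sin(2 * math.pi * month / 12), torch.cos(2 * math.pi * month / 12),
        ], dim=-1)
        fused = torch.cat([self.image_encoder(overhead), self.time_encoder(t)], dim=-1)
        return self.head(fused).softmax(dim=-1)  # expected attribute distribution

# Example: predict for one 64x64 overhead patch at 8 am in June.
model = DynamicAppearanceMap()
dist = model(torch.randn(1, 3, 64, 64), torch.tensor([8.0]), torch.tensor([6.0]))
print(dist.shape)  # torch.Size([1, 10])
```

Training such a model against attributes extracted automatically from geotagged, timestamped ground-level photos is what would make the approach annotation-free, as the abstract claims.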
Related papers
- RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding.
Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
- MapGlue: Multimodal Remote Sensing Image Matching [12.376931699274062]
Multimodal remote sensing image (MRSI) matching is pivotal for cross-modal fusion, localization, and object detection.
Existing unimodal datasets lack scale and diversity, limiting deep learning solutions.
This paper proposes MapGlue, a universal MRSI matching framework, and MapData, a large-scale multimodal dataset addressing these gaps.
arXiv Detail & Related papers (2025-03-20T14:36:16Z)
- Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images [17.992488467380923]
OpenStreetMap is the result of 11 million registered users manually annotating the GPS location of over 1.75 billion entries.
At the same time, manual annotations can include errors and are slow to update, limiting the map's accuracy.
Maps from Motion (MfM) is a step toward automating this time-consuming map-making procedure by computing 2D maps of semantic objects directly from a collection of uncalibrated multi-view images.
arXiv Detail & Related papers (2024-11-19T16:27:31Z)
- OpenStreetView-5M: The Many Roads to Global Visual Geolocation [16.468438245804684]
We introduce OpenStreetView-5M, a large-scale, open-access dataset comprising over 5.1 million geo-referenced street view images.
In contrast to existing benchmarks, we enforce a strict train/test separation, allowing us to evaluate the relevance of learned geographical features.
To demonstrate the utility of our dataset, we conduct an extensive benchmark of various state-of-the-art image encoders, spatial representations, and training strategies.
arXiv Detail & Related papers (2024-04-29T17:06:44Z)
- SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs [81.2396059480232]
SceneGraphLoc learns a fixed-size embedding for each node (i.e., each object instance) in the scene graph.
When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques that depend on large image databases.
arXiv Detail & Related papers (2024-03-30T20:25:16Z)
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
- CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description approach with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, under two different distance metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations; a minimal sketch of this style of contrastive alignment appears after this list.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
- Are Local Features All You Need for Cross-Domain Visual Place Recognition? [13.519413608607781]
Visual Place Recognition aims to predict the coordinates of an image based solely on visual cues.
Despite recent advances, recognizing the same place when the query comes from a significantly different distribution is still a major hurdle for state-of-the-art retrieval methods.
In this work we explore whether re-ranking methods based on spatial verification can tackle these challenges.
arXiv Detail & Related papers (2023-04-12T14:46:57Z)
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels.
We achieve state-of-the-art street-level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
- An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images [6.962949867017594]
This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images.
We have implemented the approach in a system called mapKurator.
arXiv Detail & Related papers (2021-12-03T01:44:38Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
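Several of the geolocalization entries above, GeoCLIP in particular (as referenced in its summary), revolve around aligning image embeddings with location embeddings in a shared space. The sketch below shows the generic CLIP-style symmetric contrastive loss such approaches build on; it illustrates the general idea only, not GeoCLIP's actual implementation, and the embedding inputs are stand-ins for whatever image and GPS encoders a given paper uses.

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(image_emb: torch.Tensor,
                              gps_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss pulling each image toward its own GPS
    embedding and away from the other locations in the batch.

    Illustrative only: both inputs are just (batch, dim) features from
    hypothetical image and location encoders.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    gps_emb = F.normalize(gps_emb, dim=-1)
    logits = image_emb @ gps_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0))          # matching pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with random stand-in embeddings for a batch of 8 image/GPS pairs.
loss = clip_style_alignment_loss(torch.randn(8, 128), torch.randn(8, 128))
print(float(loss))
```

With encoders trained this way, worldwide geolocalization reduces to retrieval: embed the query image and rank a gallery of candidate GPS embeddings by similarity.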
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.