Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models
- URL: http://arxiv.org/abs/2403.02059v2
- Date: Wed, 22 May 2024 08:42:45 GMT
- Title: Multi-Spectral Remote Sensing Image Retrieval Using Geospatial Foundation Models
- Authors: Benedikt Blumenstiel, Viktoria Moor, Romeo Kienzler, Thomas Brunschwiler
- Abstract summary: This work proposes to use Geospatial Foundation Models, like Prithvi, for remote sensing image retrieval.
We introduce two datasets to the retrieval task and observe strong performance.
Prithvi processes six bands and achieves a mean Average Precision of 97.62% on BigEarthNet-43 and 44.51% on ForestNet-12.
- Score: 0.562479170374811
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image retrieval enables efficient search through vast amounts of satellite imagery and returns images similar to a query. Deep learning models can identify images across various semantic concepts without the need for annotations. This work proposes to use Geospatial Foundation Models, like Prithvi, for remote sensing image retrieval with two benefits: i) the models encode multi-spectral satellite data and ii) they generalize without further fine-tuning. We introduce two datasets to the retrieval task and observe strong performance: Prithvi processes six bands and achieves a mean Average Precision of 97.62% on BigEarthNet-43 and 44.51% on ForestNet-12, outperforming other RGB-based models. Further, we evaluate three compression methods with binarized embeddings that balance retrieval speed and accuracy. They match the retrieval speed of much shorter hash codes while maintaining the accuracy of floating-point embeddings at 32-fold compression. The code is available at https://github.com/IBM/remote-sensing-image-retrieval.
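The 32-fold figure follows from simple arithmetic: a float32 embedding spends 32 bits per dimension, while a binarized embedding spends 1 bit. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method: it binarizes by thresholding at zero (not necessarily any of the three compression methods the paper evaluates), ranks a database by Hamming distance, and scores one ranked list with average precision. The helper names (binarize, hamming_rank, average_precision), the 768-dimensional random embeddings, and the single random label per image are hypothetical stand-ins, not Prithvi outputs or code from the linked repository. How the paper defines relevance for multi-label images is not stated here; the example treats a result as relevant when its hypothetical label matches the query's.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold float embeddings at zero and pack 8 bits per byte.

    Assumption: sign thresholding is used here for illustration only.
    A d-dim float32 vector (32*d bits) becomes d bits, i.e. 32x smaller.
    """
    bits = embeddings > 0
    return np.packbits(bits, axis=-1)


def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray):
    """Return database indices sorted by Hamming distance, plus the distances."""
    xor = np.bitwise_xor(db_codes, query_code)        # differing bits, still packed
    dists = np.unpackbits(xor, axis=-1).sum(axis=-1)  # popcount per database row
    order = np.argsort(dists, kind="stable")          # ties keep database order
    return order, dists


def average_precision(relevant) -> float:
    """Average precision of one fully ranked list.

    relevant[i] is True if the i-th retrieved item is relevant. Because the
    whole database is ranked, every relevant item appears exactly once.
    """
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)               # relevant items seen so far
    ranks = np.arange(1, relevant.size + 1)  # 1-based rank positions
    return float(np.mean(hits[relevant] / ranks[relevant]))


# Hypothetical stand-ins: 1,000 random 768-dim embeddings with 5 random labels.
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 768)).astype(np.float32)
labels = rng.integers(0, 5, size=1000)

codes = binarize(db)
print("compression:", db.nbytes // codes.nbytes)  # prints 32 (3072 -> 96 bytes per row)

order, _ = hamming_rank(codes[0], codes)
order = order[order != 0]  # drop the query itself from its own result list
print("AP:", average_precision(labels[order] == labels[0]))
```

Averaging the per-query value over all queries gives a mean Average Precision of the kind reported in the abstract. np.unpackbits keeps the sketch portable across NumPy versions; a production index would typically use a dedicated popcount or a library built for binary vector search instead.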
Related papers
- Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
(2023-12-20T16:14:58Z)
- Deep supervised hashing for fast retrieval of radio image cubes [5.688539343057255]
Deep hashing algorithms have been shown to be efficient at image retrieval tasks in the fields of computer vision and multimedia.
In this work, we utilize deep hashing to rapidly search for similar images in a large database.
The experimental results demonstrate the capability to search and retrieve similar radio images efficiently and at scale.
(2023-09-02T12:59:52Z)
- A Triplet-loss Dilated Residual Network for High-Resolution Representation Learning in Image Retrieval [0.0]
In some applications, such as localization, image retrieval is employed as the initial step.
The current paper introduces a simple yet efficient image retrieval system with fewer trainable parameters.
The proposed method benefits from a dilated residual convolutional neural network with triplet loss.
(2023-03-15T07:01:44Z)
- Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB Images [7.428474910083337]
We present a novel learned keypoint detection method designed to maximize the number of correct matches for the task of non-rigid image correspondence.
Our training framework uses true correspondences, obtained by matching annotated image pairs with a predefined descriptor extractor, as ground truth to train a convolutional neural network (CNN).
Experiments show that our method outperforms the state-of-the-art keypoint detector on real images of non-rigid objects by 20 p.p. on Mean Matching Accuracy.
(2022-12-13T11:59:09Z)
- NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization [60.73541222862195]
NeuMap is an end-to-end neural mapping method for camera localization.
It encodes a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels.
(2022-11-21T04:46:22Z)
- Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics [58.720142291102135]
We present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps.
We first scrape images for the objects of interest from popular image search engines.
We compare three different methods for image selection: object-agnostic pre-processing, manual image selection, and CNN-based image selection.
(2022-10-18T12:49:04Z)
- Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost by up to 6,000x compared to related works.
(2022-08-04T01:39:37Z)
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map.
The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization.
Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
(2022-04-10T19:16:58Z)
- Asymmetric Hash Code Learning for Remote Sensing Image Retrieval [22.91678927865952]
We propose a novel deep hashing method, named asymmetric hash code learning (AHCL), for remote sensing image retrieval.
AHCL generates the hash codes of query and database images in an asymmetric way.
The experimental results on three public datasets demonstrate that the proposed method outperforms symmetric methods in terms of retrieval accuracy and efficiency.
(2022-01-15T07:00:38Z)
- DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation [7.2531609092488445]
We develop a convolutional neural network architecture which aggregates feature maps at different semantic levels for image representations.
Our model is trained end-to-end without pixel-level annotation, requiring only positive and negative GPS-tagged image pairs.
Our method is also computationally efficient because the architecture shares features and parameters during computation.
(2020-12-04T02:16:47Z)
- Swapping Autoencoder for Deep Image Manipulation [94.33114146172606]
We propose the Swapping Autoencoder, a deep model designed specifically for image manipulation.
The key idea is to encode an image with two independent components and enforce that any swapped combination maps to a realistic image.
Experiments on multiple datasets show that our model produces better results and is substantially more efficient than recent generative models.
(2020-07-01T17:59:57Z)