Cross-View Image Retrieval -- Ground to Aerial Image Retrieval through
Deep Learning
- URL: http://arxiv.org/abs/2005.00725v1
- Date: Sat, 2 May 2020 06:52:16 GMT
- Title: Cross-View Image Retrieval -- Ground to Aerial Image Retrieval through
Deep Learning
- Authors: Numan Khurshid, Talha Hanif, Mohbat Tharani, Murtaza Taj
- Abstract summary: We present a novel cross-modal retrieval method specifically for multi-view images, called Cross-view Image Retrieval (CVIR).
Our approach aims to find a feature space as well as an embedding space in which samples from street-view images can be compared directly to satellite-view images.
For this comparison, a novel deep metric learning-based solution, "DeepCVIR", has been proposed.
- Score: 3.326320568999945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal retrieval aims to measure the content similarity between
different types of data. The idea has been previously applied to visual, text,
and speech data. In this paper, we present a novel cross-modal retrieval method
specifically for multi-view images, called Cross-view Image Retrieval (CVIR). Our
approach aims to find a feature space as well as an embedding space in which
samples from street-view images are compared directly to satellite-view images
(and vice-versa). For this comparison, a novel deep metric learning-based
solution, "DeepCVIR", has been proposed. Previous cross-view image datasets are
deficient in that they (1) lack class information; (2) were originally
collected for the cross-view image geolocalization task with coupled images; (3)
do not include any images from off-street locations. To train, compare, and
evaluate the performance of cross-view image retrieval, we present a new
six-class cross-view image dataset, termed CrossViewRet, which comprises images
of freeways, mountains, palaces, rivers, ships, and stadiums, with 700
high-resolution dual-view images per class. Results show that the proposed
DeepCVIR outperforms conventional matching approaches on the CVIR task for the
given dataset and can also serve as a baseline for future research.
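The abstract's core idea, learning a shared embedding space in which street-view and satellite-view images can be compared directly via a metric-learning objective, can be illustrated with a minimal NumPy sketch. The random linear projections below stand in for per-view CNN branches, and all names and dimensions are illustrative assumptions, not the actual DeepCVIR architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two view-specific encoders: random
# linear projections mapping each view's 128-d features into a shared
# 32-d embedding space.
W_street = rng.normal(size=(128, 32))
W_aerial = rng.normal(size=(128, 32))

def embed(features, W):
    """Project features and L2-normalize so Euclidean distance is comparable."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull matched cross-view pairs together,
    push mismatched pairs at least `margin` further apart."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy batch: street-view anchors, their matching aerial views (positives),
# and aerial views of other locations (negatives).
street = embed(rng.normal(size=(8, 128)), W_street)
aerial_pos = embed(rng.normal(size=(8, 128)), W_aerial)
aerial_neg = embed(rng.normal(size=(8, 128)), W_aerial)

loss = triplet_loss(street, aerial_pos, aerial_neg)
print(round(float(loss), 4))
```

In a real training loop the projections would be learned by minimizing this loss over many matched/mismatched triplets, so that a street-view query lands near its satellite-view counterpart in the shared space.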
Related papers
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
With this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z)
- Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network [12.692812966686066]
Cross-view geolocalization identifies the geographic location of street view images by matching them with a georeferenced satellite database.
We propose a new approach for cross-view image geo-localization, i.e., the Panorama-BEV Co-Retrieval Network.
arXiv Detail & Related papers (2024-08-10T08:03:58Z)
- Zero-Shot Composed Image Retrieval with Textual Inversion [28.513594970580396]
Composed Image Retrieval (CIR) aims to retrieve a target image based on a query composed of a reference image and a relative caption.
We propose a new task, Zero-Shot CIR (ZS-CIR), that aims to address CIR without requiring a labeled training dataset.
arXiv Detail & Related papers (2023-03-27T14:31:25Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
- Using Text to Teach Image Retrieval [47.72498265721957]
We build on the concept of image manifold to represent the feature space of images, learned via neural networks, as a graph.
We augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images.
The experimental results show that the joint embedding manifold is a robust representation, allowing it to be a better basis to perform image retrieval.
arXiv Detail & Related papers (2020-11-19T16:09:14Z)
- Self-Supervised Ranking for Representation Learning [108.38993212650577]
We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context.
We train a representation encoder by maximizing average precision (AP) for ranking, where random views of an image are considered positively related.
In principle, by using a ranking criterion, we eliminate reliance on object-centric curated datasets.
arXiv Detail & Related papers (2020-10-14T17:24:56Z)
- AiRound and CV-BrCT: Novel Multi-View Datasets for Scene Classification [2.931113769364182]
We present two new publicly available datasets, named AiRound and CV-BrCT.
The first one contains triplets of images from the same geographic coordinate with different perspectives of view extracted from various places around the world.
The second dataset contains pairs of aerial and street-level images extracted from southeast Brazil.
arXiv Detail & Related papers (2020-08-03T18:55:46Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
To solve these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Evaluation of Cross-View Matching to Improve Ground Vehicle Localization with Aerial Perception [17.349420462716886]
Cross-view matching refers to the problem of finding the closest match for a given query ground view image to one from a database of aerial images.
In this paper, we evaluate cross-view matching for the task of localizing a ground vehicle over a longer trajectory.
arXiv Detail & Related papers (2020-03-13T23:59:07Z)
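The query-against-database matching described in the last entry, finding the closest aerial match for a ground-view query, reduces to nearest-neighbor search over shared embeddings. The sketch below uses synthetic pre-computed vectors and generic cosine similarity; it is an illustrative assumption, not any of the papers' actual pipelines:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pre-computed embeddings: 5 aerial database images and one
# ground-view query, all already projected into a shared 16-d space.
database = rng.normal(size=(5, 16))
# Build the query as a slightly perturbed copy of database entry 3, so the
# correct match is known by construction.
query = database[3] + 0.05 * rng.normal(size=16)

def cosine_sim(q, db):
    """Cosine similarity between one query vector and every database row."""
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    return db @ q

scores = cosine_sim(query, database)
best = int(np.argmax(scores))
print(best)  # expect 3, since the query was built near database[3]
```

At scale this brute-force dot product would be replaced by an approximate nearest-neighbor index, but the retrieval logic is the same: rank the database by similarity to the query embedding and return the top match.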
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.