NPR: Nocturnal Place Recognition in Streets
- URL: http://arxiv.org/abs/2304.00276v2
- Date: Mon, 17 Apr 2023 16:28:47 GMT
- Title: NPR: Nocturnal Place Recognition in Streets
- Authors: Bingxi Liu, Yujie Fu, Feng Lu, Jinqiang Cui, Yihong Wu, Hong Zhang
- Abstract summary: We propose a novel pipeline that divides Visual Place Recognition (VPR) and conquers Nocturnal Place Recognition (NPR).
Specifically, we first established a street-level day-night dataset, NightStreet, and used it to train an unpaired image-to-image translation model.
Then we used this model to process existing large-scale VPR datasets to generate the VPR-Night datasets and demonstrated how to combine them with two popular VPR pipelines.
- Score: 15.778129994700496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Place Recognition (VPR) is the task of retrieving images
similar to a query photo from a large database of known images.
In real-world applications, extreme illumination changes caused by query images
taken at night pose a significant obstacle that VPR needs to overcome. However,
a training set with day-night correspondence for city-scale, street-level VPR
does not exist. To address this challenge, we propose a novel pipeline that
divides VPR and conquers Nocturnal Place Recognition (NPR). Specifically, we
first established a street-level day-night dataset, NightStreet, and used it to
train an unpaired image-to-image translation model. Then we used this model to
process existing large-scale VPR datasets to generate the VPR-Night datasets
and demonstrated how to combine them with two popular VPR pipelines. Finally,
we proposed a divide-and-conquer VPR framework and provided explanations at the
theoretical, experimental, and application levels. Under our framework,
previous methods can significantly improve performance on two public datasets,
including the top-ranked method.
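The pipeline above (train a day-to-night translator, synthesize VPR-Night databases, then route queries by illumination) can be sketched in miniature. This is a hedged illustration, not the paper's implementation: `day2night` stands in for the learned unpaired translation model, and `descriptor` stands in for a deep VPR descriptor.

```python
import numpy as np

def day2night(img):
    """Toy day-to-night translator: darken the image (stand-in for the
    learned translation model the paper trains on NightStreet)."""
    return np.clip(img * 0.25, 0.0, 1.0)

def descriptor(img):
    """Toy global descriptor: L2-normalized per-channel means
    (stand-in for a learned VPR descriptor)."""
    d = img.mean(axis=(0, 1))
    return d / (np.linalg.norm(d) + 1e-12)

def build_vpr_night(day_db):
    """Generate a synthetic VPR-Night database from a day database."""
    return [day2night(img) for img in day_db]

def retrieve(query, db_descs):
    """Return the index of the most similar database descriptor (cosine)."""
    q = descriptor(query)
    sims = [float(q @ d) for d in db_descs]
    return int(np.argmax(sims))

def divide_and_conquer(query, day_descs, night_descs, thresh=0.2):
    """Route by estimated illumination: dark queries are matched against
    the synthesized VPR-Night index, bright ones against the original."""
    if query.mean() < thresh:
        return retrieve(query, night_descs)
    return retrieve(query, day_descs)
```

The brightness threshold is an arbitrary placeholder for whatever day/night decision the divide step uses; the point is only that night queries are compared against a database processed into the night domain.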
Related papers
- bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction [57.199618102578576]
We propose bit2bit, a new method for reconstructing high-quality image stacks at the original spatiotemporal resolution from sparse binary quanta image data.
Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data.
We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
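The statistics behind recovering a dense image from sparse binary photon data can be illustrated without the paper's self-supervised network (which is omitted here): each 1-bit SPAD frame records, per pixel, whether at least one photon arrived, a Bernoulli draw with p = 1 - exp(-flux), so averaging many frames and inverting that relation gives a maximum-likelihood flux estimate.

```python
import numpy as np

def mle_flux(binary_frames):
    """Estimate per-pixel photon flux from a stack of 1-bit frames.
    Inverts the quanta-imaging model p = 1 - exp(-flux)."""
    p_hat = binary_frames.mean(axis=0)           # detection rate per pixel
    p_hat = np.clip(p_hat, 0.0, 1.0 - 1e-6)      # avoid log of zero
    return -np.log1p(-p_hat)                     # -ln(1 - p_hat)

# Simulate binary frames from a known flux and recover it.
rng = np.random.default_rng(0)
true_flux = np.array([[0.1, 0.5], [1.0, 2.0]])  # photons per frame interval
frames = rng.random((20000, 2, 2)) < (1.0 - np.exp(-true_flux))
est = mle_flux(frames.astype(np.float32))
```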
arXiv Detail & Related papers (2024-10-30T17:30:35Z)
- PIG: Prompt Images Guidance for Night-Time Scene Parsing [48.35991796324741]
Unsupervised domain adaptation (UDA) has become the predominant method for studying night scenes.
We propose a Night-Focused Network (NFNet) to learn night-specific features from both target domain images and prompt images.
We conduct experiments on four night-time datasets: NightCity, NightCity+, Dark Zurich, and ACDC.
arXiv Detail & Related papers (2024-06-15T07:06:19Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We present an effective approach to harness the potential of a foundation model for Visual Place Recognition.
We show that features extracted from self-attention layers can act as a powerful re-ranker for VPR, even in a zero-shot setting.
Our method also demonstrates exceptional robustness and generalization, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- Collaborative Visual Place Recognition through Federated Learning [5.06570397863116]
Visual Place Recognition (VPR) aims to estimate the location of an image by treating it as a retrieval problem.
VPR uses a database of geo-tagged images and leverages deep neural networks to extract a global representation, called descriptor, from each image.
This research revisits the task of VPR through the lens of Federated Learning (FL), addressing several key challenges associated with this adaptation.
arXiv Detail & Related papers (2024-04-20T08:48:37Z)
- NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation [7.037667953803237]
This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City.
To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image.
Our method specifically takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations.
arXiv Detail & Related papers (2024-03-31T00:20:53Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
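Cross-image attention within a batch, in the spirit of the summary above, can be sketched as follows. This is a deliberate simplification (the actual model correlates patch-level features with learned projections); here each image's global descriptor simply attends to every other descriptor in the batch.

```python
import numpy as np

def cross_image_attention(X):
    """X: (B, D) batch of image descriptors. Each descriptor attends to
    the whole batch, so content shared across images can reinforce itself.
    Returns batch-aware descriptors of the same shape."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                 # (B, B) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over the batch
    return attn @ X
```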
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- NocPlace: Nocturnal Visual Place Recognition via Generative and Inherited Knowledge Transfer [11.203135595002978]
NocPlace embeds resilience against dazzling lights and extreme darkness in the global descriptor.
NocPlace improves the performance of Eigenplaces by 7.6% on Tokyo 24/7 Night and 16.8% on SVOX Night.
arXiv Detail & Related papers (2024-02-27T02:47:09Z)
- Combining Attention Module and Pixel Shuffle for License Plate Super-Resolution [3.8831062015253055]
This work focuses on license plate (LP) reconstruction in low-resolution and low-quality images.
We present a Single-Image Super-Resolution (SISR) approach that extends the attention/transformer module concept.
In our experiments, the proposed method outperformed the baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-10-30T13:05:07Z)
- Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone [170.85076677740292]
We present FIBER (Fusion-In-the-Backbone-based transformER), a new model architecture for vision-language (VL) pre-training.
Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model.
We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection.
arXiv Detail & Related papers (2022-06-15T16:41:29Z)
- Night-time Scene Parsing with a Large Real Dataset [67.11211537439152]
We aim to address the night-time scene parsing (NTSP) problem, which has two main challenges.
To tackle the scarcity of night-time data, we collect a novel labeled dataset, named NightCity, of 4,297 real night-time images.
We also propose an exposure-aware framework to address the NTSP problem through augmenting the segmentation process with explicitly learned exposure features.
arXiv Detail & Related papers (2020-03-15T18:11:34Z)
- BP-DIP: A Backprojection based Deep Image Prior [49.375539602228415]
We propose two image restoration approaches: (i) Deep Image Prior (DIP), which trains a convolutional neural network (CNN) from scratch in test time using the degraded image; and (ii) a backprojection (BP) fidelity term, which is an alternative to the standard least squares loss that is usually used in previous DIP works.
We demonstrate the performance of the proposed method, termed BP-DIP, on the deblurring task and show its advantages over the plain DIP, with both higher PSNR values and better inference run-time.
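The two fidelity terms contrasted above can be written down concretely for a linear degradation y = Hx (in deblurring, H is the blur operator; the network training loop is omitted here). Least squares penalizes ||Hx - y||^2, while the backprojection term penalizes ||H^+(Hx - y)||^2 with H^+ the pseudo-inverse, re-weighting the residual in image space. A minimal sketch:

```python
import numpy as np

def ls_loss(H, x, y):
    """Standard least-squares fidelity: ||Hx - y||^2."""
    r = H @ x - y
    return float(r @ r)

def bp_loss(H, x, y, H_pinv=None):
    """Backprojection fidelity: ||H^+(Hx - y)||^2, where H^+ is the
    Moore-Penrose pseudo-inverse of the degradation operator."""
    if H_pinv is None:
        H_pinv = np.linalg.pinv(H)
    r = H_pinv @ (H @ x - y)
    return float(r @ r)
```

Both losses vanish at a consistent solution; they differ only in how residuals are weighted, which is what gives BP-DIP its different optimization behavior.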
arXiv Detail & Related papers (2020-03-11T17:09:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.