NYU-VPR: Long-Term Visual Place Recognition Benchmark with View
Direction and Data Anonymization Influences
- URL: http://arxiv.org/abs/2110.09004v1
- Date: Mon, 18 Oct 2021 03:56:33 GMT
- Title: NYU-VPR: Long-Term Visual Place Recognition Benchmark with View
Direction and Data Anonymization Influences
- Authors: Diwei Sheng, Yuxiang Chai, Xinru Li, Chen Feng, Jianzhe Lin, Claudio
Silva, John-Ross Rizzo
- Abstract summary: We present a dataset of more than 200,000 images over a 2km by 2km area near the New York University campus, captured throughout 2016.
We show that side views are significantly more challenging for current VPR methods while the influence of data anonymization is almost negligible.
- Score: 5.94860356161563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual place recognition (VPR) is critical not only to localization
and mapping for autonomous vehicles, but also to assistive navigation for
people with visual impairments. Enabling a long-term, large-scale VPR system
requires addressing several challenges. First, different applications may
require different image view directions, such as front views for self-driving
cars and side views for pedestrians with low vision. Second, VPR in
metropolitan scenes often raises privacy concerns because pedestrian and
vehicle identities are imaged, calling for data anonymization before VPR
queries and database construction. Both factors can cause VPR performance
variations that are not yet well understood. To study their influences, we
present the NYU-VPR dataset, which contains more than 200,000 images over a
2km by 2km area near the New York University campus, captured throughout 2016.
We present benchmark results for several popular VPR algorithms, showing that
side views are significantly more challenging for current VPR methods while
the influence of data anonymization is almost negligible, together with
hypothesized explanations and in-depth analysis.
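To make the anonymization factor concrete, here is a minimal sketch of one common approach (an illustrative assumption, not necessarily the exact pipeline used for NYU-VPR): detect faces with OpenCV's bundled Haar cascade and blur each detection before the image enters the query set or the database. License plates would need a separate detector, omitted here.

```python
import cv2

# Face detector shipped with opencv-python; an illustrative choice, not the
# paper's anonymization tool.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def anonymize(image_bgr):
    """Blur every detected face in-place and return the anonymized image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(
            gray, scaleFactor=1.1, minNeighbors=5):
        roi = image_bgr[y:y + h, x:x + w]
        image_bgr[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (31, 31), 0)
    return image_bgr
```

The benchmarked algorithms differ in how they compute image descriptors, but all reduce VPR to nearest-neighbor retrieval over a database of reference descriptors. Below is a minimal sketch of that shared retrieval step, using random vectors as stand-ins for the output of a real global-descriptor model such as NetVLAD; the function names and the 4096-D descriptor size are placeholders, not the paper's code.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Unit-normalize descriptors so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def query_top_k(database, query, k=5):
    """Return (indices, similarities) of the k most similar reference images."""
    sims = database @ l2_normalize(query)   # cosine similarity to every image
    order = np.argsort(-sims)[:k]           # best match first
    return order, sims[order]

# Toy stand-ins: in practice each vector would come from a descriptor model
# applied to an (anonymized) street image.
rng = np.random.default_rng(0)
database = l2_normalize(rng.normal(size=(1000, 4096)))
indices, sims = query_top_k(database, rng.normal(size=4096))
print("top-5 matches:", indices, "similarities:", sims.round(3))
```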
Related papers
- NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation [7.037667953803237]
This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City.
To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image.
Our method specifically takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations.
arXiv Detail & Related papers (2024-03-31T00:20:53Z)
- DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems.
We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
- CROVIA: Seeing Drone Scenes from Car Perspective via Cross-View Adaptation [20.476683921252867]
We propose a novel Cross-View Adaptation (CROVIA) approach to adapt the knowledge learned from on-road vehicle views to UAV views.
First, a novel geometry-based constraint to cross-view adaptation is introduced based on the geometry correlation between views.
Second, cross-view correlations from image space are effectively transferred to segmentation space without any requirement of paired on-road and UAV view data.
arXiv Detail & Related papers (2023-04-14T15:20:40Z)
- NPR: Nocturnal Place Recognition in Streets [15.778129994700496]
We propose a novel pipeline that divides Visual Place Recognition (VPR) and conquers Nocturnal Place Recognition (NPR).
Specifically, we first establish a street-level day-night dataset, NightStreet, and use it to train an unpaired image-to-image translation model.
We then use this model to process existing large-scale VPR datasets, generating the VPR-Night datasets, and demonstrate how to combine them with two popular VPR pipelines.
arXiv Detail & Related papers (2023-04-01T09:43:58Z)
- Visual Place Recognition: A Tutorial [40.576083932383895]
This paper is the first tutorial paper on visual place recognition.
It covers topics such as the formulation of the VPR problem, a general-purpose algorithmic pipeline, and an evaluation methodology for VPR approaches.
Practical code examples in Python illustrate to prospective practitioners and researchers how VPR is implemented and evaluated; a minimal recall@K sketch in the same spirit appears after this list.
arXiv Detail & Related papers (2023-03-06T16:52:11Z)
- Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers [105.89564687747134]
We propose a self-regularized AutoAugment method to learn views for self-supervised vision transformers.
First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously.
We also present a curated augmentation policy search space for self-supervised learning.
arXiv Detail & Related papers (2022-10-16T06:20:44Z)
- Video Graph Transformer for Video Question Answering [182.14696075946742]
This paper proposes a Video Graph Transformer (VGT) model for Video Question Answering (VideoQA).
We show that VGT achieves much better performance than prior art on VideoQA tasks that challenge dynamic relation reasoning, in the pre-training-free scenario.
arXiv Detail & Related papers (2022-07-12T06:51:32Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction [74.42961817119283]
We use vehicle-to-vehicle (V2V) communication to improve the perception and motion forecasting performance of self-driving vehicles.
By intelligently aggregating the information received from multiple nearby vehicles, we can observe the same scene from different viewpoints.
arXiv Detail & Related papers (2020-08-17T17:58:26Z) - VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework
with Quantifiable Viewpoint and Appearance Change [25.853640977526705]
VPR research has grown rapidly over the past decade, driven by improving camera hardware and the potential of deep learning-based techniques.
This growth has led to fragmentation and a lack of standardisation in the field, especially concerning performance evaluation.
In this paper, we address these gaps through a new comprehensive open-source framework for assessing the performance of VPR techniques, dubbed "VPR-Bench".
arXiv Detail & Related papers (2020-05-17T00:27:53Z) - Parsing-based View-aware Embedding Network for Vehicle Re-Identification [138.11983486734576]
We propose a parsing-based view-aware embedding network (PVEN) to achieve the view-aware feature alignment and enhancement for vehicle ReID.
The experiments conducted on three datasets show that our model outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-10T13:06:09Z)
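To complement the evaluation-oriented entries above (the VPR tutorial and VPR-Bench), here is a minimal sketch of recall@K, the metric such benchmarks commonly report. The 25 m distance threshold is an illustrative assumption, not a value taken from any of the papers listed.

```python
import numpy as np

def recall_at_k(retrieved, query_pos, db_pos, k=5, dist_thresh_m=25.0):
    """Fraction of queries with at least one geographically correct match
    among their top-k retrievals.

    retrieved : (Q, >=k) array of database indices, best match first
    query_pos : (Q, 2) query positions in a metric frame (e.g., UTM)
    db_pos    : (N, 2) database image positions in the same frame
    """
    hits = 0
    for ranks, pos in zip(retrieved, query_pos):
        dists = np.linalg.norm(db_pos[ranks[:k]] - pos, axis=1)
        hits += bool((dists <= dist_thresh_m).any())
    return hits / len(retrieved)
```

A query counts as a hit if any of its top-K retrieved reference images lies within the threshold of the query's ground-truth position; sweeping K yields the recall@K curves typically reported in VPR papers.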