DeepSeqSLAM: A Trainable CNN+RNN for Joint Global Description and
Sequence-based Place Recognition
- URL: http://arxiv.org/abs/2011.08518v1
- Date: Tue, 17 Nov 2020 09:14:02 GMT
- Title: DeepSeqSLAM: A Trainable CNN+RNN for Joint Global Description and
Sequence-based Place Recognition
- Authors: Marvin Chancán, Michael Milford
- Abstract summary: We propose DeepSeqSLAM: a trainable CNN+RNN architecture for jointly learning visual and positional representations from a single monocular image sequence of a route.
We demonstrate our approach on two large benchmark datasets, Nordland and Oxford RobotCar.
Our approach achieves over 72% AUC, compared to 27% AUC for Delta Descriptors and 2% AUC for SeqSLAM, while drastically reducing deployment time from around 1 hour to 1 minute relative to both.
- Score: 23.54696982881734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequence-based place recognition methods for all-weather navigation are
well-known for producing state-of-the-art results under challenging day-night
or summer-winter transitions. These systems, however, rely on complex
handcrafted heuristics for sequential matching - which are applied on top of a
pre-computed pairwise similarity matrix between reference and query image
sequences of a single route - to further reduce false-positive rates compared
to single-frame retrieval methods. As a result, performing multi-frame place
recognition can be extremely slow for deployment on autonomous vehicles or
evaluation on large datasets, and can fail when using relatively short
sequence lengths, such as 2 frames. In this paper, we propose
DeepSeqSLAM: a trainable CNN+RNN architecture for jointly learning visual and
positional representations from a single monocular image sequence of a route.
We demonstrate our approach on two large benchmark datasets, Nordland and
Oxford RobotCar, recorded over 728 km and 10 km routes, respectively, each
over the course of a year with multiple seasons, weather, and lighting conditions. On
Nordland, we compare our method to two state-of-the-art sequence-based methods
across the entire route under summer-winter changes using a sequence length of
2 and show that our approach achieves over 72% AUC, compared to 27% AUC for Delta
Descriptors and 2% AUC for SeqSLAM, while drastically reducing the deployment
time from around 1 hour to 1 minute relative to both. The framework code and video
are available at https://mchancan.github.io/deepseqslam
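
The joint visual-positional design is simple to prototype. Below is a minimal PyTorch sketch in the spirit of the paper: a CNN produces a global descriptor per frame, which is concatenated with positional data and passed through an LSTM. The ResNet-18 backbone, hidden size, and fusion scheme are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class SeqPlaceNet(nn.Module):
    """Sketch of a CNN+RNN place-recognition model (illustrative, not
    the authors' exact DeepSeqSLAM configuration)."""
    def __init__(self, num_places, pos_dim=2, hidden_size=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Drop the classifier; keep the 512-d global average-pooled descriptor.
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        # The RNN consumes the image descriptor concatenated with position.
        self.rnn = nn.LSTM(512 + pos_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_places)

    def forward(self, images, positions):
        # images: (B, T, 3, H, W); positions: (B, T, pos_dim), e.g. normalized odometry
        b, t = images.shape[:2]
        feats = self.cnn(images.flatten(0, 1)).flatten(1).view(b, t, -1)
        fused = torch.cat([feats, positions], dim=-1)
        out, _ = self.rnn(fused)
        return self.head(out[:, -1])  # classify the place at the last frame

# A sequence length of 2, as in the Nordland experiments above.
model = SeqPlaceNet(num_places=1000)
logits = model(torch.randn(4, 2, 3, 224, 224), torch.randn(4, 2, 2))
```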
Related papers
- RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation [46.659592045271125]
RTMO is a one-stage pose estimation framework that seamlessly integrates coordinate classification.
It achieves accuracy comparable to top-down methods while maintaining high speed.
Our largest model, RTMO-l, attains 74.8% AP on COCO val 2017 and 141 FPS on a single V100 GPU.
arXiv Detail & Related papers (2023-12-12T18:55:29Z)
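
For intuition on the coordinate-classification idea above, here is a toy head that predicts each keypoint coordinate as a distribution over discrete bins and decodes it by expectation; shapes and bin counts are hypothetical, not RTMO's actual design.

```python
import torch
import torch.nn as nn

class CoordClassifier(nn.Module):
    """Toy coordinate-classification head: each keypoint x/y is predicted
    as a distribution over discrete bins and decoded by expectation."""
    def __init__(self, feat_dim, num_bins_x, num_bins_y):
        super().__init__()
        self.to_x = nn.Linear(feat_dim, num_bins_x)
        self.to_y = nn.Linear(feat_dim, num_bins_y)

    def forward(self, feats):
        # feats: (B, K, feat_dim) -- one feature vector per keypoint
        px = self.to_x(feats).softmax(dim=-1)           # (B, K, num_bins_x)
        py = self.to_y(feats).softmax(dim=-1)
        xs = (px * torch.arange(px.shape[-1])).sum(-1)  # expected bin index
        ys = (py * torch.arange(py.shape[-1])).sum(-1)
        return torch.stack([xs, ys], dim=-1)            # (B, K, 2) coordinates

coords = CoordClassifier(256, 192, 256)(torch.randn(2, 17, 256))
```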
- Unified Coarse-to-Fine Alignment for Video-Text Retrieval [71.85966033484597]
We propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA.
Our model captures the cross-modal similarity information at different granularity levels.
We apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them.
arXiv Detail & Related papers (2023-09-18T19:04:37Z)
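
For reference, the Sinkhorn-Knopp normalization mentioned above alternates row and column normalization until the matrix is approximately doubly stochastic. A minimal NumPy sketch for a square similarity matrix; the iteration count and epsilon are illustrative, not UCoFiA's settings.

```python
import numpy as np

def sinkhorn_knopp(sim, n_iters=50, eps=1e-8):
    """Alternately normalize rows and columns of a non-negative square
    similarity matrix until it is approximately doubly stochastic."""
    m = np.asarray(sim, dtype=float) + eps  # keep entries strictly positive
    for _ in range(n_iters):
        m /= m.sum(axis=1, keepdims=True)   # row normalization
        m /= m.sum(axis=0, keepdims=True)   # column normalization
    return m

balanced = sinkhorn_knopp(np.random.rand(5, 5))
```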
- TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement [64.11385310305612]
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence.
Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations.
The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS.
arXiv Detail & Related papers (2023-06-14T17:07:51Z)
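
To make the two-stage pattern above concrete, here is a heavily simplified match-then-refine tracker: a global argmax match per frame, then a local soft-argmax refinement. This illustrates only the pattern, not TAPIR's architecture.

```python
import torch
import torch.nn.functional as F

def match_then_refine(query_feat, frame_feats, radius=3):
    """Toy two-stage point tracker on feature maps."""
    # query_feat: (C,); frame_feats: (T, C, H, W)
    t, c, h, w = frame_feats.shape
    corr = torch.einsum('c,tchw->thw', query_feat, frame_feats)  # similarity maps
    flat_idx = corr.view(t, -1).argmax(dim=1)                    # stage 1: coarse match
    ys, xs = flat_idx // w, flat_idx % w
    tracks = []
    for i in range(t):
        yi, xi = int(ys[i]), int(xs[i])
        y0, y1 = max(yi - radius, 0), min(yi + radius + 1, h)
        x0, x1 = max(xi - radius, 0), min(xi + radius + 1, w)
        local = corr[i, y0:y1, x0:x1]
        p = F.softmax(local.flatten(), dim=0).view_as(local)     # stage 2: soft-argmax
        gy, gx = torch.meshgrid(torch.arange(y0, y1, dtype=torch.float32),
                                torch.arange(x0, x1, dtype=torch.float32),
                                indexing='ij')
        tracks.append(((p * gy).sum().item(), (p * gx).sum().item()))
    return tracks  # one refined (y, x) estimate per frame

pts = match_then_refine(torch.randn(64), torch.randn(10, 64, 32, 32))
```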
- SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow [88.97790684009979]
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
We propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels.
We also present a novel Gated Dual Flow Alignment Module to directly align high-resolution feature maps and low-resolution feature maps.
arXiv Detail & Related papers (2022-07-10T08:25:47Z)
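
A rough sketch of the flow-alignment idea above: predict a 2-channel flow field from the concatenated features and warp the upsampled low-resolution map with it. Layer shapes and warping details are assumptions; SFNet's actual module differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlign(nn.Module):
    """Warp an upsampled low-resolution feature map toward the
    high-resolution one using a learned 2-channel flow field."""
    def __init__(self, channels):
        super().__init__()
        self.flow_conv = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, high, low):
        b, c, h, w = high.shape
        low_up = F.interpolate(low, size=(h, w), mode='bilinear', align_corners=False)
        flow = self.flow_conv(torch.cat([high, low_up], dim=1))  # (B, 2, H, W)
        # Base sampling grid in pixel coordinates, offset by the flow.
        ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                                torch.arange(w, dtype=torch.float32), indexing='ij')
        grid = torch.stack([xs, ys], dim=-1) + flow.permute(0, 2, 3, 1)  # (B, H, W, 2)
        # Normalize to [-1, 1] as grid_sample expects.
        norm = torch.tensor([max(w - 1, 1), max(h - 1, 1)], dtype=torch.float32)
        grid = 2.0 * grid / norm - 1.0
        return F.grid_sample(low_up, grid, align_corners=True)

aligned = FlowAlign(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16))
```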
- EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications [68.35683849098105]
We introduce split depth-wise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups.
Our EdgeNeXt model with 1.3M parameters achieves 71.2% top-1 accuracy on ImageNet-1K.
Our EdgeNeXt model with 5.6M parameters achieves 79.4% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2022-06-21T17:59:56Z)
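
As a toy illustration of the channel-splitting step above alone (SDTA additionally applies transposed attention across channels, omitted here), each channel group gets its own depth-wise convolution:

```python
import torch
import torch.nn as nn

class ChannelGroupMixer(nn.Module):
    """Toy channel-splitting block: split the input into channel groups,
    run a depth-wise conv per group, then re-concatenate."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        gc = channels // groups
        self.groups = groups
        self.dwconvs = nn.ModuleList(
            [nn.Conv2d(gc, gc, kernel_size=3, padding=1, groups=gc)
             for _ in range(groups)])

    def forward(self, x):
        chunks = torch.chunk(x, self.groups, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.dwconvs, chunks)], dim=1)

out = ChannelGroupMixer(64)(torch.randn(1, 64, 32, 32))
```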
- DTWSSE: Data Augmentation with a Siamese Encoder for Time Series [8.019203034348083]
We propose DTWSSE, a DTW-based synthetic minority oversampling technique that uses a siamese encoder for interpolation.
To measure the distance between time series reasonably, DTW, which has been verified to be an effective method, is employed as the distance metric.
The encoder is a Neural Network for mapping the time series data from the DTW hidden space to the Euclidean deep feature space, and the decoder is used to map the deep feature space back to the DTW hidden space.
arXiv Detail & Related papers (2021-08-23T01:46:24Z)
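
Since DTW is the distance metric at the core of the entry above, the textbook dynamic-programming form is sketched below for reference; DTWSSE's siamese encoder/decoder mapping is not shown.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # small: similar shapes
```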
- Sequential Place Learning: Heuristic-Free High-Performance Long-Term Place Recognition [24.70946979449572]
We develop a learning-based CNN+LSTM architecture, trainable via backpropagation through time, for viewpoint- and appearance-invariant place recognition.
Our model outperforms 15 classical methods while setting new state-of-the-art performance standards.
In addition, we show that SPL can be up to 70x faster to deploy than classical methods on a 729 km route.
arXiv Detail & Related papers (2021-03-02T22:57:43Z)
- Understanding Image Retrieval Re-Ranking: A Graph Neural Network Perspective [52.96911968968888]
In this paper, we demonstrate that re-ranking can be reformulated as a high-parallelism Graph Neural Network (GNN) function.
On the Market-1501 dataset, we accelerate the re-ranking processing from 89.2s to 9.4ms with one K40m GPU, facilitating the real-time post-processing.
arXiv Detail & Related papers (2020-12-14T15:12:36Z)
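
A toy sketch of re-ranking as one round of neighbor aggregation on a k-NN graph, which is the flavor of GNN computation the entry above exploits for parallelism; k and alpha are arbitrary choices here, not the paper's settings.

```python
import numpy as np

def knn_graph_rerank(feats, k=5, alpha=0.5):
    """Smooth each descriptor with its top-k neighbors on a k-NN graph,
    so similarities recomputed afterwards reflect neighborhood structure."""
    sims = feats @ feats.T                        # cosine sims (rows L2-normalized)
    nbrs = np.argsort(-sims, axis=1)[:, 1:k + 1]  # top-k neighbors, excluding self
    adj = np.zeros_like(sims)
    np.put_along_axis(adj, nbrs, 1.0, axis=1)
    adj /= adj.sum(axis=1, keepdims=True)         # row-normalize the graph
    return (1 - alpha) * feats + alpha * adj @ feats

feats = np.random.randn(100, 64)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
refined = knn_graph_rerank(feats)  # re-rank by recomputing sims over `refined`
```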
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary-branch sampling.
Our models can outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- SUPER: A Novel Lane Detection System [26.417172945374364]
We propose a real-time lane detection system, called Scene Understanding Physics-Enhanced Real-time (SUPER) algorithm.
We train the proposed system using heterogeneous data from Cityscapes, Vistas and Apollo, and evaluate the performance on four completely separate datasets.
Preliminary test results show promising real-time lane-detection performance compared with Mobileye.
arXiv Detail & Related papers (2020-05-14T21:40:39Z)