Learning Sequence Descriptor based on Spatio-Temporal Attention for
Visual Place Recognition
- URL: http://arxiv.org/abs/2305.11467v4
- Date: Sat, 27 Jan 2024 05:21:33 GMT
- Title: Learning Sequence Descriptor based on Spatio-Temporal Attention for
Visual Place Recognition
- Authors: Junqiao Zhao, Fenglin Zhang, Yingfeng Cai, Gengxuan Tian, Wenjie Mu,
Chen Ye, Tiantian Feng
- Abstract summary: Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame.
To improve the robustness of VPR in perceptually aliasing scenarios, sequence-based VPR methods are proposed.
We use a sliding window to control the temporal range of attention and use relative positional encoding to construct sequential relationships between different features.
- Score: 16.380948630155476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) aims to retrieve frames from a geotagged
database that are located at the same place as the query frame. To improve the
robustness of VPR in perceptually aliasing scenarios, sequence-based VPR
methods are proposed. These methods are based either on matching between frame
sequences or on extracting sequence descriptors for direct retrieval. However, the
former is usually based on the assumption of constant velocity, which is
difficult to hold in practice, and is computationally expensive and sensitive to
sequence length. Although the latter overcomes these problems, existing
sequence descriptors are constructed only by aggregating features of multiple
frames, without temporal interaction, and thus cannot obtain descriptors with
spatio-temporal discrimination. In this paper, we propose a
sequence descriptor that effectively incorporates spatio-temporal information.
Specifically, spatial attention within the same frame is utilized to learn
spatial feature patterns, while attention in corresponding local regions of
different frames is utilized to learn the persistence or change of features
over time. We use a sliding window to control the temporal range of attention
and use relative positional encoding to construct sequential relationships
between different features. This allows our descriptors to capture the
intrinsic dynamics in a sequence of frames. Comprehensive experiments on
challenging benchmark datasets show that the proposed approach outperforms
recent state-of-the-art methods. The code is available at
https://github.com/tiev-tongji/Spatio-Temporal-SeqVPR.
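
The abstract describes two attention patterns: spatial self-attention among the local features of each frame, and temporal attention between corresponding local features of neighbouring frames, confined to a sliding window and biased by relative positional encoding. Below is a minimal PyTorch sketch of that idea. The module name, dimensions, and the simple scalar relative-position bias are illustrative assumptions, not the authors' released implementation (see the repository above for the official code).

import torch
import torch.nn as nn


class SlidingWindowSpatioTemporalAttention(nn.Module):
    """Spatial attention within each frame, then temporal attention restricted
    to a sliding window of frames with a learned relative positional bias.
    Illustrative sketch only; not the authors' released implementation."""

    def __init__(self, dim=256, heads=4, window=2):
        super().__init__()
        self.heads = heads
        self.window = window
        self.attn_spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        # One learned bias per (relative frame offset, head); offsets in [-window, window].
        self.rel_bias = nn.Parameter(torch.zeros(2 * window + 1, heads))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, T, N, D) = batch, frames, local regions per frame, channels.
        B, T, N, D = x.shape

        # 1) Spatial attention: local regions of the same frame attend to each other.
        s = x.reshape(B * T, N, D)
        s = self.norm1(s + self.attn_spatial(s, s, s, need_weights=False)[0])
        x = s.reshape(B, T, N, D)

        # 2) Temporal attention: each region attends to the corresponding region
        #    of the frames within +/- window, biased by the relative frame offset.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        idx = torch.arange(T, device=x.device)
        offset = idx[None, :] - idx[:, None]                                  # (T, T)
        bias = self.rel_bias[offset.clamp(-self.window, self.window) + self.window]
        bias = bias.permute(2, 0, 1)                                          # (heads, T, T)
        bias = bias.masked_fill((offset.abs() > self.window).unsqueeze(0), float("-inf"))
        mask = bias.repeat(B * N, 1, 1)                                       # (B*N*heads, T, T)
        t = self.norm2(t + self.attn_temporal(t, t, t, attn_mask=mask, need_weights=False)[0])
        return t.reshape(B, N, T, D).permute(0, 2, 1, 3)                      # (B, T, N, D)


# Toy usage: 8-frame sequences with 49 local features of 256 dimensions per frame.
feats = torch.randn(2, 8, 49, 256)
out = SlidingWindowSpatioTemporalAttention()(feats)   # (2, 8, 49, 256)
descriptor = out.mean(dim=(1, 2))                     # pooled 256-d sequence descriptor per clip

Pooling over frames and regions, as in the last line, turns the attended features into a single descriptor that can be compared against the geotagged database by nearest-neighbour search.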
Related papers
- Introducing Gating and Context into Temporal Action Detection [0.8987776881291144]
Temporal Action Detection (TAD) remains challenging due to action overlaps and variable action durations.
Recent findings suggest that TAD performance is dependent on the structural design of transformers rather than on the self-attention mechanism.
We propose a refined feature extraction process through lightweight, yet effective operations.
arXiv Detail & Related papers (2024-09-06T11:52:42Z)
- Temporally Grounding Instructional Diagrams in Unconstrained Videos [51.85805768507356]
We study the challenging problem of simultaneously localizing a sequence of queries in the form of instructional diagrams in a video.
Most existing methods focus on grounding one query at a time, ignoring the inherent structures among queries.
We propose composite queries constructed by exhaustively pairing up the visual content features of the step diagrams.
We demonstrate the effectiveness of our approach on the IAW dataset for grounding step diagrams and the YouCook2 benchmark for grounding natural language queries.
arXiv Detail & Related papers (2024-07-16T05:44:30Z)
- Temporally Consistent Referring Video Object Segmentation with Hybrid Memory [98.80249255577304]
We propose an end-to-end R-VOS paradigm that explicitly models temporal consistency alongside the referring segmentation.
Features of frames with automatically generated high-quality reference masks are propagated to segment remaining frames.
Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin.
arXiv Detail & Related papers (2024-03-28T13:32:49Z)
- FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space [7.324708513042455]
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals.
It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin.
arXiv Detail & Related papers (2023-10-30T22:55:29Z)
- An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection [4.055489363682199]
We propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL).
Our approach involves two novel parts: an embedding similarity module and a temporal convolution operation.
Our method outperforms baseline models on the ASVspoof 2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario.
arXiv Detail & Related papers (2023-09-06T14:29:29Z)
- Temporally-Consistent Surface Reconstruction using Metrically-Consistent Atlases [131.50372468579067]
We propose a method for unsupervised reconstruction of a temporally-consistent sequence of surfaces from a sequence of time-evolving point clouds.
We represent the reconstructed surfaces as atlases computed by a neural network, which enables us to establish correspondences between frames.
Our approach outperforms state-of-the-art ones on several challenging datasets.
arXiv Detail & Related papers (2021-11-12T17:48:25Z)
- Improving Video Instance Segmentation via Temporal Pyramid Routing [61.10753640148878]
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment and track each instance in a video sequence.
We propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
Our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods.
arXiv Detail & Related papers (2021-07-28T03:57:12Z)
- Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding [8.45908939323268]
We propose a self-supervised framework for learning generalizable representations for non-stationary time series.
Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable.
arXiv Detail & Related papers (2021-06-01T19:53:24Z)
- SeqNet: Learning Descriptors for Sequence-based Hierarchical Place Recognition [31.714928102950594]
We present a novel hybrid system that creates a high performance initial match hypothesis generator.
Sequence descriptors are generated using a temporal convolutional network dubbed SeqNet.
We then perform selective sequential score aggregation using shortlisted single image learnt descriptors to produce an overall place match hypothesis.
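As a rough illustration of the temporal-convolution idea summarized in this entry (not the authors' exact SeqNet architecture; the descriptor dimension, kernel size, and pooling are assumptions), a sequence descriptor can be formed by convolving per-frame descriptors along the time axis and pooling:

import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalConvDescriptor(nn.Module):
    """Illustrative sketch: 1D convolution along time over per-frame descriptors,
    average-pooled into a single L2-normalised sequence descriptor."""

    def __init__(self, dim=4096, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, padding=kernel // 2)

    def forward(self, seq):
        # seq: (B, L, D) = batch, sequence length, per-frame descriptor dimension.
        x = self.conv(seq.transpose(1, 2))   # convolve along the time axis -> (B, D, L)
        x = x.mean(dim=2)                    # temporal average pooling -> (B, D)
        return F.normalize(x, dim=1)         # unit-norm sequence descriptor


frames = torch.randn(4, 5, 4096)             # 4 sequences of 5 single-image descriptors
desc = TemporalConvDescriptor()(frames)      # (4, 4096), ready for nearest-neighbour retrieval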
arXiv Detail & Related papers (2021-02-23T10:32:10Z)
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the spatial-temporal long-range context interdependency from such regions.
Multiple spatial-temporal interaction modules within CSTNet are proposed, which exploit the spatial and temporal long-range context interdependencies of such features and their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)