Learning Sequential Descriptors for Sequence-based Visual Place
Recognition
- URL: http://arxiv.org/abs/2207.03868v1
- Date: Fri, 8 Jul 2022 12:52:04 GMT
- Title: Learning Sequential Descriptors for Sequence-based Visual Place
Recognition
- Authors: Riccardo Mereu, Gabriele Trivigno, Gabriele Berton, Carlo Masone,
Barbara Caputo
- Abstract summary: In robotics, Visual Place Recognition is a continuous process that receives as input a video stream to produce a hypothesis of the robot's current position.
This work proposes a detailed taxonomy of techniques using sequential descriptors, highlighting the different mechanisms used to fuse the information from the individual images.
- Score: 14.738954189759156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In robotics, Visual Place Recognition is a continuous process that receives
as input a video stream to produce a hypothesis of the robot's current position
within a map of known places. This task requires robust, scalable, and
efficient techniques for real applications. This work proposes a detailed
taxonomy of techniques using sequential descriptors, highlighting the different
mechanisms used to fuse the information from the individual images. This
categorization is supported by a complete benchmark of experimental results
that provides evidence on the strengths and weaknesses of these different
architectural choices. In comparison to existing sequential descriptor
methods, we further investigate the viability of Transformers instead of CNN
backbones, and we propose a new ad-hoc sequence-level aggregator called
SeqVLAD, which outperforms prior state of the art on different datasets. The
code is available at https://github.com/vandal-vpr/vg-transformers.
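To make the sequence-level aggregation idea concrete, below is a minimal PyTorch sketch of a NetVLAD-style aggregator applied to the local features of every frame in a sequence, which is the spirit of SeqVLAD as described in the abstract. The feature dimension, number of clusters, and normalization choices are placeholder assumptions, not the authors' implementation; refer to the linked repository for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqVLADSketch(nn.Module):
    """NetVLAD-style aggregation over the local features of a whole sequence."""

    def __init__(self, dim=256, num_clusters=64):
        super().__init__()
        self.num_clusters = num_clusters
        # Soft-assignment of every local feature to a cluster (1x1 conv = per-feature linear).
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)
        # Learnable cluster centroids.
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):
        # x: (B, S, D, H, W) feature maps from a frame-level backbone (CNN or ViT).
        b, s, d, h, w = x.shape
        x = x.reshape(b * s, d, h, w)                               # one bag of local features
        assign = F.softmax(self.assign(x), dim=1)                   # (B*S, K, H, W)
        assign = assign.reshape(b, s, self.num_clusters, h * w)
        assign = assign.permute(0, 2, 1, 3).reshape(b, self.num_clusters, s * h * w)
        feats = x.reshape(b, s, d, h * w).permute(0, 2, 1, 3).reshape(b, d, s * h * w)
        # VLAD residuals: sum_n a_k(n) * (f_n - c_k), pooled over all frames at once.
        vlad = torch.einsum('bkn,bdn->bkd', assign, feats) \
             - assign.sum(dim=-1, keepdim=True) * self.centroids.unsqueeze(0)
        vlad = F.normalize(vlad, p=2, dim=2)                        # intra-cluster normalization
        return F.normalize(vlad.flatten(1), p=2, dim=1)             # (B, K*D) sequence descriptor
```

Any frame-level backbone producing (B, S, D, H, W) maps can feed this module; the resulting sequence descriptor has length num_clusters x dim.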
Related papers
- EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
To better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
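As an illustration of the decoupling summarized above, here is a rough, assumption-laden sketch of a two-branch model in which a convolutional feature branch and a latent-code classification branch run side by side, with every early exit attached to the classification branch. The layer shapes, the cross-attention coupling, and the confidence-based exit rule are illustrative choices, not the actual Dyn-Perceiver design.

```python
import torch
import torch.nn as nn

class TwoBranchEarlyExit(nn.Module):
    def __init__(self, dim=128, num_classes=1000, num_stages=3, num_latents=16):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3)
        # Feature branch: plain convolutional stages extracting image features.
        self.feature_stages = nn.ModuleList(
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1) for _ in range(num_stages)
        )
        # Classification branch: a latent code refined with cross-attention over the features.
        self.latent = nn.Parameter(torch.randn(num_latents, dim))
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=4, batch_first=True) for _ in range(num_stages)
        )
        # Every early exit lives on the classification branch only.
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_stages))

    def forward(self, x, exit_threshold=0.8):
        feats = self.stem(x)
        latent = self.latent.unsqueeze(0).expand(x.size(0), -1, -1)
        for stage, attn, head in zip(self.feature_stages, self.cross_attn, self.exits):
            feats = stage(feats)                               # feature branch step
            tokens = feats.flatten(2).transpose(1, 2)          # (B, H*W, D)
            latent, _ = attn(latent, tokens, tokens)           # classification branch step
            logits = head(latent.mean(dim=1))
            confidence = logits.softmax(dim=-1).max(dim=-1).values
            if not self.training and bool((confidence > exit_threshold).all()):
                return logits                                  # exit early on easy inputs
        return logits                                          # training would supervise all exits
```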
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- SegTransVAE: Hybrid CNN -- Transformer with Regularization for medical
image segmentation [0.0]
A novel network named SegTransVAE is proposed in this paper.
SegTransVAE is built upon an encoder-decoder architecture, combining a transformer with a variational autoencoder (VAE) branch.
Evaluation on various recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice Score and $95\%$ Hausdorff Distance.
arXiv Detail & Related papers (2022-01-21T08:02:55Z)
- Efficient Video Transformers with Spatial-Temporal Token Selection [68.27784654734396]
We present STTS, a token selection framework that dynamically selects a few informative tokens in both temporal and spatial dimensions conditioned on input video samples.
Our framework achieves results comparable to the full model while requiring 20% less computation.
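A minimal sketch of the general idea of scoring video tokens and keeping only the most informative ones before the expensive transformer stages. The scorer, keep ratio, and plain top-k selection below are assumptions for illustration; the actual STTS method relies on a differentiable selection so the scorer can be trained end to end.

```python
import torch
import torch.nn as nn

class TopKTokenSelector(nn.Module):
    def __init__(self, dim=384, keep=0.5):
        super().__init__()
        self.keep = keep
        self.scorer = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, tokens):
        # tokens: (B, N, D) video tokens (temporal x spatial positions flattened).
        b, n, d = tokens.shape
        k = max(1, int(n * self.keep))
        scores = self.scorer(tokens).squeeze(-1)           # (B, N) informativeness scores
        top_idx = scores.topk(k, dim=1).indices            # keep the k best tokens
        top_idx = top_idx.unsqueeze(-1).expand(-1, -1, d)  # (B, k, D)
        return torch.gather(tokens, dim=1, index=top_idx)  # reduced token set

# e.g. halving the token count before the costly temporal transformer:
# selector = TopKTokenSelector(dim=384, keep=0.5)
# reduced = selector(torch.randn(2, 1568, 384))            # -> (2, 784, 384)
```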
arXiv Detail & Related papers (2021-11-23T00:35:58Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
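As a rough illustration of fusing spatial- and frequency-domain features, the sketch below pairs a spatial stream with an FFT-magnitude stream and mixes them with a learned cross-stitch-style 2x2 matrix. The streams, sizes, and single fusion point are placeholder assumptions and do not reproduce the actual MD-CSDNetwork.

```python
import torch
import torch.nn as nn

class SpatialFrequencyFusion(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.spatial_net = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        self.freq_net = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        # 2x2 cross-stitch matrix: how much each output stream takes from each input stream.
        self.stitch = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, x):
        spatial = self.spatial_net(x)
        # Frequency-domain view of the image: log-magnitude of its 2D FFT.
        freq = self.freq_net(torch.log1p(torch.abs(torch.fft.fft2(x))))
        fused_spatial = self.stitch[0, 0] * spatial + self.stitch[0, 1] * freq
        fused_freq = self.stitch[1, 0] * spatial + self.stitch[1, 1] * freq
        return fused_spatial, fused_freq
```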
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID, achieving high performance.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Self-Supervised Learning via multi-Transformation Classification for
Action Recognition [10.676377556393527]
We introduce a self-supervised video representation learning method based on the multi-transformation classification to efficiently classify human actions.
The representation of the video is learned in a self-supervised manner by classifying seven different transformations.
We have conducted the experiments on UCF101 and HMDB51 datasets together with C3D and 3D Resnet-18 as backbone networks.
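A minimal sketch of this kind of pretext task: apply one of several transformations to a clip and train the network to predict which one was applied. The concrete transformations listed below are illustrative stand-ins, not necessarily the seven used in the paper.

```python
import random
import torch
import torch.nn as nn

# Candidate transformations over a clip of shape (C, T, H, W); square spatial
# crops are assumed so that rotations preserve the tensor shape.
TRANSFORMS = [
    ("identity",     lambda c: c),
    ("h_flip",       lambda c: torch.flip(c, dims=[3])),
    ("v_flip",       lambda c: torch.flip(c, dims=[2])),
    ("time_reverse", lambda c: torch.flip(c, dims=[1])),
    ("rotate_90",    lambda c: torch.rot90(c, k=1, dims=(2, 3))),
    ("rotate_180",   lambda c: torch.rot90(c, k=2, dims=(2, 3))),
    ("rotate_270",   lambda c: torch.rot90(c, k=3, dims=(2, 3))),
]

def pretext_batch(clips):
    """Apply a random transformation to each clip; the label is its index."""
    labels = [random.randrange(len(TRANSFORMS)) for _ in clips]
    transformed = [TRANSFORMS[lbl][1](clip) for clip, lbl in zip(clips, labels)]
    return torch.stack(transformed), torch.tensor(labels)

# Training step with any 3D backbone (e.g. a C3D- or 3D-ResNet-style encoder;
# backbone and head are hypothetical names) and a linear classifier over
# len(TRANSFORMS) classes:
#   x, y = pretext_batch(clips)
#   loss = nn.CrossEntropyLoss()(head(backbone(x)), y)
```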
arXiv Detail & Related papers (2021-02-20T16:11:26Z)
- A cellular automata approach to local patterns for texture recognition [3.42658286826597]
We propose a method for texture descriptors that combines the representation power of complex objects by cellular automata with the known effectiveness of local descriptors in texture analysis.
Our proposal outperforms other classical and state-of-the-art approaches, especially on the real-world problem considered in the paper.
arXiv Detail & Related papers (2020-07-15T03:25:51Z)
- Unsupervised Learning of Video Representations via Dense Trajectory
Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top-performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
arXiv Detail & Related papers (2020-06-28T22:23:03Z)