Spike-EVPR: Deep Spiking Residual Network with Cross-Representation
Aggregation for Event-Based Visual Place Recognition
- URL: http://arxiv.org/abs/2402.10476v1
- Date: Fri, 16 Feb 2024 06:45:25 GMT
- Title: Spike-EVPR: Deep Spiking Residual Network with Cross-Representation
Aggregation for Event-Based Visual Place Recognition
- Authors: Chenming Hu, Zheng Fang, Kuanxu Hou, Delei Kong, Junjie Jiang, Hao
Zhuang, Mingyuan Sun and Xinjie Huang
- Abstract summary: Event cameras have been successfully applied to visual place recognition (VPR) tasks by using deep artificial neural networks (ANNs)
We propose a novel deep spiking network architecture called Spike-EVPR for event-based VPR tasks.
- Score: 4.357768397230497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras have been successfully applied to visual place recognition
(VPR) tasks by using deep artificial neural networks (ANNs) in recent years.
However, previously proposed deep ANN architectures are often unable to harness
the abundant temporal information presented in event streams. In contrast, deep
spiking networks exhibit more intricate spatiotemporal dynamics and are
inherently well-suited to process sparse asynchronous event streams.
Unfortunately, directly inputting temporal-dense event volumes into the spiking
network introduces excessive time steps, resulting in prohibitively high
training costs for large-scale VPR tasks. To address the aforementioned issues,
we propose a novel deep spiking network architecture called Spike-EVPR for
event-based VPR tasks. First, we introduce two novel event representations
tailored for SNN to fully exploit the spatio-temporal information from the
event streams, and reduce the video memory occupation during training as much
as possible. Then, to exploit the full potential of these two representations,
we construct a Bifurcated Spike Residual Encoder (BSR-Encoder) with powerful
representational capabilities to better extract the high-level features from
the two event representations. Next, we introduce a Shared & Specific
Descriptor Extractor (SSD-Extractor). This module is designed to extract
features shared between the two representations and features specific to each.
Finally, we propose a Cross-Descriptor Aggregation Module (CDA-Module) that
fuses the above three features to generate a refined, robust global descriptor
of the scene. Our experimental results indicate the superior performance of our
Spike-EVPR compared to several existing EVPR pipelines on Brisbane-Event-VPR
and DDD20 datasets, with the average Recall@1 increased by 7.61% on Brisbane
and 13.20% on DDD20.
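The abstract describes a three-stage pipeline but no code is reproduced here. As a rough illustration only, the following PyTorch-style sketch shows how the two event representations could flow through a bifurcated encoder, shared/specific descriptor heads, and a cross-descriptor fusion step; the module names, layer choices, and dimensions are assumptions, and plain conv/linear layers stand in for the paper's spiking residual blocks.

```python
import torch
import torch.nn as nn

class SpikeEVPRSketch(nn.Module):
    """Illustrative skeleton of the described pipeline, not the authors' code.

    Two event representations -> Bifurcated Spike Residual Encoder (BSR-Encoder)
    -> Shared & Specific Descriptor Extractor (SSD-Extractor)
    -> Cross-Descriptor Aggregation Module (CDA-Module) -> global descriptor.
    """

    def __init__(self, in_channels_a=4, in_channels_b=4, feat_dim=256, desc_dim=512):
        super().__init__()
        # Bifurcated encoder: one branch per event representation (spiking blocks in the paper).
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_channels_a, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_channels_b, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Shared & specific descriptor heads.
        self.shared_head = nn.Linear(2 * feat_dim, desc_dim)
        self.specific_a = nn.Linear(feat_dim, desc_dim)
        self.specific_b = nn.Linear(feat_dim, desc_dim)
        # Cross-descriptor aggregation: fuse the three descriptors into one.
        self.fuse = nn.Linear(3 * desc_dim, desc_dim)

    def forward(self, repr_a, repr_b):
        fa, fb = self.branch_a(repr_a), self.branch_b(repr_b)       # high-level features per branch
        shared = self.shared_head(torch.cat([fa, fb], dim=1))       # descriptor shared by both
        spec_a, spec_b = self.specific_a(fa), self.specific_b(fb)   # representation-specific descriptors
        global_desc = self.fuse(torch.cat([shared, spec_a, spec_b], dim=1))
        return nn.functional.normalize(global_desc, dim=1)          # L2-normalized place descriptor

# Example: two hypothetical 4-channel event representations at 260x346 resolution.
model = SpikeEVPRSketch()
desc = model(torch.randn(2, 4, 260, 346), torch.randn(2, 4, 260, 346))
print(desc.shape)  # torch.Size([2, 512])
```

In the actual network the two inputs would be the paper's SNN-tailored event representations and the encoder branches would be spiking residual blocks unrolled over time steps; the sketch only mirrors the data flow stated in the abstract.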
Related papers
- Spiking Neural Network as Adaptive Event Stream Slicer [10.279359105384334]
Event-based cameras provide rich edge information, high dynamic range, and high temporal resolution.
Many state-of-the-art event-based algorithms rely on splitting the events into fixed groups, resulting in the omission of crucial temporal information.
SpikeSlicer is a novel plug-and-play event processing method capable of splitting event streams adaptively.
arXiv Detail & Related papers (2024-10-03T06:41:10Z) - Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding [56.315932539150324]
We design a Unified Static and Dynamic Network (UniSDNet) to learn the semantic association between the video and text/audio queries.
Our UniSDNet is applicable to both Natural Language Video Grounding (NLVG) and Spoken Language Video Grounding (SLVG) tasks.
arXiv Detail & Related papers (2024-03-21T06:53:40Z) - HyperE2VID: Improving Event-Based Video Reconstruction via Hypernetworks [16.432164340779266]
We propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction.
Our approach uses hypernetworks to generate per-pixel adaptive filters guided by a context fusion module.
arXiv Detail & Related papers (2023-05-10T18:00:06Z) - Spiking-Fer: Spiking Neural Network for Facial Expression Recognition
With Event Cameras [2.9398911304923447]
"Spiking-FER" is a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN)
Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x)
arXiv Detail & Related papers (2023-04-20T10:59:56Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into x-y-t grids for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (see the sketch after this list).
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously
Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event segmentation suffer from sub-optimal performance.
We propose a hybrid end-to-end learning framework, HALSIE, to reduce inference cost by up to 20x versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to
Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it to the real-network accuracy performance.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z) - Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
Our FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion decoding stage.
We show that our FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z) - Hybrid-S2S: Video Object Segmentation with Recurrent Networks and
Correspondence Matching [3.9053553775979086]
One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest within a video sequence.
We study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture named HS2S.
Our experiments show that augmenting the RNN with correspondence matching is a highly effective solution to reduce the drift problem.
arXiv Detail & Related papers (2020-10-10T19:00:43Z) - Temporally Distributed Networks for Fast Video Semantic Segmentation [64.5330491940425]
TDNet is a temporally distributed network designed for fast and accurate video semantic segmentation.
We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks.
Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.
arXiv Detail & Related papers (2020-04-03T22:43:32Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)