Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
- URL: http://arxiv.org/abs/2505.07715v1
- Date: Mon, 12 May 2025 16:19:20 GMT
- Title: Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
- Authors: Qi Xu, Jie Deng, Jiangrong Shen, Biwu Chen, Huajin Tang, Gang Pan,
- Abstract summary: Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich dynamics.<n>This study proposes a novel hybrid Transformer (HsVT) model to enhance the performance of event-based object detection.<n> Experimental results demonstrate that HsVT achieves significant performance improvements in event detection with fewer parameters.
- Score: 19.967565219584056
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Event-based object detection has gained increasing attention due to its advantages such as high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based object detection, this study proposes a novel hybrid spike vision Transformer (HsVT) model. The HsVT model integrates a spatial feature extraction module to capture local and global features, and a temporal feature extraction module to model time dependencies and long-term patterns in event sequences. This combination enables HsVT to capture spatiotemporal features, improving its capability to handle complex event-based object detection tasks. To support research in this area, we developed and publicly released The Fall Detection Dataset as a benchmark for event-based object detection tasks. This dataset, captured using an event-based camera, ensures facial privacy protection and reduces memory usage due to the event representation format. We evaluated the HsVT model on GEN1 and Fall Detection datasets across various model sizes. Experimental results demonstrate that HsVT achieves significant performance improvements in event detection with fewer parameters.
Related papers
- Sparse Convolutional Recurrent Learning for Efficient Event-based Neuromorphic Object Detection [4.362139927929203]
We propose the Sparse Event-based Efficient Detector (SEED) for efficient event-based object detection on neuromorphic processors.<n>We introduce sparse convolutional recurrent learning, which achieves over 92% activation sparsity in recurrent processing, vastly reducing the cost for reasoning on sparse event data.
arXiv Detail & Related papers (2025-06-16T12:54:27Z) - Dynamic Graph Induced Contour-aware Heat Conduction Network for Event-based Object Detection [42.021851148914145]
Event-based Vision Sensors (EVS) have demonstrated significant advantages over traditional RGB frame-based cameras in low-light conditions.<n>This paper proposes a novel dynamic graph induced contour-aware heat conduction network for event stream based object detection.
arXiv Detail & Related papers (2025-05-19T09:44:01Z) - Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark [36.9654606035663]
We introduce a novel hierarchical knowledge distillation strategy to guide the learning of the student Transformer network.<n>We adapt the network model to specific target objects during testing via a newly proposed test-time tuning strategy.<n>We propose EventVOT, the first large-scale high-resolution event-based tracking dataset.
arXiv Detail & Related papers (2025-02-08T13:59:52Z) - DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition [51.96660522869841]
DailyDVS-200 is a benchmark dataset tailored for the event-based action recognition community.
It covers 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences.
DailyDVS-200 is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions.
arXiv Detail & Related papers (2024-07-06T15:25:10Z) - EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks [14.046487518350792]
Spiking Neural Networks (SNNs) operate on an event-driven through sparse spike communication.
We introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution.
Our method yields a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and only three time steps.
arXiv Detail & Related papers (2024-03-19T09:34:11Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
SpikeMOT uses spiking neural networks to extract sparsetemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z) - Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Event Voxel Set Transformer for Spatiotemporal Representation Learning on Event Streams [19.957857885844838]
Event cameras are neuromorphic vision sensors that record a scene as sparse and asynchronous event streams.
We propose an attentionaware model named Event Voxel Set Transformer (EVSTr) for efficient representation learning on event streams.
Experiments show that EVSTr achieves state-of-the-art performance while maintaining low model complexity.
arXiv Detail & Related papers (2023-03-07T12:48:02Z) - Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.