HDI-Former: Hybrid Dynamic Interaction ANN-SNN Transformer for Object Detection Using Frames and Events
- URL: http://arxiv.org/abs/2411.18658v1
- Date: Wed, 27 Nov 2024 09:32:41 GMT
- Title: HDI-Former: Hybrid Dynamic Interaction ANN-SNN Transformer for Object Detection Using Frames and Events
- Authors: Dianze Li, Jianing Li, Xu Liu, Zhaokun Zhou, Xiaopeng Fan, Yonghong Tian,
- Abstract summary: HDI-Former is a Hybrid Dynamic Interaction ANN-SNN Transformer for high-accuracy and energy-efficient object detection.
We first present a novel semantic-enhanced self-attention mechanism that strengthens the correlation between image encoding tokens within the ANN Transformer branch.
We then design a Spiking Swin Transformer branch to model temporal cues from event streams with low power consumption.
- Score: 44.20745133222306
- License:
- Abstract: Combining the complementary benefits of frames and events has been widely used for object detection in challenging scenarios. However, most object detection methods use two independent Artificial Neural Network (ANN) branches, limiting cross-modality information interaction across the two visual streams and encountering challenges in extracting temporal cues from event streams with low power consumption. To address these challenges, we propose HDI-Former, a Hybrid Dynamic Interaction ANN-SNN Transformer, marking the first trial to design a directly trained hybrid ANN-SNN architecture for high-accuracy and energy-efficient object detection using frames and events. Technically, we first present a novel semantic-enhanced self-attention mechanism that strengthens the correlation between image encoding tokens within the ANN Transformer branch for better performance. Then, we design a Spiking Swin Transformer branch to model temporal cues from event streams with low power consumption. Finally, we propose a bio-inspired dynamic interaction mechanism between ANN and SNN sub-networks for cross-modality information interaction. The results demonstrate that our HDI-Former outperforms eleven state-of-the-art methods and our four baselines by a large margin. Our SNN branch also shows comparable performance to the ANN with the same architecture while consuming 10.57$\times$ less energy on the DSEC-Detection dataset. Our open-source code is available in the supplementary material.
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.
embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.
split computing - where an SNN is partitioned across two devices - is a promising solution.
This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - A Hybrid SNN-ANN Network for Event-based Object Detection with Spatial and Temporal Attention [2.5075774828443467]
Event cameras offer high temporal resolution and dynamic range with minimal motion blur, making them promising for object detection tasks.
While Spiking Neural Networks (SNNs) are a natural match for event-based sensory data, Artificial Neural Networks (ANNs) tend to display more stable training dynamics.
We introduce the first Hybrid Attention-based SNN-ANN backbone for object detection using event cameras.
arXiv Detail & Related papers (2024-03-15T10:28:31Z) - Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce CNNs-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
In order to alleviate the block effect and detail destruction problems brought by the Transformer naturally, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z) - Best of Both Worlds: Hybrid SNN-ANN Architecture for Event-based Optical Flow Estimation [12.611797572621398]
Spiking Neural Networks (SNNs) with their asynchronous event-driven compute show great potential for extracting features from event streams.
We propose a novel SNN-ANN hybrid architecture that combines the strengths of both.
arXiv Detail & Related papers (2023-06-05T15:26:02Z) - Spiking-Fer: Spiking Neural Network for Facial Expression Recognition
With Event Cameras [2.9398911304923447]
"Spiking-FER" is a deep convolutional SNN model, and compare it against a similar Artificial Neural Network (ANN)
Experiments show that the proposed approach achieves comparable performance to the ANN architecture, while consuming less energy by orders of magnitude (up to 65.39x)
arXiv Detail & Related papers (2023-04-20T10:59:56Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for
Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z) - Beyond Classification: Directly Training Spiking Neural Networks for
Semantic Segmentation [5.800785186389827]
Spiking Neural Networks (SNNs) have emerged as the low-power alternative to Artificial Neural Networks (ANNs)
In this paper, we explore the SNN applications beyond classification and present semantic segmentation networks configured with spiking neurons.
arXiv Detail & Related papers (2021-10-14T21:53:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.