HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously
Exploiting Image and Event Modalities
- URL: http://arxiv.org/abs/2211.10754v4
- Date: Thu, 28 Sep 2023 17:35:10 GMT
- Title: HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously
Exploiting Image and Event Modalities
- Authors: Shristi Das Biswas, Adarsh Kosta, Chamika Liyanagedera, Marco
Apolinario, Kaushik Roy
- Abstract summary: Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event-based segmentation suffer from sub-optimal performance.
We propose a hybrid end-to-end learning framework, HALSIE, that reduces inference cost by up to $20\times$ versus prior art.
- Score: 6.543272301133159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras detect changes in per-pixel intensity to generate asynchronous
'event streams'. They offer great potential for accurate semantic map retrieval
in real-time autonomous systems owing to their much higher temporal resolution
and high dynamic range (HDR) compared to conventional cameras. However,
existing implementations for event-based segmentation suffer from sub-optimal
performance since these temporally dense events only measure the varying
component of a visual signal, limiting their ability to encode dense spatial
context compared to frames. To address this issue, we propose a hybrid
end-to-end learning framework HALSIE, utilizing three key concepts to reduce
inference cost by up to $20\times$ versus prior art while retaining similar
performance: First, a simple and efficient cross-domain learning scheme to
extract complementary spatio-temporal embeddings from both frames and events.
Second, a specially designed dual-encoder scheme with Spiking Neural Network
(SNN) and Artificial Neural Network (ANN) branches to minimize latency while
retaining cross-domain feature aggregation. Third, a multi-scale cue mixer to
model rich representations of the fused embeddings. These qualities of HALSIE
allow for a very lightweight architecture achieving state-of-the-art
segmentation performance on DDD-17, MVSEC, and DSEC-Semantic datasets with up
to $33\times$ higher parameter efficiency and favorable inference cost (17.9mJ
per cycle). Our ablation study also brings new insights into effective design
choices that can prove beneficial for research across other vision tasks.
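As context for how asynchronous event streams can feed a frame-based network at all, a common preprocessing step is to bin events into a frame-like histogram. The sketch below is a generic illustration of that idea, not HALSIE's actual input representation; the (x, y, t, p) event layout and two-channel output are assumptions.

```python
import numpy as np

def events_to_histogram(events, height, width):
    """Bin an asynchronous event stream into a 2-channel, frame-like
    tensor (one channel per polarity). The layout is illustrative,
    not HALSIE's actual encoding."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _, p in events:
        channel = 0 if p > 0 else 1  # +1 events in ch 0, -1 events in ch 1
        hist[channel, int(y), int(x)] += 1.0
    return hist

# toy stream of (x, y, timestamp, polarity) events
events = [(1, 0, 0.001, +1), (1, 0, 0.002, +1), (2, 3, 0.003, -1)]
h = events_to_histogram(events, height=4, width=4)
```

A representation like this gives a spatially dense tensor that a conventional encoder can consume, while the raw stream retains the fine temporal detail mentioned in the abstract.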
Related papers
- Event-Stream Super Resolution using Sigma-Delta Neural Network [0.10923877073891444]
Event cameras present unique challenges due to their low resolution and the sparse, asynchronous nature of the data they collect.
Current event super-resolution algorithms are not fully optimized for the distinct data structure produced by event cameras.
This work proposes a method that integrates binary spikes with Sigma-Delta Neural Networks (SDNNs).
arXiv Detail & Related papers (2024-08-13T15:25:18Z)
- EvGNN: An Event-driven Graph Neural Network Accelerator for Edge Vision [0.06752396542927405]
Event-driven graph neural networks (GNNs) have emerged as a promising solution for sparse event-based vision.
We propose EvGNN, the first event-driven GNN accelerator for low-footprint, ultra-low-latency, and high-accuracy edge vision.
arXiv Detail & Related papers (2024-04-30T12:18:47Z)
- Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms [10.104371980353973]
Ev-Edge is a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms.
On several state-of-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy.
arXiv Detail & Related papers (2024-03-23T04:44:55Z)
- Accurate and Efficient Event-based Semantic Segmentation Using Adaptive Spiking Encoder-Decoder Network [20.05283214295881]
Spiking neural networks (SNNs) are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors.
We develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks.
We harness the adaptive threshold which improves network accuracy, sparsity and robustness in streaming inference.
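The adaptive threshold mentioned above can be sketched with a single leaky integrate-and-fire neuron whose firing threshold rises after each spike and relaxes back toward its resting value, which tends to sparsify the spike train. All constants below are illustrative assumptions, not SpikingEDN's actual parameters.

```python
import numpy as np

def lif_adaptive(inputs, beta=0.9, th0=1.0, th_decay=0.9, th_jump=0.5):
    """Leaky integrate-and-fire neuron with an adaptive threshold.
    Constants are illustrative, not taken from the paper."""
    v, th = 0.0, th0
    spikes = []
    for x in inputs:
        v = beta * v + x                 # leaky membrane integration
        s = 1.0 if v >= th else 0.0      # fire when threshold is crossed
        v -= s * th                      # soft reset after a spike
        th = th0 + th_decay * (th - th0) + th_jump * s  # threshold rises on spikes, decays back
        spikes.append(s)
    return spikes

spikes = lif_adaptive([0.6] * 20)
```

Comparing against the same neuron with `th_jump=0.0` (a fixed threshold) shows the adaptation firing no more often, i.e. a sparser output for the same input drive.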
arXiv Detail & Related papers (2023-04-24T07:12:50Z)
- Hybrid SNN-ANN: Energy-Efficient Classification and Object Detection for Event-Based Vision [64.71260357476602]
Event-based vision sensors encode local pixel-wise brightness changes in streams of events rather than image frames.
Recent progress in object recognition from event-based sensors has come from conversions of deep neural networks.
We propose a hybrid architecture for end-to-end training of deep neural networks for event-based pattern recognition and object detection.
arXiv Detail & Related papers (2021-12-06T23:45:58Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks:
specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
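The DCT-domain split described above can be illustrated with a minimal numpy sketch: project a patch onto an orthonormal DCT-II basis, mask coefficients by frequency, and invert each half. The diagonal cutoff rule is an assumption for illustration, not the paper's exact partitioning.

```python
import numpy as np

def dct_matrix(n):
    # orthonormal DCT-II basis matrix (rows are basis vectors)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    M = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    M[0] *= np.sqrt(1.0 / n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def split_frequencies(patch, cutoff):
    """Split a square patch into low- and high-frequency parts in the
    DCT domain. The cutoff on (row + column) frequency is illustrative."""
    n = patch.shape[0]
    M = dct_matrix(n)
    coeffs = M @ patch @ M.T                 # 2-D DCT of the patch
    freq = np.add.outer(np.arange(n), np.arange(n))
    low_mask = freq < cutoff
    low = M.T @ (coeffs * low_mask) @ M      # inverse DCT of kept bands
    high = M.T @ (coeffs * ~low_mask) @ M
    return low, high

patch = np.random.default_rng(0).standard_normal((8, 8))
low, high = split_frequencies(patch, cutoff=4)
```

Because the basis is orthonormal, the two parts sum back to the original patch, so expensive operations can be routed to `high` and cheap ones to `low` without losing signal.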
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
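One way such attention variants gain speed is by removing the softmax so that the matrix products associate: computing K^T V first costs O(n·d²) instead of O(n²·d) in the sequence length n. The L2 normalization below is a hedged sketch modeled on this idea, not necessarily the paper's exact formulation.

```python
import numpy as np

def l2norm(x):
    # normalize each row vector to unit length
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def attention_quadratic(Q, K, V):
    # (QK^T)V: forms the full n x n affinity matrix, O(n^2 d)
    return (l2norm(Q) @ l2norm(K).T) @ V

def attention_linear(Q, K, V):
    # Q(K^T V): identical result by associativity, O(n d^2)
    return l2norm(Q) @ (l2norm(K).T @ V)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 4)) for _ in range(3))
```

For high-resolution inputs (large n, small d), the reordered product avoids ever materializing the quadratic-size attention map.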
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
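To see why event-driven updates can match a dense network's output exactly, consider a single convolution layer: when one pixel changes, only the outputs whose receptive field covers that pixel need recomputing. The sketch below is a toy 'valid' correlation illustrating that equivalence, not the paper's actual framework.

```python
import numpy as np

def conv2d_valid(img, k):
    # dense 'valid' cross-correlation (reference computation)
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def event_update(out, k, y, x, delta, H, W):
    """Apply one event (intensity change `delta` at pixel (y, x)) by
    touching only the output locations whose window covers (y, x)."""
    kh, kw = k.shape
    for i in range(max(0, y - kh + 1), min(H - kh + 1, y + 1)):
        for j in range(max(0, x - kw + 1), min(W - kw + 1, x + 1)):
            out[i, j] += delta * k[y - i, x - j]
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))
out = conv2d_valid(img, kernel)

img[2, 3] += 0.5                                   # one "event" arrives
out = event_update(out, kernel, 2, 3, 0.5, 6, 6)   # sparse incremental update
```

The incrementally updated output equals a full dense recomputation, which is the sense in which the asynchronous model has "identical output" while doing far less work per event.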
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves accuracies of 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.