Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices
- URL: http://arxiv.org/abs/2101.04446v1
- Date: Tue, 12 Jan 2021 12:38:23 GMT
- Title: Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices
- Authors: Gianmarco Cerutti, Renzo Andri, Lukas Cavigelli, Michele Magno,
Elisabetta Farella, Luca Benini
- Abstract summary: Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-footprint binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
- Score: 20.349809458335532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sound event detection (SED) is a hot topic in consumer and smart city
applications. Existing approaches based on Deep Neural Networks are very
effective, but highly demanding in terms of memory, power, and throughput when
targeting ultra-low power always-on devices.
Latency, availability, cost, and privacy requirements are pushing recent IoT
systems to process the data on the node, close to the sensor, with a very
limited energy supply, and tight constraints on the memory size and processing
capabilities that preclude running state-of-the-art DNNs.
In this paper, we explore the combination of extreme quantization to a
small-footprint binary neural network (BNN) with the highly energy-efficient,
RISC-V-based (8+1)-core GAP8 microcontroller. Starting from an existing CNN for
SED whose footprint (815 kB) exceeds the 512 kB of memory available on our
platform, we retrain the network using binary filters and activations to match
these memory constraints. (Fully) binary neural networks come with a natural
drop in accuracy of 12-18% on the challenging ImageNet object recognition
task compared to their equivalent full-precision baselines. This BNN
reaches a 77.9% accuracy, just 7% lower than the full-precision version, with
58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in
total. With our BNN implementation, we reach a peak throughput of 4.6 GMAC/s
and 1.5 GMAC/s over the full network, including preprocessing with Mel bins,
which corresponds to an efficiency of 67.1 GMAC/s/W and 31.3 GMAC/s/W,
respectively. Compared to the performance of an ARM Cortex-M4 implementation,
our system has a 10.3 times faster execution time and a 51.1 times higher
energy-efficiency.
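In a fully binarized network, weights and activations are constrained to {-1, +1}, so each multiply-accumulate collapses into an XNOR plus a popcount, which is what makes BNNs so cheap on tiny devices. A minimal NumPy sketch of this arithmetic identity (an illustration only, not the authors' GAP8 kernels):

```python
import numpy as np

def binarize(x):
    # Map real values to {-1, +1} with the sign function (0 maps to +1).
    return np.where(np.asarray(x) >= 0, 1, -1).astype(np.int8)

def binary_dot(a_bits, w_bits):
    # For a, w in {-1, +1}, a*w = +1 iff the signs match, so
    # dot(a, w) = (#matches - #mismatches) = n - 2 * #mismatches.
    # Counting mismatches is the software analogue of XNOR + popcount.
    n = a_bits.size
    mismatches = int(np.count_nonzero(a_bits != w_bits))
    return n - 2 * mismatches

# Sanity check: the XNOR/popcount form matches the naive dot product.
rng = np.random.default_rng(0)
a, w = binarize(rng.standard_normal(64)), binarize(rng.standard_normal(64))
assert binary_dot(a, w) == int(np.dot(a.astype(int), w.astype(int)))
```

On hardware, the {-1, +1} vectors are stored as bit-packed 0/1 words, so one XNOR plus popcount instruction processes 32 MACs at once.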
Related papers
- Spiker+: a framework for the generation of efficient Spiking Neural
Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets: MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z)
- Sparse Compressed Spiking Neural Network Accelerator for Object Detection [0.1246030133914898]
Spiking neural networks (SNNs) are inspired by the human brain and transmit binary spikes and highly sparse activation maps.
This paper proposes a sparse compressed spiking neural network accelerator that takes advantage of the high sparsity of activation maps and weights.
The experimental result of the neural network shows 71.5% mAP with mixed (1,3) time steps on the IVS 3cls dataset.
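Exploiting activation sparsity typically means storing only the coordinates of the few spiking entries rather than the dense map. A toy coordinate-list encoding, assumed here purely for illustration and not the accelerator's actual compression format:

```python
import numpy as np

def compress_spikes(spike_map):
    # Store only the coordinates of nonzero (spiking) entries,
    # exploiting the high sparsity of SNN activation maps.
    coords = np.argwhere(spike_map != 0)
    return coords, spike_map.shape

def decompress_spikes(coords, shape):
    # Rebuild the dense binary map from the coordinate list.
    out = np.zeros(shape, dtype=np.int8)
    for r, c in coords:
        out[r, c] = 1
    return out

spikes = np.zeros((4, 4), dtype=np.int8)
spikes[0, 1] = spikes[2, 3] = 1
coords, shape = compress_spikes(spikes)
assert np.array_equal(decompress_spikes(coords, shape), spikes)
```

With, say, 2 active entries out of 16, the coordinate list is far smaller than the dense map, which is the memory-bandwidth win such accelerators target.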
arXiv Detail & Related papers (2022-05-02T09:56:55Z)
- BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing [48.11023234245863]
We propose a new framework called BottleFit, which includes a novel training strategy to achieve high accuracy even with strong compression rates.
BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on ImageNet dataset.
We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading.
arXiv Detail & Related papers (2022-01-07T22:08:07Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
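The patch-by-patch idea is easiest to see with an operator whose receptive fields do not cross tile boundaries: each tile is processed independently, so only one tile's activations are live at a time. A toy sketch with 2x2 average pooling (real CNN layers need overlapping halo regions between patches, which the actual scheduler must handle):

```python
import numpy as np

def avgpool2x2(x):
    # Plain 2x2 average pooling over the whole feature map at once.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def patch_by_patch(x, patch=4):
    # Process the input in patch x patch tiles so that only one tile's
    # activations must be resident at a time, cutting peak memory.
    # Assumes h and w are divisible by `patch`, and `patch` is even.
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tile = x[i:i + patch, j:j + patch]  # only this tile is "live"
            out[i // 2:(i + patch) // 2,
                j // 2:(j + patch) // 2] = avgpool2x2(tile)
    return out

x = np.arange(64, dtype=float).reshape(8, 8)
assert np.allclose(patch_by_patch(x), avgpool2x2(x))
```

The result is identical to whole-map pooling, but the working set per step is one tile instead of the full feature map, which is exactly the peak-memory reduction the paper is after.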
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
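The bit-plane view behind such a decomposition: each bit of a k-bit unsigned weight yields one {-1, +1} branch plus a constant offset, so a quantized layer becomes a weighted sum of binary layers. A sketch of this identity (illustrative only, not the paper's exact scheme):

```python
import numpy as np

def decompose_pm1(q, bits):
    # Write an unsigned integer tensor q (values in [0, 2**bits - 1]) as
    #   q = sum_i 2**(i-1) * B_i + (2**bits - 1) / 2,
    # with each branch B_i in {-1, +1}.
    branches = []
    for i in range(bits):
        b01 = (q >> i) & 1            # i-th bit plane, values in {0, 1}
        branches.append(2 * b01 - 1)  # map to {-1, +1}
    return branches

def recompose(branches):
    # Invert the decomposition to recover the original integers.
    bits = len(branches)
    acc = sum((2 ** (i - 1)) * b for i, b in enumerate(branches))
    return acc + (2 ** bits - 1) / 2

q = np.array([0, 3, 7, 5], dtype=np.int32)
branches = decompose_pm1(q, bits=3)
assert np.allclose(recompose(branches), q)
```

Since matrix multiplication is linear, each {-1, +1} branch can then run as an XNOR-popcount binary layer and the results are summed with the fixed 2**(i-1) scales.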
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- Fast Implementation of 4-bit Convolutional Neural Networks for Mobile Devices [0.8362190332905524]
We show an efficient implementation of 4-bit matrix multiplication for quantized neural networks.
We also demonstrate a 4-bit quantized neural network for OCR recognition on the MIDV-500 dataset.
The results show that 4-bit quantization perfectly suits mobile devices, yielding good enough accuracy and low inference time.
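A key ingredient of fast low-bit inference is storing two 4-bit values per byte. A toy NumPy pack/unpack sketch of this layout (real kernels do the packing and the packed arithmetic with SIMD instructions, not Python-level array ops):

```python
import numpy as np

def pack_u4(vals):
    # Pack pairs of 4-bit values (0..15) into single bytes, low nibble first.
    vals = np.asarray(vals, dtype=np.uint8)
    assert vals.size % 2 == 0 and int(vals.max()) < 16
    return (vals[0::2] | (vals[1::2] << 4)).astype(np.uint8)

def unpack_u4(packed):
    # Recover the 4-bit values: low nibble, then high nibble, per byte.
    lo = packed & 0x0F
    hi = packed >> 4
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2], out[1::2] = lo, hi
    return out

vals = np.array([1, 15, 7, 0], dtype=np.uint8)
assert np.array_equal(unpack_u4(pack_u4(vals)), vals)
```

Halving the bytes per weight both shrinks the model and doubles the number of operands loaded per memory access, which is where the inference-time win comes from.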
arXiv Detail & Related papers (2020-09-14T14:48:40Z)
- TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars [13.266626571886354]
This work proposes a low-power high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices.
A 2D Convolutional Neural Network (CNN) using range frequency Doppler features is combined with a Temporal Convolutional Neural Network (TCN) for time sequence prediction.
arXiv Detail & Related papers (2020-06-25T15:23:21Z)
- Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain-Machine Interfaces [16.381467082472515]
Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
These models often exceed the limitations of edge devices due to their memory and computational requirements.
arXiv Detail & Related papers (2020-04-24T12:29:03Z)
- Near-chip Dynamic Vision Filtering for Low-Bandwidth Pedestrian Detection [99.94079901071163]
This paper presents a novel end-to-end system for pedestrian detection using Dynamic Vision Sensors (DVSs).
We target applications where multiple sensors transmit data to a local processing unit, which executes a detection algorithm.
Our detector is able to perform a detection every 450 ms, with an overall testing F1 score of 83%.
arXiv Detail & Related papers (2020-04-03T17:36:26Z)
- Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms [10.04812789957562]
This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT.
A two-stage student-teacher approach is presented to make state-of-the-art neural networks for sound event detection fit on current microcontrollers.
Our embedded implementation achieves 68% recognition accuracy on UrbanSound8K, not far from state-of-the-art performance.
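In the student-teacher setup, the small network is trained to match the large network's softened output distribution. A standard knowledge-distillation loss in NumPy (a generic sketch of the technique; the paper's two-stage procedure differs in its details):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened distribution and the
    # student's: the usual knowledge-distillation objective.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(-(p_t * np.log(p_s + 1e-12)).sum())

teacher = [4.0, 1.0, 0.5]
aligned = distillation_loss(teacher, teacher)
off = distillation_loss([0.5, 1.0, 4.0], teacher)
assert aligned < off  # loss is smaller when the student matches the teacher
```

In practice this term is combined with the ordinary cross-entropy on hard labels, letting the compact student absorb the teacher's inter-class similarity structure.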
arXiv Detail & Related papers (2020-01-29T14:56:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.