Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain-Machine Interfaces
- URL: http://arxiv.org/abs/2004.11690v2
- Date: Sat, 21 Nov 2020 11:11:03 GMT
- Title: Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain-Machine Interfaces
- Authors: Tibor Schneider, Xiaying Wang, Michael Hersche, Lukas Cavigelli, Luca
Benini
- Abstract summary: Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
These models often exceed the limitations of edge devices due to their memory and computational requirements.
- Score: 16.381467082472515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and
accessible communication between human brains and machines by analyzing brain
activities recorded with Electroencephalography (EEG). Latency, reliability,
and privacy constraints make it unsuitable to offload the computation to the
cloud. Practical use cases demand a wearable, battery-operated device with low
average power consumption for long-term use. Recently, sophisticated
algorithms, in particular deep learning models, have emerged for classifying
EEG signals. While reaching outstanding accuracy, these models often exceed the
limitations of edge devices due to their memory and computational requirements.
In this paper, we demonstrate algorithmic and implementation optimizations for
EEGNet, a compact Convolutional Neural Network (CNN) suitable for many BMI
paradigms. We quantize weights and activations to 8-bit fixed-point with a
negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient
hardware-aware implementation on the Mr.Wolf parallel ultra-low power (PULP)
System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core
compute cluster. With our proposed optimization steps, we can obtain an overall
speedup of 64x and a reduction of up to 85% in memory footprint with respect to
a single-core layer-wise baseline implementation. Our implementation takes only
5.82 ms and consumes 0.627 mJ per inference. With 21.0 GMAC/s/W, it is 256x more
energy-efficient than an EEGNet implementation on an ARM Cortex-M7
(0.082 GMAC/s/W).
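For intuition, here is a minimal sketch in Python/NumPy of the kind of symmetric 8-bit fixed-point (Q-format) quantization the abstract describes, plus the energy arithmetic implied by the reported figures. The function names and the per-layer choice of fractional bits are illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    def quantize_q8(x, frac_bits):
        # Represent v as round(v * 2**frac_bits), clipped to the signed
        # 8-bit range; frac_bits would be chosen per layer from the
        # observed dynamic range (an assumption for illustration).
        q = np.round(x * (1 << frac_bits))
        return np.clip(q, -128, 127).astype(np.int8)

    def dequantize_q8(q, frac_bits):
        # Map the 8-bit codes back to floats to measure the error.
        return q.astype(np.float32) / (1 << frac_bits)

    # Energy arithmetic from the abstract: 21.0 GMAC/s/W is 21.0e9 MAC
    # per joule, so 0.627 mJ per inference implies roughly 13.2 MMAC per
    # inference, and the average power is 0.627 mJ / 5.82 ms ~= 108 mW.
    rng = np.random.default_rng(0)
    w = rng.uniform(-3.9, 3.9, size=(16, 64)).astype(np.float32)
    w_q = quantize_q8(w, frac_bits=5)
    err = np.abs(dequantize_q8(w_q, 5) - w).max()
    print(f"max quantization error: {err:.5f}")  # <= 2**-6 = 0.015625

Packing four such 8-bit values into one 32-bit word is also what lets SIMD-style RISC-V ISA extensions, like those mentioned above, execute several MACs per cycle.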
Related papers
- Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristors.
The network associates incoming data with the past experience stored as semantic vectors.
We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z) - A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit
for Analog In-Memory Computing [10.992736723518036]
We propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic.
It achieves competitive accuracy and higher computing throughput than previous approaches.
We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations.
arXiv Detail & Related papers (2024-02-12T10:30:45Z) - Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges to energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures
using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and
Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode [14.214500730272256]
Vega is an IoT end-node system capable of scaling from a 1.7 µW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak on NSAAs.
Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT and 79 and 129 GFLOPS/W on 32- and 16-bit FP.
arXiv Detail & Related papers (2021-10-18T08:47:45Z) - PhiNets: a scalable backbone for low-power AI at the edge [2.7910505923792646]
We present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms.
PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory.
We demonstrate our approach on a prototype node based on a STM32H743 microcontroller.
arXiv Detail & Related papers (2021-10-01T12:03:25Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks; a toy sketch of this decomposition idea follows this list.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z) - EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded
Motor-Imagery Brain-Machine Interfaces [15.07343602952606]
We propose EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters.
Its low memory footprint and low computational complexity for inference make it suitable for embedded classification on resource-limited devices at the edge.
arXiv Detail & Related papers (2020-05-31T21:45:45Z)
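As referenced in the {-1, +1} encoding entry above, here is a toy, self-contained Python/NumPy illustration of the general idea: a k-bit non-negative integer weight matrix is rewritten, via its binary expansion, as a weighted sum of {-1, +1} branches, so a matrix product distributes over branches that binary-network kernels could then handle. This mirrors only the spirit of the cited paper; the construction and names are illustrative, not its exact algorithm.

    import numpy as np

    def decompose_pm1(W_q, bits):
        # Binary expansion W_q = sum_i 2**i * b_i with b_i in {0, 1};
        # substituting b_i = (C_i + 1) / 2 gives
        # W_q = sum_i 2**(i-1) * C_i + (2**bits - 1) / 2, C_i in {-1, +1}.
        return [2 * ((W_q >> i) & 1).astype(np.int32) - 1 for i in range(bits)]

    rng = np.random.default_rng(0)
    bits = 4
    W_q = rng.integers(0, 2**bits, size=(8, 8))   # toy 4-bit weights
    x = rng.integers(-5, 5, size=(3, 8))          # toy integer inputs

    # The matmul distributes over the {-1, +1} branches plus a constant term.
    branches = decompose_pm1(W_q, bits)
    y = sum(2.0**(i - 1) * (x @ C) for i, C in enumerate(branches))
    y += (2**bits - 1) / 2 * (x @ np.ones_like(W_q))
    assert np.allclose(y, x @ W_q)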