Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain-Machine Interfaces
- URL: http://arxiv.org/abs/2004.11690v2
- Date: Sat, 21 Nov 2020 11:11:03 GMT
- Title: Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain-Machine Interfaces
- Authors: Tibor Schneider, Xiaying Wang, Michael Hersche, Lukas Cavigelli, Luca
Benini
- Abstract summary: Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
These models often exceed the limitations of edge devices due to their memory and computational requirements.
- Score: 16.381467082472515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and
accessible communication between human brains and machines by analyzing brain
activities recorded with Electroencephalography (EEG). Latency, reliability,
and privacy constraints make it unsuitable to offload the computation to the
cloud. Practical use cases demand a wearable, battery-operated device with low
average power consumption for long-term use. Recently, sophisticated
algorithms, in particular deep learning models, have emerged for classifying
EEG signals. While reaching outstanding accuracy, these models often exceed the
limitations of edge devices due to their memory and computational requirements.
In this paper, we demonstrate algorithmic and implementation optimizations for
EEGNet, a compact Convolutional Neural Network (CNN) suitable for many BMI
paradigms. We quantize weights and activations to 8-bit fixed-point with a
negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient
hardware-aware implementation on the Mr.Wolf parallel ultra-low power (PULP)
System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core
compute cluster. With our proposed optimization steps, we can obtain an overall
speedup of 64x and a reduction of up to 85% in memory footprint with respect to
a single-core layer-wise baseline implementation. Our implementation takes only
5.82 ms and consumes 0.627 mJ per inference. With 21.0 GMAC/s/W, it is 256x more
energy-efficient than an EEGNet implementation on an ARM Cortex-M7
(0.082 GMAC/s/W).
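For intuition, here is a minimal sketch in Python/NumPy of the kind of symmetric 8-bit fixed-point (Q-format) quantization the abstract describes, plus the energy arithmetic implied by the reported figures. The function names and the per-layer choice of fractional bits are illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    def quantize_q8(x, frac_bits):
        # Represent v as round(v * 2**frac_bits), clipped to the signed
        # 8-bit range; frac_bits would be chosen per layer from the
        # observed dynamic range (an assumption for illustration).
        q = np.round(x * (1 << frac_bits))
        return np.clip(q, -128, 127).astype(np.int8)

    def dequantize_q8(q, frac_bits):
        # Map the 8-bit codes back to floats to measure the error.
        return q.astype(np.float32) / (1 << frac_bits)

    # Energy arithmetic from the abstract: 21.0 GMAC/s/W is 21.0e9 MAC
    # per joule, so 0.627 mJ per inference implies roughly 13.2 MMAC per
    # inference, and the average power is 0.627 mJ / 5.82 ms ~= 108 mW.
    rng = np.random.default_rng(0)
    w = rng.uniform(-3.9, 3.9, size=(16, 64)).astype(np.float32)
    w_q = quantize_q8(w, frac_bits=5)
    err = np.abs(dequantize_q8(w_q, 5) - w).max()
    print(f"max quantization error: {err:.5f}")  # <= 2**-6 = 0.015625

Packing four such 8-bit values into one 32-bit word is also what lets SIMD-style RISC-V ISA extensions, like those mentioned above, execute several MACs per cycle.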
Related papers
- Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristors.
The network associates incoming data with the past experience stored as semantic vectors.
We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z) - A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit
for Analog In-Memory Computing [10.992736723518036]
We propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic.
It achieves competitive accuracy and higher computing throughput than previous approaches.
We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations.
arXiv Detail & Related papers (2024-02-12T10:30:45Z) - Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges to energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures
using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and
Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode [14.214500730272256]
Vega is an IoT end-node system capable of scaling from a 1.7 µW fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak on NSAAs.
Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT and 79 and 129 GFLOPS/W on 32- and 16-bit FP.
arXiv Detail & Related papers (2021-10-18T08:47:45Z) - PhiNets: a scalable backbone for low-power AI at the edge [2.7910505923792646]
We present PhiNets, a new scalable backbone optimized for deep-learning-based image processing on resource-constrained platforms.
PhiNets are based on inverted residual blocks specifically designed to decouple the computational cost, working memory, and parameter memory.
We demonstrate our approach on a prototype node based on a STM32H743 microcontroller.
arXiv Detail & Related papers (2021-10-01T12:03:25Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks; a toy sketch of this decomposition idea follows this list.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z) - Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z) - EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded
Motor-Imagery Brain-Machine Interfaces [15.07343602952606]
We propose EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters.
Its low memory footprint and low computational complexity for inference make it suitable for embedded classification on resource-limited devices at the edge.
arXiv Detail & Related papers (2020-05-31T21:45:45Z)
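As referenced in the {-1, +1} encoding entry above, here is a toy, self-contained Python/NumPy illustration of the general idea: a k-bit non-negative integer weight matrix is rewritten, via its binary expansion, as a weighted sum of {-1, +1} branches, so a matrix product distributes over branches that binary-network kernels could then handle. This mirrors only the spirit of the cited paper; the construction and names are illustrative, not its exact algorithm.

    import numpy as np

    def decompose_pm1(W_q, bits):
        # Binary expansion W_q = sum_i 2**i * b_i with b_i in {0, 1};
        # substituting b_i = (C_i + 1) / 2 gives
        # W_q = sum_i 2**(i-1) * C_i + (2**bits - 1) / 2, C_i in {-1, +1}.
        return [2 * ((W_q >> i) & 1).astype(np.int32) - 1 for i in range(bits)]

    rng = np.random.default_rng(0)
    bits = 4
    W_q = rng.integers(0, 2**bits, size=(8, 8))   # toy 4-bit weights
    x = rng.integers(-5, 5, size=(3, 8))          # toy integer inputs

    # The matmul distributes over the {-1, +1} branches plus a constant term.
    branches = decompose_pm1(W_q, bits)
    y = sum(2.0**(i - 1) * (x @ C) for i, C in enumerate(branches))
    y += (2**bits - 1) / 2 * (x @ np.ones_like(W_q))
    assert np.allclose(y, x @ W_q)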