MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
- URL: http://arxiv.org/abs/2407.18267v1
- Date: Wed, 17 Jul 2024 14:51:15 GMT
- Title: MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs
- Authors: Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li,
- Abstract summary: Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resources constraints.
However, there is still a lack of sub-byte and mixed-precision SIMD operations in MCU-class ISA.
In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instructions in typical MCUs.
- Score: 9.719789698194154
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resources constraints including memory and computing of MCUs. Nevertheless, there is still a lack of sub-byte and mixed-precision SIMD operations in MCU-class ISA and the limited computing capability of MCUs remains underutilized, which further aggravates the computing bound encountered in neural network processing. As a result, the benefits of MPNNs cannot be fully unleashed. In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instructions in typical MCUs, and then develop an efficient convolution operator by exploring both the data parallelism and computing parallelism in convolution along with the proposed SIMD packing. Finally, we further leverage Neural Architecture Search (NAS) to build a HW/SW co-designed MPNN design framework, namely MCU-MixQ. This framework can optimize both the MPNN quantization and MPNN implementation efficiency, striking an optimized balance between neural network performance and accuracy. According to our experiment results, MCU-MixQ achieves 2.1$\times$ and 1.4$\times$ speedup over CMix-NN and MCUNet respectively under the same resource constraints.
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN) we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z) - Enhancing MOTION2NX for Efficient, Scalable and Secure Image Inference using Convolutional Neural Networks [4.407841002228536]
We use the ABY2.0 SMPC protocol implemented on the C++ based MOTION2NX framework for secure convolutional neural network (CNN) inference application with semi-honest security.
We also present a novel splitting algorithm that divides the computations at each CNN layer into multiple chunks.
arXiv Detail & Related papers (2024-08-29T09:50:21Z) - Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification [10.069224006497162]
We introduce a novel architecture termed Parallel Proportional Fusion of Quantum and Spiking Neural Networks (PPF-QSNN)
The proposed PPF-QSNN outperforms both the existing spiking neural network and the serial quantum neural network across metrics such as accuracy, loss, and robustness.
This study lays the groundwork for the advancement and application of quantum advantage in artificial intelligent computations.
arXiv Detail & Related papers (2024-04-01T10:35:35Z) - LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization [48.41286573672824]
Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient.
We propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process.
arXiv Detail & Related papers (2024-01-26T05:23:11Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Scalable Nanophotonic-Electronic Spiking Neural Networks [3.9918594409417576]
Spiking neural networks (SNN) provide a new computational paradigm capable of highly parallelized, real-time processing.
Photonic devices are ideal for the design of high-bandwidth, parallel architectures matching the SNN computational paradigm.
Co-integrated CMOS and SiPh technologies are well-suited to the design of scalable SNN computing architectures.
arXiv Detail & Related papers (2022-08-28T06:10:06Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using -1, +1 to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.