msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
- URL: http://arxiv.org/abs/2505.11483v1
- Date: Fri, 16 May 2025 17:47:15 GMT
- Title: msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
- Authors: Zhaolan Huang, Emmanuel Baccelli,
- Abstract summary: We introduce msf-CNN, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs)<n>We show that msf-CNN can achieve inference using 50% less RAM compared to the prior art.
- Score: 0.4297070083645049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g., 128kB of RAM. However, inference latency must remain small to fit real-time constraints. An approach to tackle this is patch-based fusion, which aims to optimize data flows across neural network layers. In this paper, we introduce msf-CNN, a novel technique that efficiently finds optimal fusion settings for convolutional neural networks (CNNs) by walking through the fusion solution space represented as a directed acyclic graph. Compared to previous work on CNN fusion for MCUs, msf-CNN identifies a wider set of solutions. We published an implementation of msf-CNN running on various microcontrollers (ARM Cortex-M, RISC-V, ESP32). We show that msf-CNN can achieve inference using 50% less RAM compared to the prior art (MCUNetV2 and StreamNet). We thus demonstrate how msf-CNN offers additional flexibility for system designers.
Related papers
- Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.<n>We reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear.<n>Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z) - MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs [9.719789698194154]
Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resources constraints.
However, there is still a lack of sub-byte and mixed-precision SIMD operations in MCU-class ISA.
In this work, we propose to pack multiple low-bitwidth arithmetic operations within a single instruction multiple data (SIMD) instructions in typical MCUs.
arXiv Detail & Related papers (2024-07-17T14:51:15Z) - Memory-Efficient Reversible Spiking Neural Networks [8.05761813203348]
Spiking neural networks (SNNs) are potential competitors to artificial neural networks (ANNs)
SNNs require much more memory than ANNs, which impedes the training of deeper SNN models.
We propose the reversible spiking neural network to reduce the memory cost of intermediate activations and membrane potentials during training.
arXiv Detail & Related papers (2023-12-13T06:39:49Z) - Resource Constrained Model Compression via Minimax Optimization for
Spiking Neural Networks [11.19282454437627]
Spiking Neural Networks (SNNs) have the characteristics of event-driven and high energy-efficient networks.
It is difficult to deploy these networks on resource-limited edge devices directly.
We propose an improved end-to-end Minimax optimization method for this sparse learning problem.
arXiv Detail & Related papers (2023-08-09T02:50:15Z) - Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying deep neural networks on microcontrollers (TinyML) based on multi-objective Bayesian optimization (MOBOpt)<n>Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory requirements on a given target system, and computational complexity.
arXiv Detail & Related papers (2023-05-23T14:31:52Z) - SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for
Benchmarking Spiking Neural Networks [4.0300632886917]
SpikeSim is a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs.
We propose SNN topological modifications leading to 1.24x and 10x reduction in the neuronal module's area and the overall energy-delay-product value.
arXiv Detail & Related papers (2022-10-24T01:07:17Z) - SlimFL: Federated Learning with Superposition Coding over Slimmable
Neural Networks [56.68149211499535]
Federated learning (FL) is a key enabler for efficient communication and computing leveraging devices' distributed computing capabilities.
This paper proposes a novel learning framework by integrating FL and width-adjustable slimmable neural networks (SNNs)
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2022-03-26T15:06:13Z) - Joint Superposition Coding and Training for Federated Learning over
Multi-Width Neural Networks [52.93232352968347]
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN)
FL preserves data privacy by exchanging the locally trained models of mobile devices. SNNs are however non-trivial, particularly under wireless connections with time-varying channel conditions.
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2021-12-05T11:17:17Z) - MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z) - Sub-bit Neural Networks: Learning to Compress and Accelerate Binary
Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and the hardware deployment on FPGA validate the great potentials of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.