Related papers: Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications

URL: http://arxiv.org/abs/2508.15008v3
Date: Mon, 15 Sep 2025 10:32:11 GMT
Title: Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications
Authors: Hamza A. Abushahla, Dara Varam, Ariel J. N. Panopio, Mohamed I. AlHajri,
Abstract summary: Quantized Neural Networks (QNNs) on resource-constrained devices, such as microcontrollers, have introduced challenges in balancing model performance, computational complexity, and memory constraints. Tiny Machine Learning (TinyML) addresses these issues by integrating advancements across machine learning algorithms, hardware acceleration, and software optimization to efficiently run deep neural networks on embedded systems.
Score: 0.5599792629509229
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The deployment of Quantized Neural Networks (QNNs) on resource-constrained devices, such as microcontrollers, has introduced significant challenges in balancing model performance, computational complexity, and memory constraints. Tiny Machine Learning (TinyML) addresses these issues by integrating advancements across machine learning algorithms, hardware acceleration, and software optimization to efficiently run deep neural networks on embedded systems. This survey presents a hardware-centric introduction to quantization, systematically reviewing essential quantization techniques employed to accelerate deep learning models for embedded applications. In particular, further emphasis is placed on the critical trade-offs between model performance and hardware capabilities. The survey further evaluates existing software frameworks and hardware platforms designed specifically for supporting QNN execution on microcontrollers. Moreover, we provide an analysis of the current challenges and an outline of promising future directions in the rapidly evolving domain of QNN deployment.

Related papers

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [58.50944604905037]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs. The number of LUTs utilized by this network when deployed to an FPGA is between 2.3x and 8.5x smaller with performance comparable to prior work.
arXiv Detail & Related papers (2023-11-07T18:35:29Z)
Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives [6.4480695157206895]
Limited energy and computational resources on the edge push the transition from von Neumann architectures to In-memory Computing (IMC). Quantization is one of the most efficient network compression techniques, reducing the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation.
arXiv Detail & Related papers (2023-07-08T09:10:35Z)
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs [74.83613252825754]
"Smart ecosystems" are being formed where sensing happens concurrently rather than standalone. This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge. We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process. We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total. It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential in the implementation of area- and power-efficient inference. In this survey, we review the state-of-the-art ReRAM-based many-core accelerators for Deep Neural Networks (DNNs).
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
Generative machine learning with tensor networks: benchmarks on near-term quantum computers [0.0]
We explore quantum-assisted machine learning (QAML) on NISQ devices through the perspective of tensor networks (TNs). In particular, we lay out a framework for designing and optimizing TN-based QAML models using classical techniques, and then compiling these models to be run on quantum hardware. We present an exactly solvable benchmark problem for assessing the performance of MPS QAML models, and also present an application for the canonical MNIST handwritten digit dataset.
arXiv Detail & Related papers (2020-10-07T20:33:34Z)
Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles. We present the state of the art of hardware implementations of spiking neural networks. We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures [6.556553154231475]
This paper introduces a hardware-aware complexity metric that aims to assist system designers of neural network architectures. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices.
arXiv Detail & Related papers (2020-04-19T16:42:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.