Towards Efficient In-memory Computing Hardware for Quantized Neural
Networks: State-of-the-art, Open Challenges and Perspectives
- URL: http://arxiv.org/abs/2307.03936v1
- Date: Sat, 8 Jul 2023 09:10:35 GMT
- Title: Towards Efficient In-memory Computing Hardware for Quantized Neural
Networks: State-of-the-art, Open Challenges and Perspectives
- Authors: Olga Krestinskaya, Li Zhang, Khaled Nabil Salama
- Abstract summary: Limited energy and computational resources at the edge push the transition from von Neumann architectures to In-memory Computing (IMC).
Quantization is one of the most efficient network compression techniques, reducing the memory footprint, latency, and energy consumption.
This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation.
- Score: 6.4480695157206895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The amount of data processed in the cloud, the development of
Internet-of-Things (IoT) applications, and growing data privacy concerns force
the transition from cloud-based to edge-based processing. Limited energy and
computational resources at the edge push the transition from traditional von
Neumann architectures to In-memory Computing (IMC), especially for machine
learning and neural network applications. Network compression techniques are
applied to implement a neural network on limited hardware resources.
Quantization is one of the most efficient network compression techniques,
reducing the memory footprint, latency, and energy consumption. This
paper provides a comprehensive review of IMC-based Quantized Neural Networks
(QNN) and links software-based quantization approaches to IMC hardware
implementation. Moreover, open challenges, QNN design requirements,
recommendations, and perspectives along with an IMC-based QNN hardware roadmap
are provided.
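Since the review's central link is between software quantization and IMC hardware, a minimal sketch may help fix ideas. The Python code below is illustrative only; the function names, conductance range, and bit-width are assumptions, not taken from the paper. It uniformly quantizes a weight matrix and emulates how the quantized weights could be stored as crossbar conductances that compute an analog matrix-vector product.

```python
import numpy as np

def quantize_uniform(w, n_bits=4):
    """Uniformly quantize weights to signed n_bits integer levels."""
    q_max = 2 ** (n_bits - 1) - 1             # e.g. 7 for 4-bit
    scale = np.max(np.abs(w)) / q_max         # one scale per tensor
    w_q = np.clip(np.round(w / scale), -q_max, q_max).astype(int)
    return w_q, scale

def crossbar_matvec(w_q, x, g_min=1e-6, g_max=1e-4):
    """Idealized IMC matrix-vector product: quantized weights are stored
    as device conductances, inputs are applied as voltages, and column
    currents (Kirchhoff's law) give the analog dot product."""
    q_max = np.max(np.abs(w_q)) or 1
    # Represent signed integers with a differential pair of conductances.
    g_pos = np.where(w_q > 0,  w_q, 0) / q_max * (g_max - g_min) + g_min
    g_neg = np.where(w_q < 0, -w_q, 0) / q_max * (g_max - g_min) + g_min
    i_out = g_pos @ x - g_neg @ x             # bitline current difference
    # Rescale currents back to the integer-weight domain.
    return i_out * q_max / (g_max - g_min)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
x = rng.normal(size=16)
w_q, scale = quantize_uniform(w, n_bits=4)
print("exact   :", (w @ x)[:3])
print("IMC 4bit:", (scale * crossbar_matvec(w_q, x))[:3])
```

The differential g_pos/g_neg pair is one common way of representing signed weights on devices whose conductance is strictly positive.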
Related papers
- Hardware-aware Neural Architecture Search of Early Exiting Networks on Edge Accelerators [12.394874144369396]
Growing demand for embedded intelligence at the edge imposes stringent computational and energy constraints.
Early Exiting Neural Networks (EENN) have emerged as a promising solution.
We propose a hardware-aware Neural Architecture Search (NAS) framework to optimize the placement of early exit points within a network backbone.
arXiv Detail & Related papers (2025-12-04T11:54:09Z)
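As a rough illustration of the early-exit mechanism described above (a toy sketch, not the authors' NAS framework; all names, shapes, and the threshold are hypothetical), inference stops at the first intermediate classifier whose confidence clears a threshold:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_forward(x, stages, exit_heads, threshold=0.5):
    """Run backbone stages in order; after each one, a lightweight exit
    head produces class probabilities. Return at the first exit whose
    top-1 confidence clears the threshold, skipping the remaining
    stages' compute for 'easy' inputs."""
    h = x
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        h = np.tanh(stage @ h)                    # toy stage: linear + tanh
        probs = softmax(head @ h)
        if probs.max() >= threshold:
            return int(probs.argmax()), i         # exited early at stage i
    return int(probs.argmax()), len(stages) - 1   # fell through to last exit

rng = np.random.default_rng(1)
dim, n_classes, n_stages = 32, 10, 4
stages = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(n_stages)]
heads = [rng.normal(scale=0.3, size=(n_classes, dim)) for _ in range(n_stages)]
label, exit_at = early_exit_forward(rng.normal(size=dim), stages, heads)
print(f"predicted class {label}, exited at stage {exit_at}")
```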
- Quantized Neural Networks for Microcontrollers: A Comprehensive Review of Methods, Platforms, and Applications [0.5599792629509229]
Quantized Neural Networks (QNNs) on resource-constrained devices, such as microcontrollers, have introduced challenges in balancing model performance, computational complexity, and memory constraints.
Tiny Machine Learning (TinyML) addresses these issues by integrating advancements across machine learning algorithms, hardware acceleration, and software optimization to efficiently run deep neural networks on embedded systems.
arXiv Detail & Related papers (2025-08-20T18:56:26Z)
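A generic sketch of the integer arithmetic such TinyML deployments rely on (not any specific framework's kernel; the scales and shapes are illustrative): weights and activations are kept as int8, multiply-accumulates run in int32, and a single floating-point rescale recovers real-valued outputs.

```python
import numpy as np

def qparams(x, n_bits=8):
    """Symmetric per-tensor quantization to signed n_bits integers."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / q_max
    q = np.clip(np.round(x / scale), -q_max, q_max).astype(np.int8)
    return q, scale

rng = np.random.default_rng(2)
w = rng.normal(size=(4, 16)).astype(np.float32)
x = rng.normal(size=16).astype(np.float32)

w_q, s_w = qparams(w)
x_q, s_x = qparams(x)

# Integer MACs in int32 accumulators (what an MCU's SIMD/DSP path does),
# followed by one floating-point multiply to return to the real domain.
acc = w_q.astype(np.int32) @ x_q.astype(np.int32)
y = acc * (s_w * s_x)

print("float:", (w @ x)[:4])
print("int8 :", y[:4])
```

On many MCU runtimes the final rescale is itself approximated by a fixed-point multiply and shift, avoiding floating point entirely.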
- Edge Intelligence with Spiking Neural Networks [50.33340747216377]
Spiking Neural Networks (SNNs) offer low-power, event-driven computation on resource-constrained devices.
We present a systematic taxonomy of EdgeSNN foundations, encompassing neuron models, learning algorithms, and supporting hardware platforms.
Three representative practical considerations of EdgeSNN are discussed in depth: on-device inference using lightweight SNN models, resource-aware training and updating under non-stationary data conditions, and secure and privacy-preserving issues.
arXiv Detail & Related papers (2025-07-18T16:47:52Z)
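Among the neuron models such a taxonomy covers, the leaky integrate-and-fire (LIF) neuron is the usual workhorse; below is a minimal discrete-time sketch with assumed parameter values (not taken from the paper):

```python
import numpy as np

def lif_step(v, i_in, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """One discrete-time leaky integrate-and-fire update: the membrane
    potential leaks toward rest, integrates the input current, and a
    neuron emits a spike (then resets) when it crosses v_th."""
    v = v + dt / tau * (-v + i_in)
    spikes = v >= v_th
    v = np.where(spikes, v_reset, v)
    return v, spikes

rng = np.random.default_rng(3)
v = np.zeros(5)                       # membrane potentials of 5 neurons
for t in range(50):
    v, spikes = lif_step(v, i_in=rng.uniform(0.0, 3.0, size=5))
    if spikes.any():
        print(f"t={t}: neurons {np.flatnonzero(spikes)} spiked")
```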
- The Reliability Issue in ReRam-based CIM Architecture for SNN: A Survey [11.935228413907875]
Spiking Neural Networks (SNNs) offer a promising alternative by mimicking biological neural networks, enabling energy-efficient computation.
ReRAM and Compute-in-Memory (CIM) architectures aim to overcome the von Neumann bottleneck by integrating storage and computation.
This survey explores the intersection of SNNs and ReRAM-based CIM architectures, focusing on the reliability challenges that arise from device-level variations and operational errors.
arXiv Detail & Related papers (2024-11-30T16:03:24Z)
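One reliability effect of this kind, device-to-device conductance variation, can be approximated as multiplicative log-normal noise on the stored weights. The sketch below is a simplified behavioral model with assumed variation levels, not a calibrated ReRAM device model:

```python
import numpy as np

def noisy_crossbar_matvec(w, x, sigma=0.1, rng=None):
    """Behavioral model of a ReRAM crossbar MVM: each stored weight
    (conductance) is perturbed by multiplicative log-normal noise,
    emulating programming/read variation; the MVM itself is ideal."""
    if rng is None:
        rng = np.random.default_rng()
    g_noisy = w * rng.lognormal(mean=0.0, sigma=sigma, size=w.shape)
    return g_noisy @ x

rng = np.random.default_rng(4)
w = rng.normal(size=(16, 64))
x = rng.normal(size=64)
exact = w @ x
for sigma in (0.02, 0.1, 0.3):
    err = [np.linalg.norm(noisy_crossbar_matvec(w, x, sigma, rng) - exact)
           / np.linalg.norm(exact) for _ in range(100)]
    print(f"sigma={sigma}: mean relative error {np.mean(err):.3f}")
```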
- Constraint Guided Model Quantization of Neural Networks [0.0]
Constraint Guided Model Quantization (CGMQ) is a quantization-aware training algorithm that uses an upper bound on the computational resources and reduces the bit-widths of the parameters of the neural network.
It is shown on MNIST that the performance of CGMQ is competitive with state-of-the-art quantization-aware training algorithms.
arXiv Detail & Related papers (2024-09-30T09:41:16Z)
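The quantization-aware-training idea that CGMQ builds on can be illustrated with a generic fake-quantization loop (this is not the CGMQ algorithm; the names, bit-width, and learning rate are assumptions). The forward pass uses quantized weights, while the gradient is applied to the full-precision copy as if rounding were the identity (the straight-through estimator):

```python
import numpy as np

def fake_quantize(w, n_bits):
    """Quantize-dequantize: the value the forward pass actually uses."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / q_max
    return np.clip(np.round(w / scale), -q_max, q_max) * scale

# One toy QAT run for a linear layer y = W x trained with MSE loss.
rng = np.random.default_rng(5)
w = rng.normal(size=(1, 8))            # full-precision "shadow" weights
x = rng.normal(size=(8, 64))           # batch of inputs (columns)
y_true = np.sin(x.sum(axis=0, keepdims=True))

for step in range(200):
    w_q = fake_quantize(w, n_bits=3)   # forward uses quantized weights
    y = w_q @ x
    grad_wq = 2 * (y - y_true) @ x.T / x.shape[1]
    w -= 0.05 * grad_wq                # straight-through: update shadow w

print("final loss:", float(np.mean((fake_quantize(w, 3) @ x - y_true) ** 2)))
```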
- From Graphs to Qubits: A Critical Review of Quantum Graph Neural Networks [56.51893966016221]
Quantum Graph Neural Networks (QGNNs) represent a novel fusion of quantum computing and Graph Neural Networks (GNNs).
This paper critically reviews the state-of-the-art in QGNNs, exploring various architectures.
We discuss their applications across diverse fields such as high-energy physics, molecular chemistry, finance and earth sciences, highlighting the potential for quantum advantage.
arXiv Detail & Related papers (2024-08-12T22:53:14Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential in the implementation of area and power efficient inference.
In this survey, we review the state-of-the-art ReRAM-based Deep Neural Networks (DNNs) many-core accelerators.
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
- A Quantum Convolutional Neural Network for Image Classification [7.745213180689952]
We propose a novel neural network model named Quantum Convolutional Neural Network (QCNN).
QCNN is based on implementable quantum circuits and has a structure similar to that of classical convolutional neural networks.
Numerical simulation results on the MNIST dataset demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-07-08T06:47:34Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
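The DCT-domain split that drives this kind of dynamic routing can be sketched in a few lines (assuming SciPy is available; the cutoff and the cheap/expensive branches are illustrative placeholders):

```python
import numpy as np
from scipy.fft import dctn, idctn

def split_by_frequency(patch, cutoff=8):
    """Split an image patch into low- and high-frequency parts in the
    DCT domain: coefficients inside the top-left cutoff x cutoff corner
    count as 'low frequency', the rest as 'high frequency'."""
    coeffs = dctn(patch, norm="ortho")
    mask = np.zeros_like(coeffs, dtype=bool)
    mask[:cutoff, :cutoff] = True
    low = idctn(np.where(mask, coeffs, 0.0), norm="ortho")
    high = idctn(np.where(mask, 0.0, coeffs), norm="ortho")
    return low, high

rng = np.random.default_rng(6)
patch = rng.normal(size=(32, 32))
low, high = split_by_frequency(patch)
# The two parts sum back to the input; a dynamic SR network would send
# `high` through the expensive branch and `low` through the cheap one.
print("max reconstruction error:", np.abs(low + high - patch).max())
```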
- Spiking Neural Networks Hardware Implementations and Challenges: a Survey [53.429871539789445]
Spiking Neural Networks are cognitive algorithms mimicking neuron and synapse operational principles.
We present the state of the art of hardware implementations of spiking neural networks.
We discuss the strategies employed to leverage the characteristics of these event-driven algorithms at the hardware level.
arXiv Detail & Related papers (2020-05-04T13:24:00Z)
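One strategy in this vein, exploiting event sparsity, can be sketched as follows (illustrative only): because inputs are binary spikes, a synaptic multiply-accumulate degenerates into summing the weight columns of the neurons that actually fired.

```python
import numpy as np

rng = np.random.default_rng(7)
w = rng.normal(size=(128, 1024))          # synaptic weight matrix
spikes = rng.random(1024) < 0.02          # sparse binary spike vector (~2%)

# Dense formulation: a full matrix-vector product every timestep.
dense = w @ spikes.astype(float)

# Event-driven formulation: only touch the columns of neurons that
# spiked, replacing multiplies with simple accumulations; this is the
# property event-driven hardware exploits for energy efficiency.
events = np.flatnonzero(spikes)
sparse = w[:, events].sum(axis=1)

print("results match:", np.allclose(dense, sparse))
print(f"{len(events)} events instead of {w.shape[1]} MACs per output row")
```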
- HCM: Hardware-Aware Complexity Metric for Neural Network Architectures [6.556553154231475]
This paper introduces a hardware-aware complexity metric that aims to assist system designers of neural network architectures.
We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices.
arXiv Detail & Related papers (2020-04-19T16:42:51Z)
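A metric of this general kind can be sketched as a per-layer accounting of multiply-accumulates and parameter memory (a generic illustration; the paper's actual HCM definition may differ):

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, bits=8):
    """MACs and parameter memory (bytes) of one k x k conv layer."""
    macs = c_in * c_out * k * k * h_out * w_out
    param_bytes = c_in * c_out * k * k * bits // 8
    return macs, param_bytes

# Toy 3-layer network on a 32x32 input (stride-1, 'same' padding assumed).
layers = [(3, 16, 3, 32, 32), (16, 32, 3, 32, 32), (32, 64, 3, 32, 32)]
total_macs = total_bytes = 0
for spec in layers:
    macs, mem = conv2d_cost(*spec)
    total_macs += macs
    total_bytes += mem
print(f"total: {total_macs / 1e6:.1f} MMACs, {total_bytes / 1024:.1f} KiB of weights")
```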
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
- Multi-Objective Optimization for Size and Resilience of Spiking Neural Networks [0.9449650062296823]
Neuromorphic computing architectures model Spiking Neural Networks (SNNs) in silicon.
We study Spiking Neural Networks in two neuromorphic architecture implementations with the goal of decreasing their size.
We propose a multiobjective fitness function to optimize the size and resiliency of the SNN.
arXiv Detail & Related papers (2020-02-04T16:58:25Z)
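A multi-objective formulation of this flavor can be sketched as a Pareto-dominance filter over (size, resilience) pairs (the objective values below are hypothetical, and the paper's actual fitness function may differ):

```python
def dominates(a, b):
    """Candidate a Pareto-dominates b if it is no worse in every
    objective and strictly better in at least one. Objectives here:
    network size (minimize) and resilience (maximize)."""
    size_a, res_a = a
    size_b, res_b = b
    no_worse = size_a <= size_b and res_a >= res_b
    strictly_better = size_a < size_b or res_a > res_b
    return no_worse and strictly_better

# Hypothetical SNN candidates: (synapse count, accuracy under fault injection)
candidates = [(1200, 0.91), (800, 0.89), (800, 0.93), (1500, 0.92), (600, 0.80)]
pareto = [c for c in candidates
          if not any(dominates(other, c) for other in candidates if other != c)]
print("Pareto-optimal SNN configurations:", sorted(pareto))
```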
This list is automatically generated from the titles and abstracts of the papers on this site.