Quantization of Deep Neural Networks to facilitate self-correction of
weights on Phase Change Memory-based analog hardware
- URL: http://arxiv.org/abs/2310.00337v1
- Date: Sat, 30 Sep 2023 10:47:25 GMT
- Title: Quantization of Deep Neural Networks to facilitate self-correction of
weights on Phase Change Memory-based analog hardware
- Authors: Arseni Ivanov
- Abstract summary: We develop an algorithm to approximate a set of multiplicative weights.
These weights aim to represent the original network's weights with minimal loss in performance.
Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, hardware-accelerated neural networks have gained significant
attention for edge computing applications. Among various hardware options,
crossbar arrays offer a promising avenue for efficient storage and
manipulation of neural network weights. However, the transition from trained
floating-point models to hardware-constrained analog architectures remains a
challenge. In this work, we combine a quantization technique specifically
designed for such architectures with a novel self-correcting mechanism. By
utilizing dual crossbar connections to represent both the positive and negative
parts of a single weight, we develop an algorithm to approximate a set of
multiplicative weights. These weights, along with their differences, aim to
represent the original network's weights with minimal loss in performance. We
implement the models using IBM's aihwkit and evaluate their efficacy over time.
Our results demonstrate that, when paired with an on-chip pulse generator, our
self-correcting neural network performs comparably to those trained with
analog-aware algorithms.
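A minimal numpy sketch of the dual-crossbar idea described above: each weight is stored as the difference of two non-negative conductances drawn from a multiplicative (geometrically spaced) level set. The level values and the nearest-pair selection rule below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Hypothetical multiplicative level set: geometrically spaced conductance
# values plus an explicit zero level.
def multiplicative_levels(g_min=0.1, g_max=1.0, n=8):
    return np.concatenate([[0.0], np.geomspace(g_min, g_max, n)])

def encode_differential(w, levels):
    """Map each weight to g_plus - g_minus, two non-negative conductances
    from `levels`, mirroring the dual (positive/negative) crossbars."""
    diff = levels[:, None] - levels[None, :]        # all pairwise differences
    idx = np.abs(w.ravel()[:, None] - diff.ravel()[None, :]).argmin(axis=1)
    gp, gm = np.unravel_index(idx, diff.shape)
    return levels[gp].reshape(w.shape), levels[gm].reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.3, size=(4, 4))
g_plus, g_minus = encode_differential(w, multiplicative_levels())
print("max abs encoding error:", np.abs(w - (g_plus - g_minus)).max())
```

In a real deployment the two conductance arrays would live on the paired crossbars, and the self-correcting mechanism would periodically re-pulse devices that have drifted away from their target level.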
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the bandwidth and latency limits of cloud offloading by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, reusing calculations as the inputs change.
We apply this approach to the transformer architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs (see the sketch below).
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
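A toy illustration of the incremental-computation idea from the entry above, for a position-wise layer: outputs are recomputed only for input rows that changed since the last call. All names are illustrative, and attention layers, which mix positions, need more careful cache reuse than this sketch shows.

```python
import numpy as np

# Recompute outputs only for rows whose inputs changed since the last call.
class IncrementalDense:
    def __init__(self, weight):
        self.weight = weight            # (d_in, d_out)
        self.x_cache = None
        self.y_cache = None

    def __call__(self, x):
        if self.x_cache is None or x.shape != self.x_cache.shape:
            self.y_cache = x @ self.weight                    # full pass
        else:
            changed = np.any(x != self.x_cache, axis=1)       # dirty rows
            self.y_cache[changed] = x[changed] @ self.weight  # partial pass
        self.x_cache = x.copy()
        return self.y_cache

layer = IncrementalDense(np.ones((8, 4)))
x = np.zeros((16, 8))
y0 = layer(x)        # computes all 16 rows
x[3] += 1.0          # one input row is modified
y1 = layer(x)        # recomputes only row 3
```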
- Synaptic metaplasticity with multi-level memristive devices [1.5598974049838272]
We propose a memristor-based hardware solution for implementing metaplasticity during both inference and training.
We show that a two-layer perceptron achieves 97% and 86% accuracy when trained consecutively on MNIST and Fashion-MNIST.
Our architecture is compatible with the limited endurance of memristors and achieves a 15x reduction in memory footprint.
arXiv Detail & Related papers (2023-06-21T09:40:25Z)
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts the imbalance between computation and memory access by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware, and conversely, the impact of the hardware on those algorithms.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- Vertical Layering of Quantized Neural Networks for Heterogeneous Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically obtain a network at any precision for on-demand service while only needing to train and maintain one model (see the sketch below).
arXiv Detail & Related papers (2022-12-10T15:57:38Z)
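One plausible reading of the vertical-layered idea, sketched with plain integer truncation: weights are stored once at the highest precision, and a lower-precision model is derived on demand by dropping low-order bits. The paper's actual encoding and training scheme may differ.

```python
import numpy as np

# Store weights once at high precision; slice out lower-precision models.
def quantize_int(w, bits=8):
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int32), scale

def slice_precision(q, scale, full_bits=8, bits=4):
    shift = full_bits - bits
    return ((q >> shift) << shift) * scale   # keep only the top `bits` bits

w = np.random.default_rng(0).normal(size=(3, 3))
q, s = quantize_int(w, bits=8)
w8 = q * s                     # full 8-bit service
w4 = slice_precision(q, s)     # 4-bit model from the same stored weights
```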
- Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation [0.0]
We show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale+ MPSoC ZCU104 FPGA can be at least 1.4x more energy efficient than the uniformly quantised version.
arXiv Detail & Related papers (2022-09-30T06:33:40Z)
- Low-bit Shift Network for End-to-End Spoken Language Understanding [7.851607739211987]
We propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values.
This reduces computational complexity by replacing expensive multiplication operations with bit shifts and by using low-bit weights (see the sketch below).
arXiv Detail & Related papers (2022-07-15T14:34:22Z)
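A minimal sketch of the power-of-two idea from the two PoT entries above: each weight is rounded to sign * 2**exponent, so multiplying by a weight reduces to a bit shift on integer hardware. The exponent range here is an arbitrary assumption.

```python
import numpy as np

# Round each weight to the nearest signed power of two.
def pot_quantize(w, exp_min=-6, exp_max=0):
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), exp_min, exp_max)
    return sign * np.exp2(exp)

w = np.array([0.3, -0.07, 0.5, -0.9])
print(pot_quantize(w))   # -> [ 0.25  -0.0625  0.5  -1. ]
```

On integer hardware, multiplying an activation by 2**e is a left or right shift by |e| bits, which is where the energy savings reported in both entries come from.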
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weights/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate this performance degeneration (a generic QAT sketch follows below).
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
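For readers unfamiliar with QAT, here is a minimal PyTorch sketch of its core mechanism, the straight-through estimator: the forward pass uses binarized weights while the backward pass treats quantization as the identity. This is generic QAT, not BiTAT's task-dependent aggregated transformation.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)          # 1-bit weights in the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out               # straight-through (identity) gradient

w = torch.randn(4, 4, requires_grad=True)   # latent full-precision weights
x = torch.randn(8, 4)
y = x @ BinarizeSTE.apply(w).t()
y.sum().backward()                    # gradients reach the latent weights
```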
- ZippyPoint: Fast Interest Point Detection, Description, and Matching through Mixed Precision Discretization [71.91942002659795]
We investigate and adapt network quantization techniques to accelerate inference and enable its use on compute-limited platforms.
ZippyPoint, our efficient quantized network with binary descriptors, improves the network runtime speed, the descriptor matching speed, and the 3D model size.
These improvements come at the cost of only minor performance degradation, as evaluated on the tasks of homography estimation, visual localization, and map-free visual relocalization.
arXiv Detail & Related papers (2022-03-07T18:59:03Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights (a simplified sketch of the mechanism follows below).
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
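A simplified greedy path-following quantizer in the spirit of GPFQ: the weights of one neuron are quantized sequentially so that the running quantized preactivation tracks the full-precision one on sample data. The alphabet and selection rule here are illustrative, not the paper's exact method.

```python
import numpy as np

def gpfq_like(w, X, alphabet):
    u = np.zeros(X.shape[0])              # accumulated preactivation error
    q = np.zeros_like(w)
    for t in range(len(w)):
        target = u + w[t] * X[:, t]
        # greedily pick the level that best absorbs the running error
        scores = [np.linalg.norm(target - a * X[:, t]) for a in alphabet]
        q[t] = alphabet[int(np.argmin(scores))]
        u = target - q[t] * X[:, t]
    return q

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))             # calibration inputs
w = rng.normal(scale=0.5, size=16)        # one neuron's weights
q = gpfq_like(w, X, np.linspace(-1, 1, 9))
print(np.linalg.norm(X @ w - X @ q) / np.linalg.norm(X @ w))
```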
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task (see the sketch below).
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
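A toy sketch of quantization-aware pruning as described in the entry above: a magnitude-based sparsity mask is applied together with weight quantization, so pruning decisions are made on the same low-precision weights used at inference. The sparsity level and bit width are illustrative.

```python
import numpy as np

def prune_then_quantize(w, sparsity=0.5, bits=4):
    k = int(sparsity * w.size)
    thresh = np.sort(np.abs(w).ravel())[k]       # magnitude cutoff
    mask = np.abs(w) >= thresh                   # keep the largest weights
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale) * scale              # uniform quantization
    return q * mask, mask

w = np.random.default_rng(1).normal(size=(4, 4))
w_q, mask = prune_then_quantize(w)
print(f"kept {mask.mean():.0%} of weights at 4-bit precision")
```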
This list is automatically generated from the titles and abstracts of the papers in this site.