Related papers: On the Acceleration of Deep Neural Network Inference using Quantized Compressed Sensing

Related papers

Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices.<n>We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding.<n>Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics [65.37942405146232]
We present a novel type of overload that carries with extremely lightweight state elements, achieved through ultra-low-precision quantization. The proposed SOLO achieves substantial memory savings (approximately 45 GB when training a 7B model) with minimal accuracy loss.
arXiv Detail & Related papers (2025-05-01T06:47:45Z)
ZOBNN: Zero-Overhead Dependable Design of Binary Neural Networks with Deliberately Quantized Parameters [0.0]
In this paper, we introduce a third advantage of very low-precision neural networks: improved fault-tolerance. We investigate the impact of memory faults on state-of-the-art binary neural networks (BNNs) through comprehensive analysis. We propose a technique to improve BNN dependability by restricting the range of float parameters through a novel deliberately uniform quantization.
arXiv Detail & Related papers (2024-07-06T05:31:11Z)
Gradient-based Automatic Mixed Precision Quantization for Neural Networks On-Chip [0.9187138676564589]
We present High Granularity Quantization (HGQ), an innovative quantization-aware training method. HGQ fine-tune the per-weight and per-activation precision by making them optimizable through gradient descent. This approach enables ultra-low latency and low power neural networks on hardware capable of performing arithmetic operations.
arXiv Detail & Related papers (2024-05-01T17:18:46Z)
Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis [93.55896765176414]
We propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis. We employ the differentiable search strategies to discover the optimal opsitions for active site matching in the shifted sparse convolution. Our BSC-Net achieves significant improvement upon our srtong baseline and outperforms the state-of-the-art network binarization methods.
arXiv Detail & Related papers (2023-03-27T13:47:06Z)
Deep Quantum Error Correction [73.54643419792453]
Quantum error correction codes (QECC) are a key component for realizing the potential of quantum computing. In this work, we efficiently train novel emphend-to-end deep quantum error decoders. The proposed method demonstrates the power of neural decoders for QECC by achieving state-of-the-art accuracy.
arXiv Detail & Related papers (2023-01-27T08:16:26Z)
Automatic Network Adaptation for Ultra-Low Uniform-Precision Quantization [6.1664476076961146]
Uniform-precision neural network quantization has gained popularity since it simplifies densely packed arithmetic unit for high computing capability. It ignores heterogeneous sensitivity to the impact of quantization errors across the layers, resulting in sub-optimal inference. This work proposes a novel neural architecture search called neural channel expansion that adjusts the network structure to alleviate accuracy degradation from ultra-low uniform-precision quantization.
arXiv Detail & Related papers (2022-12-21T09:41:25Z)
CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks [13.520972975766313]
We propose a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting non-uniform quantization of weights. We also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights.
arXiv Detail & Related papers (2022-07-31T01:34:56Z)
Learning Representations for CSI Adaptive Quantization and Feedback [51.14360605938647]
We propose an efficient method for adaptive quantization and feedback in frequency division duplexing systems. Existing works mainly focus on the implementation of autoencoder (AE) neural networks for CSI compression. We recommend two different methods: one based on a post training quantization and the second one in which the codebook is found during the training of the AE.
arXiv Detail & Related papers (2022-07-13T08:52:13Z)
Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism. We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights.
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks. DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons. We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy. We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR. Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters. Most of existing methods aim to enhance performance of QNNs especially binary neural networks by exploiting more effective training techniques. We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.