Multiplier with Reduced Activities and Minimized Interconnect for Inner
Product Arrays
- URL: http://arxiv.org/abs/2204.09515v1
- Date: Mon, 11 Apr 2022 05:45:43 GMT
- Title: Multiplier with Reduced Activities and Minimized Interconnect for Inner
Product Arrays
- Authors: Muhammad Usman, Jeong-A Lee and Milos D. Ercegovac
- Abstract summary: We present a pipelined multiplier with reduced activities and minimized interconnect based on online digit-serial arithmetic.
For $8$, $16$, $24$ and $32$ bit precisions, the proposed low-power pipelined design shows up to $38\%$ and $44\%$ reductions in power and area, respectively.
- Score: 0.8078491757252693
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present a pipelined multiplier with reduced activities and minimized
interconnect based on online digit-serial arithmetic. The working precision is
truncated such that $p<n$ bits are used to compute an $n$-bit product,
resulting in significant savings in area and power. The digit slices follow a
variable precision according to the input, increasing up to $p$ and then
decreasing according to the error profile. The design is pipelined to achieve
the high throughput and low latency desirable for compute-intensive inner
products. Synthesis results of the proposed designs are presented and compared
with the non-pipelined online multiplier, the pipelined online multiplier with
full working precision, and conventional serial-parallel and array multipliers.
For $8$, $16$, $24$ and $32$ bit precisions, the proposed low-power pipelined
design shows up to $38\%$ and $44\%$ reductions in power and area,
respectively, compared to the pipelined online multiplier without
working-precision truncation.
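
To make the working-precision idea concrete, below is a minimal behavioral sketch in Python. It is not the authors' hardware design or their signed-digit online algorithm: it simply consumes both operands most-significant-bit first and truncates the running accumulator to $p < n$ fractional bits, illustrating how a reduced working precision trades a bounded error (at most $n \cdot 2^{-p}$ in this sketch) for a narrower datapath. The function name, the bit-list encoding, and the truncation rule are illustrative assumptions.

```python
import math
import random


def msd_first_truncated_multiply(x_bits, y_bits, p):
    """Multiply two unsigned fractions, given as MSB-first bit lists,
    keeping only p fractional bits in the working accumulator.
    Behavioral illustration only; not the paper's online multiplier."""
    n = len(x_bits)
    acc = 0.0        # running estimate of the product of the consumed prefixes
    x_prefix = 0.0   # value of the x digits consumed so far
    y_prefix = 0.0   # value of the y digits consumed so far
    for j in range(n):
        w = 2.0 ** -(j + 1)  # weight of the j-th fractional bit
        # Cross terms contributed by the digits arriving at step j:
        # (X + x_j*w)(Y + y_j*w) = X*Y + x_j*w*Y + y_j*w*X + x_j*y_j*w^2
        acc += x_bits[j] * w * y_prefix
        acc += y_bits[j] * w * x_prefix
        acc += x_bits[j] * y_bits[j] * w * w
        x_prefix += x_bits[j] * w
        y_prefix += y_bits[j] * w
        # Truncate the working accumulator to p fractional bits,
        # losing less than 2**-p per step.
        acc = math.floor(acc * 2 ** p) / 2 ** p
    return acc


if __name__ == "__main__":
    random.seed(0)
    n, p = 16, 10
    x_bits = [random.randint(0, 1) for _ in range(n)]
    y_bits = [random.randint(0, 1) for _ in range(n)]
    x = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(x_bits))
    y = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(y_bits))
    approx = msd_first_truncated_multiply(x_bits, y_bits, p)
    print(f"exact={x * y:.10f}  truncated={approx:.10f}  "
          f"bound={n * 2.0 ** -p:.6f}")
```

Because the accumulator is truncated after every digit, the sketch under-approximates the exact product by less than $n \cdot 2^{-p}$; choosing $p < n$ therefore bounds the error while shrinking the accumulator width, which is the kind of trade-off the paper exploits in hardware.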
Related papers
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems [57.58801785642868]
Chain of thought (CoT) is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks.
This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness.
arXiv Detail & Related papers (2024-02-20T10:11:03Z) - TCNCA: Temporal Convolution Network with Chunked Attention for Scalable
Sequence Processing [52.64837396100988]
MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length.
We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$.
We evaluate TCNCA on EnWik8 language modeling, long-range-arena (LRA) sequence classification, as well as a synthetic reasoning benchmark associative recall.
arXiv Detail & Related papers (2023-12-09T16:12:25Z) - Low-Overhead Parallelisation of LCU via Commuting Operators [0.0]
Linear Combination of Unitaries (LCU) is a powerful scheme for the block encoding of operators but suffers from high overheads.
We discuss the parallelisation of LCU and in particular the SELECT subroutine of LCU based on partitioning of observables into groups of commuting operators.
We additionally discuss the parallelisation of QROM circuits which are a special case of our main results.
arXiv Detail & Related papers (2023-12-01T16:29:02Z) - Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - DeepPCR: Parallelizing Sequential Operations in Neural Networks [4.241834259165193]
We introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations in order to speed up inference and training of neural networks.
DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm.
To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons.
arXiv Detail & Related papers (2023-09-28T10:15:30Z) - Tangent Transformers for Composition, Privacy and Removal [58.280295030852194]
Tangent Attention Fine-Tuning (TAFT) is a method for fine-tuning linearized transformers.
arXiv Detail & Related papers (2023-07-16T18:31:25Z) - Low-Latency Online Multiplier with Reduced Activities and Minimized
Interconnect for Inner Product Arrays [0.8078491757252693]
This paper proposes a low-latency multiplier based on online, or left-to-right, arithmetic.
Online arithmetic enables overlapping successive operations regardless of data dependency.
The serial nature of the online algorithm and the gradual increment/decrement of active slices minimize interconnect and signal activity.
arXiv Detail & Related papers (2023-04-06T01:22:27Z) - Fast quantum circuit cutting with randomized measurements [0.0]
We propose a new method to extend the size of a quantum computation beyond the number of physical qubits available on a single device.
This is accomplished by randomly inserting measure-and-prepare channels to express the output state of a large circuit as a separable state across distinct devices.
arXiv Detail & Related papers (2022-07-29T15:13:04Z) - Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference [18.363580113885174]
Multiparty approaches to secure neural network inference commonly rely on garbled circuits.
We propose Tabula, an algorithm based on secure lookup tables.
Compared to garbled circuits with 8-bit quantized inputs, Tabula with 8-bit activations uses between $280\times$ and $560\times$ less communication.
arXiv Detail & Related papers (2022-03-05T23:26:06Z) - HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework.
arXiv Detail & Related papers (2020-11-20T23:51:43Z) - Bayesian Bits: Unifying Quantization and Pruning [73.27732135853243]
We introduce Bayesian Bits, a practical method for joint mixed precision quantization and pruning through gradient based optimization.
We experimentally validate our proposed method on several benchmark datasets and show that we can learn pruned, mixed precision networks.
arXiv Detail & Related papers (2020-05-14T16:00:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.