Multiplierless Design of High-Speed Very Large Constant Multiplications
- URL: http://arxiv.org/abs/2309.05550v2
- Date: Tue, 12 Sep 2023 06:30:00 GMT
- Title: Multiplierless Design of High-Speed Very Large Constant Multiplications
- Authors: Levent Aksoy, Debapriya Basu Roy, Malik Imran, Samuel Pagliarini
- Abstract summary: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements.
We introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements. Thus, the hardware complexity of such algorithms heavily depends on the design architecture handling large constants. In this paper, we introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications for low-complexity and high-speed applications, targeting the ASIC design platform. LEIGER can utilize the shift-adds architecture and use 3-input operations, i.e., carry-save adders (CSAs), where the number of CSAs is reduced using a prominent optimization algorithm. It can also generate constant multiplications under a hybrid design architecture, where 2- and 3-input operations are used at different stages. Moreover, it can describe constant multiplications under a design architecture using compressor trees. As a case study, high-speed Montgomery multiplication, which is a fundamental operation in cryptographic algorithms, is designed with its constant multiplication block realized under the proposed architectures. Experimental results indicate that LEIGER enables a designer to explore the trade-off between area and delay of the very large constant and Montgomery multiplications and leads to designs with area-delay product, latency, and energy consumption values significantly better than those obtained by a recently proposed algorithm.
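For readers unfamiliar with the shift-adds architecture the abstract refers to: a constant multiplication c*x is decomposed into shifted copies of x combined with adders and subtractors, and LEIGER's contribution lies in minimizing the 3-input carry-save adders that sum such terms. The sketch below shows only the basic shift-adds idea via canonical signed digit (CSD) recoding; it is a minimal illustration, not LEIGER's optimization algorithm.

```python
# Illustrative sketch only: multiplierless constant multiplication via
# canonical signed digit (CSD) recoding. This shows the shift-adds idea
# that tools like LEIGER start from; their CSA-count optimization is a
# separate, more involved step.

def csd_digits(c):
    """Recode a positive constant into CSD digits in {-1, 0, +1},
    least significant first; no two adjacent digits are nonzero."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)       # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def multiply_by_constant(x, c):
    """Compute c*x using shifts and add/subtract operations only."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

# CSD typically needs fewer add/subtract terms than plain binary.
assert multiply_by_constant(5, 14443) == 5 * 14443
```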
Related papers
- Strassen Multisystolic Array Hardware Architectures
Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication.
General-purpose hardware is not suitable for achieving the algorithm's promised theoretical speedups.
We present and evaluate new systolic array architectures that efficiently translate the theoretical complexity reductions of Strassen's algorithm directly into hardware resource savings.
arXiv Detail & Related papers (2025-02-14T10:40:32Z)
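For context on the Strassen entry above: one level of Strassen's recursion computes a 2x2 (block) product with 7 multiplications instead of 8, which is the saving the paper maps into systolic hardware. A minimal sketch with scalar entries, illustrative only and not the paper's architecture:

```python
# One level of Strassen's recursion on 2x2 matrices: 7 multiplications
# instead of 8. Shows the algorithmic saving only, not the systolic
# array designs of the paper above.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

assert strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))) == ((19, 22), (43, 50))
```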
- Enhanced Computationally Efficient Long LoRA Inspired Perceiver Architectures for Auto-Regressive Language Modeling
The Transformer architecture has revolutionized the Natural Language Processing field and is the backbone of Large Language Models (LLMs).
One of the challenges in the Transformer architecture is the quadratic complexity of the attention mechanism that prohibits the efficient processing of long sequence lengths.
One of the important works in this respect is the Perceiver class of architectures that have demonstrated excellent performance while reducing the computation complexity.
arXiv Detail & Related papers (2024-12-08T23:41:38Z)
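To make the complexity claim in the Perceiver entry above concrete: a fixed set of M latent vectors cross-attends over N inputs, costing O(N*M) rather than the O(N^2) of full self-attention. A minimal numpy sketch, with shapes and names chosen for illustration rather than taken from the paper:

```python
import numpy as np

# Perceiver-style latent cross-attention: M learned latents attend over
# N inputs, so the score matrix is (M, N) instead of (N, N).

def cross_attention(latents, inputs):
    """latents: (M, d), inputs: (N, d) -> (M, d)."""
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)           # (M, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over inputs
    return weights @ inputs                            # (M, d)

rng = np.random.default_rng(0)
out = cross_attention(rng.normal(size=(16, 64)),       # M = 16 latents
                      rng.normal(size=(4096, 64)))     # N = 4096 tokens
print(out.shape)  # (16, 64)
```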
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- Efficient and Flexible Different-Radix Montgomery Modular Multiplication for Hardware Implementation
We propose an efficient parallel variant of iterative Montgomery modular multiplication, called DRMMM, that allows the quotient to be computed over multiple iterations.
Based on the proposed variant, we also design a high-performance hardware architecture for faster operation.
arXiv Detail & Related papers (2024-07-17T16:24:15Z)
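Montgomery multiplication, the case study in LEIGER and the subject of the entry above, replaces the expensive mod-n reduction with shifts and masks by a power of two. Below is a minimal sketch of the textbook reduction (REDC); this is the classical form, not the paper's different-radix DRMMM variant.

```python
# Textbook Montgomery reduction (REDC). R = 2**r_bits, n odd.

def montgomery_setup(n, r_bits):
    """Precompute n' with n*n' = -1 (mod R)."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def redc(t, n, r_bits, n_prime):
    """Return t * R^-1 mod n, for 0 <= t < R*n."""
    r_mask = (1 << r_bits) - 1
    m = ((t & r_mask) * n_prime) & r_mask   # m = t*n' mod R
    u = (t + m * n) >> r_bits               # exact division by R
    return u - n if u >= n else u

# Multiply a*b mod n in the Montgomery domain (toy parameters).
n, r_bits = 97, 8                            # odd modulus, R = 256
r, n_prime = montgomery_setup(n, r_bits)
a_bar, b_bar = (20 * r) % n, (33 * r) % n    # map into Montgomery form
prod = redc(a_bar * b_bar, n, r_bits, n_prime)   # = (20*33)*R mod n
assert redc(prod, n, r_bits, n_prime) == (20 * 33) % n
```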
- RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction
We propose a multiplier design optimization framework based on reinforcement learning.
We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network.
Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay.
arXiv Detail & Related papers (2024-03-31T10:43:33Z)
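For intuition on the compressor tree mentioned in the RL-MUL entry above: reduction proceeds by applying 3:2 compressors (full adders) column by column until at most two rows of bits remain for a final fast adder. The sketch below tracks only per-column bit counts; the paper's matrix/tensor encoding is richer, so treat this as an illustrative reading, not its exact representation.

```python
# Toy compressor-tree reduction: track how many partial-product bits sit
# in each column and apply 3:2 compressors until every column holds at
# most two bits, ready for a final carry-propagate adder.

def reduce_tree(heights):
    """heights[i] = number of bits in column i (weight 2**i)."""
    stage = 0
    while max(heights) > 2:
        nxt = [0] * (len(heights) + 1)
        for i, h in enumerate(heights):
            fulls, rest = divmod(h, 3)      # each 3:2 compressor eats 3 bits
            nxt[i] += fulls + rest           # sum bits stay in column i
            nxt[i + 1] += fulls              # carry bits move to column i+1
        heights = nxt
        stage += 1
        print(f"stage {stage}: {heights}")
    return heights

# Column heights of an unsigned 4x4 array multiplier's partial products.
reduce_tree([1, 2, 3, 4, 3, 2, 1])
```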
- Efficient Controllable Multi-Task Architectures
We propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable.
Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost.
This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures.
arXiv Detail & Related papers (2023-08-22T19:09:56Z)
- TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant recent research on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z)
- ReLU and Addition-based Gated RNN
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation.
This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost.
arXiv Detail & Related papers (2023-08-10T15:18:16Z)
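A hedged reading of the gating change in the entry above: a conventional gate squashes W x + U h through a sigmoid and multiplies it into the state, whereas the proposed variant uses ReLU and addition instead. The update rules below are illustrative guesses at that contrast, not the paper's exact equations.

```python
import numpy as np

# Contrast sketch: multiplicative sigmoid gating vs. an additive ReLU
# variant in the spirit of the gated-RNN entry above. Shapes and update
# rules are illustrative only.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conventional_gate(x, h, W, U):
    g = sigmoid(W @ x + U @ h)            # gate values in (0, 1)
    return g * h                          # multiplicative gating

def relu_add_gate(x, h, W, U):
    g = np.maximum(0.0, W @ x + U @ h)    # ReLU instead of sigmoid
    return g + h                          # addition instead of multiplication

rng = np.random.default_rng(1)
x, h = rng.normal(size=4), rng.normal(size=8)
W, U = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
print(conventional_gate(x, h, W, U).shape, relu_add_gate(x, h, W, U).shape)
```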
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification
FormerTime is a hierarchical representation model that improves classification capacity on multivariate time series.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Accurate and Compact Architectures via Neural Architecture Transformer
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
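As a rough illustration of the data-reuse idea behind the PolyDL entry above (the tool itself derives loop schedules via polyhedral analysis and emits high-performance native code, not Python): tiling a matmul loop nest keeps blocks of the operands hot in cache. The tile size and names here are arbitrary toy choices.

```python
# Tiled matmul loop nest, the kind of schedule a polyhedral framework
# like the one in the PolyDL entry above would derive automatically.
# The tile size here is an arbitrary toy choice.

def matmul_tiled(A, B, n, tile=4):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):             # iterate over tiles of C
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]           # reused across the j loop
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C

n = 6
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j % 5) for j in range(n)] for i in range(n)]
ref = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]
assert matmul_tiled(A, B, n) == ref
```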