Multiplierless Design of High-Speed Very Large Constant Multiplications
- URL: http://arxiv.org/abs/2309.05550v2
- Date: Tue, 12 Sep 2023 06:30:00 GMT
- Title: Multiplierless Design of High-Speed Very Large Constant Multiplications
- Authors: Levent Aksoy, Debapriya Basu Roy, Malik Imran, Samuel Pagliarini
- Abstract summary: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements.
We introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications.
- Score: 3.5382618288815495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements. Thus, the hardware complexity of such algorithms heavily depends on the design architecture handling large constants. In this paper, we introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications for low-complexity and high-speed applications, targeting the ASIC design platform. LEIGER can utilize the shift-adds architecture and use 3-input operations, i.e., carry-save adders (CSAs), where the number of CSAs is reduced using a prominent optimization algorithm. It can also generate constant multiplications under a hybrid design architecture, where 2- and 3-input operations are used at different stages. Moreover, it can describe constant multiplications under a design architecture using compressor trees. As a case study, high-speed Montgomery multiplication, which is a fundamental operation in cryptographic algorithms, is designed with its constant multiplication block realized under the proposed architectures. Experimental results indicate that LEIGER enables a designer to explore the trade-off between area and delay of the very large constant and Montgomery multiplications and leads to designs with area-delay product, latency, and energy consumption values significantly better than those obtained by a recently proposed algorithm.
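As a rough illustration of the shift-adds idea, the minimal Python sketch below (not LEIGER's actual optimization algorithm; function names and the example constant are illustrative) shows how a constant multiplication c*x can be realized without a multiplier by recoding the constant in canonical signed-digit (CSD) form and summing shifted copies of x, and how a 3-input carry-save adder compresses three partial terms into a sum word and a carry word without carry propagation.

```python
def csd_digits(c):
    """Canonical signed-digit recoding of a non-negative constant c.
    Returns (position, digit) pairs with digit in {+1, -1} and no two
    adjacent non-zero digits, so fewer shift-add terms are needed."""
    digits, pos = [], 0
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            digits.append((pos, d))
            c -= d
        c >>= 1
        pos += 1
    return digits

def shift_add_multiply(c, x):
    """Multiplierless c*x: add/subtract shifted copies of x, one per CSD digit."""
    return sum(d * (x << p) for p, d in csd_digits(c))

def carry_save_add(a, b, c):
    """3-input carry-save adder: compresses a + b + c into a sum word and a
    carry word (their ordinary sum equals a + b + c), with no carry chain."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

# Sanity checks with an illustrative very large constant.
x = 0xBEEF
c = (1 << 96) - 189
assert shift_add_multiply(c, x) == c * x
s, cy = carry_save_add(1234, 5678, 9012)
assert s + cy == 1234 + 5678 + 9012
```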
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN, a hybrid architecture combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, and class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z) - Efficient and Flexible Differet-Radix Montgomery Modular Multiplication for Hardware Implementation [14.516310806294433]
We propose an efficient parallel variant of iterative Montgomery modular multiplication, called DRMMM, that allows the quotient to be computed in multiple iterations.
Based on the proposed variant, we also design a high-performance hardware architecture for faster operation.
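For context, the sketch below shows the textbook radix-2 iterative Montgomery multiplication, not the DRMMM variant itself (whose radix and quotient handling differ); the modulus, operands, and function name are illustrative assumptions.

```python
def montgomery_multiply(a, b, n, k):
    """Textbook radix-2 iterative Montgomery multiplication (illustrative, not DRMMM):
    returns a * b * 2^(-k) mod n for an odd modulus n and operands a, b < n."""
    assert n % 2 == 1 and a < n and b < n
    t = 0
    for i in range(k):
        t += ((a >> i) & 1) * b   # conditionally add b, driven by bit i of a
        t += (t & 1) * n          # conditionally add n so that t becomes even
        t >>= 1                   # exact division by the radix 2
    return t if t < n else t - n  # final conditional subtraction

# Usage: compute a*b mod n by mapping one operand into the Montgomery domain.
n = 0xFFFFFFFB                    # an odd (illustrative) modulus
k = n.bit_length()                # R = 2^k
a, b = 123456789 % n, 987654321 % n
a_bar = (a << k) % n              # a * R mod n
assert montgomery_multiply(a_bar, b, n, k) == (a * b) % n
```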
arXiv Detail & Related papers (2024-07-17T16:24:15Z) - All-to-all reconfigurability with sparse and higher-order Ising machines [0.0]
We introduce a multiplexed architecture that emulates all-to-all network functionality.
We show that running the adaptive parallel tempering algorithm demonstrates competitive algorithmic and prefactor advantages.
Scaled magnetic versions of p-bit IMs could lead to orders-of-magnitude improvements over the state of the art for generic optimization.
arXiv Detail & Related papers (2023-11-21T20:27:02Z) - Efficient Controllable Multi-Task Architectures [85.76598445904374]
We propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable.
Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost.
This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures.
arXiv Detail & Related papers (2023-08-22T13:08:29Z) - TurboViT: Generating Fast Vision Transformers via Generative Architecture Search [74.24393546346974]
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant research recently on the design of efficient vision transformer architecture.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z) - ReLU and Addition-based Gated RNN [1.484528358552186]
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation.
This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost.
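A toy NumPy sketch of that substitution (a hypothetical formulation for illustration, not the paper's exact gate equations): the conventional sigmoid gate that multiplicatively scales the recurrent state is replaced by a ReLU term that is simply added to it.

```python
import numpy as np

def sigmoid_mult_gate(h, x, W, U, b):
    """Conventional gating: a sigmoid gate scales the recurrent state element-wise."""
    g = 1.0 / (1.0 + np.exp(-(W @ x + U @ h + b)))
    return g * h

def relu_add_gate(h, x, W, U, b):
    """Hypothetical ReLU/addition gating in the spirit of the paper:
    sigmoid -> ReLU and element-wise multiplication -> addition."""
    g = np.maximum(0.0, W @ x + U @ h + b)
    return g + h

rng = np.random.default_rng(0)
d = 4
h, x = rng.standard_normal(d), rng.standard_normal(d)
W, U = rng.standard_normal((d, d)), rng.standard_normal((d, d))
b = rng.standard_normal(d)
print(sigmoid_mult_gate(h, x, W, U, b))
print(relu_add_gate(h, x, W, U, b))
```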
arXiv Detail & Related papers (2023-08-10T15:18:16Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity on multivariate time series classification tasks.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Towards Accurate and Compact Architectures via Neural Architecture Transformer [95.4514639013144]
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z) - PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z) - Near-Optimal Hardware Design for Convolutional Neural Networks [0.0]
This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks.
The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model.
An implementation based on the proposed hardware architecture has been applied in commercial AI products.
arXiv Detail & Related papers (2020-02-06T09:15:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.