Multiplierless Design of High-Speed Very Large Constant Multiplications
- URL: http://arxiv.org/abs/2309.05550v2
- Date: Tue, 12 Sep 2023 06:30:00 GMT
- Title: Multiplierless Design of High-Speed Very Large Constant Multiplications
- Authors: Levent Aksoy, Debapriya Basu Roy, Malik Imran, Samuel Pagliarini
- Abstract summary: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements.
We introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In cryptographic algorithms, the constants to be multiplied by a variable can be very large due to security requirements. Thus, the hardware complexity of such algorithms heavily depends on the design architecture handling large constants. In this paper, we introduce an electronic design automation tool, called LEIGER, which can automatically generate the realizations of very large constant multiplications for low-complexity and high-speed applications, targeting the ASIC design platform. LEIGER can utilize the shift-adds architecture and use 3-input operations, i.e., carry-save adders (CSAs), where the number of CSAs is reduced using a prominent optimization algorithm. It can also generate constant multiplications under a hybrid design architecture, where 2- and 3-input operations are used at different stages. Moreover, it can describe constant multiplications under a design architecture using compressor trees. As a case study, high-speed Montgomery multiplication, which is a fundamental operation in cryptographic algorithms, is designed with its constant multiplication block realized under the proposed architectures. Experimental results indicate that LEIGER enables a designer to explore the trade-off between area and delay of the very large constant and Montgomery multiplications and leads to designs with area-delay product, latency, and energy consumption values significantly better than those obtained by a recently proposed algorithm.
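For readers unfamiliar with the shift-adds architecture the abstract refers to: a constant multiplication c*x is decomposed into shifted copies of x combined with adders and subtractors, and LEIGER's contribution lies in minimizing the 3-input carry-save adders that sum such terms. The sketch below shows only the basic shift-adds idea via canonical signed digit (CSD) recoding; it is a minimal illustration, not LEIGER's optimization algorithm.

```python
# Illustrative sketch only: multiplierless constant multiplication via
# canonical signed digit (CSD) recoding. This shows the shift-adds idea
# that tools like LEIGER start from; their CSA-count optimization is a
# separate, more involved step.

def csd_digits(c):
    """Recode a positive constant into CSD digits in {-1, 0, +1},
    least significant first; no two adjacent digits are nonzero."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)       # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def multiply_by_constant(x, c):
    """Compute c*x using shifts and add/subtract operations only."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

# CSD typically needs fewer add/subtract terms than plain binary.
assert multiply_by_constant(5, 14443) == 5 * 14443
```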
Related papers
- Strassen Multisystolic Array Hardware Architectures
Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication.
General-purpose hardware is not suitable for achieving the algorithm's promised theoretical speedups.
We present and evaluate new systolic array architectures that efficiently translate the theoretical complexity reductions of Strassen's algorithm directly into hardware resource savings.
arXiv Detail & Related papers (2025-02-14T10:40:32Z)
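For context on the Strassen entry above: one level of Strassen's recursion computes a 2x2 (block) product with 7 multiplications instead of 8, which is the saving the paper maps into systolic hardware. A minimal sketch with scalar entries, illustrative only and not the paper's architecture:

```python
# One level of Strassen's recursion on 2x2 matrices: 7 multiplications
# instead of 8. Shows the algorithmic saving only, not the systolic
# array designs of the paper above.

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return ((m1 + m4 - m5 + m7, m3 + m5),
            (m2 + m4, m1 - m2 + m3 + m6))

assert strassen_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))) == ((19, 22), (43, 50))
```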
- Enhanced Computationally Efficient Long LoRA Inspired Perceiver Architectures for Auto-Regressive Language Modeling
The Transformer architecture has revolutionized the Natural Language Processing field and is the backbone of Large Language Models (LLMs).
One of the challenges in the Transformer architecture is the quadratic complexity of the attention mechanism that prohibits the efficient processing of long sequence lengths.
One of the important works in this respect is the Perceiver class of architectures that have demonstrated excellent performance while reducing the computation complexity.
arXiv Detail & Related papers (2024-12-08T23:41:38Z)
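To make the complexity claim in the Perceiver entry above concrete: a fixed set of M latent vectors cross-attends over N inputs, costing O(N*M) rather than the O(N^2) of full self-attention. A minimal numpy sketch, with shapes and names chosen for illustration rather than taken from the paper:

```python
import numpy as np

# Perceiver-style latent cross-attention: M learned latents attend over
# N inputs, so the score matrix is (M, N) instead of (N, N).

def cross_attention(latents, inputs):
    """latents: (M, d), inputs: (N, d) -> (M, d)."""
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)           # (M, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over inputs
    return weights @ inputs                            # (M, d)

rng = np.random.default_rng(0)
out = cross_attention(rng.normal(size=(16, 64)),       # M = 16 latents
                      rng.normal(size=(4096, 64)))     # N = 4096 tokens
print(out.shape)  # (16, 64)
```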
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z)
- Efficient and Flexible Different-Radix Montgomery Modular Multiplication for Hardware Implementation
We propose an efficient parallel variant of iterative Montgomery modular multiplication, called DRMMM, that allows the quotient to be computed over multiple iterations.
Based on the proposed variant, we also design a high-performance hardware architecture for faster operation.
arXiv Detail & Related papers (2024-07-17T16:24:15Z)
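Montgomery multiplication, the case study in LEIGER and the subject of the entry above, replaces the expensive mod-n reduction with shifts and masks by a power of two. Below is a minimal sketch of the textbook reduction (REDC); this is the classical form, not the paper's different-radix DRMMM variant.

```python
# Textbook Montgomery reduction (REDC). R = 2**r_bits, n odd.

def montgomery_setup(n, r_bits):
    """Precompute n' with n*n' = -1 (mod R)."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def redc(t, n, r_bits, n_prime):
    """Return t * R^-1 mod n, for 0 <= t < R*n."""
    r_mask = (1 << r_bits) - 1
    m = ((t & r_mask) * n_prime) & r_mask   # m = t*n' mod R
    u = (t + m * n) >> r_bits               # exact division by R
    return u - n if u >= n else u

# Multiply a*b mod n in the Montgomery domain (toy parameters).
n, r_bits = 97, 8                            # odd modulus, R = 256
r, n_prime = montgomery_setup(n, r_bits)
a_bar, b_bar = (20 * r) % n, (33 * r) % n    # map into Montgomery form
prod = redc(a_bar * b_bar, n, r_bits, n_prime)   # = (20*33)*R mod n
assert redc(prod, n, r_bits, n_prime) == (20 * 33) % n
```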
- RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction
We propose a multiplier design optimization framework based on reinforcement learning.
We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network.
Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay.
arXiv Detail & Related papers (2024-03-31T10:43:33Z)
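For intuition on the compressor tree mentioned in the RL-MUL entry above: reduction proceeds by applying 3:2 compressors (full adders) column by column until at most two rows of bits remain for a final fast adder. The sketch below tracks only per-column bit counts; the paper's matrix/tensor encoding is richer, so treat this as an illustrative reading, not its exact representation.

```python
# Toy compressor-tree reduction: track how many partial-product bits sit
# in each column and apply 3:2 compressors until every column holds at
# most two bits, ready for a final carry-propagate adder.

def reduce_tree(heights):
    """heights[i] = number of bits in column i (weight 2**i)."""
    stage = 0
    while max(heights) > 2:
        nxt = [0] * (len(heights) + 1)
        for i, h in enumerate(heights):
            fulls, rest = divmod(h, 3)      # each 3:2 compressor eats 3 bits
            nxt[i] += fulls + rest           # sum bits stay in column i
            nxt[i + 1] += fulls              # carry bits move to column i+1
        heights = nxt
        stage += 1
        print(f"stage {stage}: {heights}")
    return heights

# Column heights of an unsigned 4x4 array multiplier's partial products.
reduce_tree([1, 2, 3, 4, 3, 2, 1])
```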
- Efficient Controllable Multi-Task Architectures
We propose a multi-task model consisting of a shared encoder and task-specific decoders where both encoder and decoder channel widths are slimmable.
Our key idea is to control the task importance by varying the capacities of task-specific decoders, while controlling the total computational cost.
This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures.
arXiv Detail & Related papers (2023-08-22T19:09:56Z)
- TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Vision transformers have shown unprecedented levels of performance in tackling various visual perception tasks in recent years.
There has been significant recent research on the design of efficient vision transformer architectures.
In this study, we explore the generation of fast vision transformer architecture designs via generative architecture search.
arXiv Detail & Related papers (2023-08-22T13:08:29Z)
- ReLU and Addition-based Gated RNN
We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation.
This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost.
arXiv Detail & Related papers (2023-08-10T15:18:16Z)
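A hedged reading of the gating change in the entry above: a conventional gate squashes W x + U h through a sigmoid and multiplies it into the state, whereas the proposed variant uses ReLU and addition instead. The update rules below are illustrative guesses at that contrast, not the paper's exact equations.

```python
import numpy as np

# Contrast sketch: multiplicative sigmoid gating vs. an additive ReLU
# variant in the spirit of the gated-RNN entry above. Shapes and update
# rules are illustrative only.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conventional_gate(x, h, W, U):
    g = sigmoid(W @ x + U @ h)            # gate values in (0, 1)
    return g * h                          # multiplicative gating

def relu_add_gate(x, h, W, U):
    g = np.maximum(0.0, W @ x + U @ h)    # ReLU instead of sigmoid
    return g + h                          # addition instead of multiplication

rng = np.random.default_rng(1)
x, h = rng.normal(size=4), rng.normal(size=8)
W, U = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
print(conventional_gate(x, h, W, U).shape, relu_add_gate(x, h, W, U).shape)
```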
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification
FormerTime is a hierarchical representation model that improves classification capacity on multivariate time series.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Accurate and Compact Architectures via Neural Architecture Transformer
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
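As a rough illustration of the data-reuse idea behind the PolyDL entry above (the tool itself derives loop schedules via polyhedral analysis and emits high-performance native code, not Python): tiling a matmul loop nest keeps blocks of the operands hot in cache. The tile size and names here are arbitrary toy choices.

```python
# Tiled matmul loop nest, the kind of schedule a polyhedral framework
# like the one in the PolyDL entry above would derive automatically.
# The tile size here is an arbitrary toy choice.

def matmul_tiled(A, B, n, tile=4):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):             # iterate over tiles of C
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]           # reused across the j loop
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C

n = 6
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j % 5) for j in range(n)] for i in range(n)]
ref = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]
assert matmul_tiled(A, B, n) == ref
```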