RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction
- URL: http://arxiv.org/abs/2404.00639v2
- Date: Fri, 27 Dec 2024 13:26:58 GMT
- Authors: Dongsheng Zuo, Jiadong Zhu, Yikang Ouyang, Yuzhe Ma
- Abstract summary: We propose a multiplier design optimization framework based on reinforcement learning. We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network. Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay.
- Score: 8.093985979285533
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multiplication is a fundamental operation in many applications, and multipliers are widely adopted in various circuits. However, optimizing multipliers is challenging due to the extensive design space. In this paper, we propose a multiplier design optimization framework based on reinforcement learning. We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network. The agent optimizes the multiplier structure using a Pareto-driven reward customized to balance area and delay. Furthermore, we enhance the original framework with parallel reinforcement learning and design space pruning techniques and extend its capability to optimize fused multiply-accumulate (MAC) designs. Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay. The performance gain is further validated by comparing the area, power, and delay of processing element arrays using multipliers from our approach and baseline approaches.
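The abstract does not spell out the Pareto-driven reward. A minimal sketch of one common way to realize such a signal, rewarding a new design by the hypervolume it adds to the running area/delay Pareto front, is below; the function names and the assumption that both metrics are normalized to [0, 1] are illustrative, not the paper's formulation.

```python
# Hypothetical Pareto-driven reward for area/delay trade-offs: a synthesized
# design earns reward equal to the hypervolume it adds to the current front.

def dominates(p, q):
    """True if design p is no worse than q in both metrics and not equal."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def hypervolume(front, ref=(1.0, 1.0)):
    """Region dominated by the front w.r.t. a reference point; metrics are
    assumed normalized to [0, 1], smaller area/delay being better."""
    hv, prev_delay = 0.0, ref[1]
    for area, delay in sorted(front):          # ascending area
        hv += (ref[0] - area) * (prev_delay - delay)
        prev_delay = min(prev_delay, delay)
    return hv

def update_front(front, point):
    """Insert `point` into the front and return (new_front, reward)."""
    if any(dominates(p, point) for p in front):
        return front, 0.0                      # dominated: no progress
    new_front = [p for p in front if not dominates(point, p)] + [point]
    return new_front, hypervolume(new_front) - hypervolume(front)

# e.g. update_front([(0.5, 0.5)], (0.3, 0.4)) -> ([(0.3, 0.4)], 0.17)
```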
Related papers
- DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators [25.876084896293058]
DOMAC is a novel approach that employs differentiable optimization for designing multipliers and MACs at specific technology nodes.
It reformulates the discrete optimization challenge into a continuous problem by incorporating differentiable timing and area objectives.
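The differentiable reformulation is not detailed in the summary. A generic sketch of the idea, relaxing a per-slot discrete choice with a softmax and descending on a weighted area/timing loss, might look like the following; the cost numbers, dimensions, and the crude critical-path proxy are illustrative assumptions, not DOMAC's actual objectives.

```python
import torch

# Softmax relaxation of a discrete design choice: each of 8 slots picks one
# of K=3 compressor types; per-type (area, delay) costs are made-up numbers.
costs = torch.tensor([[3.0, 1.2],   # type 0: (area, delay)
                      [2.0, 1.8],
                      [1.5, 2.5]])

logits = torch.zeros(8, 3, requires_grad=True)   # 8 slots, K=3 options
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=-1)        # soft selection per slot
    area = (probs @ costs[:, 0]).sum()           # expected total area
    delay = (probs @ costs[:, 1]).max()          # crude critical-path proxy
    loss = area + 5.0 * delay                    # weighted area/timing objective
    opt.zero_grad(); loss.backward(); opt.step()

choice = logits.argmax(dim=-1)                   # discretize after training
```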
arXiv Detail & Related papers (2025-03-31T10:49:05Z)
- Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale [68.6602625868888]
We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations.
Operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression.
We train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous generation hybrids.
arXiv Detail & Related papers (2025-02-25T19:47:20Z)
- M3: Mamba-assisted Multi-Circuit Optimization via MBRL with Effective Scheduling [6.496667180036735]
M3 is a novel Model-based RL (MBRL) method employing the Mamba architecture and effective scheduling.
It significantly improves sample efficiency compared to existing RL methods.
arXiv Detail & Related papers (2024-11-25T00:30:49Z)
- A Hassle-free Algorithm for Private Learning in Practice: Don't Use Tree Aggregation, Use BLTs [4.736297244235246]
This paper extends the recently introduced Buffered Linear Toeplitz (BLT) mechanism to multi-participation scenarios.
Our BLT-DP-FTRL maintains the ease-of-use advantages of tree aggregation, while essentially matching matrix factorization in terms of utility and privacy.
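The BLT parameterization itself is not reproduced here, but the matrix-factorization mechanism it is compared against can be sketched generically: factor the prefix-sum workload as A = B C and release B(Cx + z) with Gaussian noise z. The toy factorization below (B = A, C = I, i.e. input perturbation) is for illustration only and is not the mechanism from the paper.

```python
import numpy as np

# Toy matrix-factorization mechanism for private prefix sums. Real mechanisms
# choose B, C to minimize total error; BLTs use Toeplitz factors that need
# only constant-size streaming state.
rng = np.random.default_rng(0)
n = 8
x = rng.integers(0, 2, size=n).astype(float)   # per-round contributions
A = np.tril(np.ones((n, n)))                   # prefix-sum workload matrix

B, C = A, np.eye(n)                            # trivial factorization A = B @ C
sigma = 1.0                                    # noise scale set by the DP budget
z = rng.normal(0.0, sigma, size=n)
noisy_prefix_sums = B @ (C @ x + z)            # released statistics

print(np.abs(noisy_prefix_sums - A @ x).mean())
```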
arXiv Detail & Related papers (2024-08-16T17:52:22Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while delivering over a 30% increase in inference speed.
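The decomposition procedure is not detailed in the summary. A minimal sketch of the general idea, slicing a trained FFN's hidden units into experts behind a learned router, is below; the dimensions, top-1 routing, and the even bias split are assumptions, not FactorLLM's method.

```python
import torch
import torch.nn as nn

# Sketch: slice a dense FFN (d -> h -> d) into E experts of width h // E,
# then route each token to one expert instead of running the full FFN.
d, h, E = 64, 256, 4
ffn_in, ffn_out = nn.Linear(d, h), nn.Linear(h, d)   # the "well-trained" FFN

experts = []
for e in range(E):
    sl = slice(e * h // E, (e + 1) * h // E)
    w1 = nn.Linear(d, h // E)
    w1.weight.data = ffn_in.weight.data[sl].clone()  # rows of the up-proj
    w1.bias.data = ffn_in.bias.data[sl].clone()
    w2 = nn.Linear(h // E, d)
    w2.weight.data = ffn_out.weight.data[:, sl].clone()  # cols of the down-proj
    w2.bias.data = ffn_out.bias.data.clone() / E     # split the shared bias
    experts.append(nn.Sequential(w1, nn.ReLU(), w2))

router = nn.Linear(d, E)                             # learned token-to-expert map

def moe_forward(x):                                  # x: (tokens, d)
    idx = router(x).argmax(dim=-1)                   # top-1 expert per token
    out = torch.zeros_like(x)
    for e in range(E):
        m = idx == e
        if m.any():
            out[m] = experts[e](x[m])
    return out

y = moe_forward(torch.randn(16, d))
```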
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- Compute Better Spent: Replacing Dense Layers with Structured Matrices [77.61728033234233]
We identify more efficient alternatives to dense matrices, as exemplified by the success of convolutional networks in the image domain.
We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance.
We propose the Block-Train, a novel matrix family containing Monarch matrices, which we show performs better than dense layers for the same compute on multiple tasks.
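Neither Monarch nor Block-Train is defined in the summary, but the simplest structured replacement for a dense layer, a block-diagonal matrix, already illustrates the compute/parameter trade-off and the initialization-scale point. This is a sketch of the general technique, not the paper's construction.

```python
import torch
import torch.nn as nn

class BlockDiagonalLinear(nn.Module):
    """Dense d x d layer replaced by `blocks` independent (d/b x d/b) blocks:
    parameters and FLOPs drop from d^2 to d^2 / blocks. Note the structure-
    dependent init scale (1/sqrt(block size)), echoing the paper's point that
    structured matrices need their own initialization scales."""
    def __init__(self, d, blocks):
        super().__init__()
        assert d % blocks == 0
        self.b, self.s = blocks, d // blocks
        self.weight = nn.Parameter(torch.randn(blocks, self.s, self.s) / self.s**0.5)

    def forward(self, x):                               # x: (..., d)
        xb = x.unflatten(-1, (self.b, self.s))          # (..., b, s)
        yb = torch.einsum('...bs,bst->...bt', xb, self.weight)
        return yb.flatten(-2)                           # back to (..., d)

layer = BlockDiagonalLinear(d=512, blocks=8)            # 8x fewer parameters
y = layer(torch.randn(4, 512))
```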
arXiv Detail & Related papers (2024-06-10T13:25:43Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition [20.592217626952507]
CRYSTALS-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process.
This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints.
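The accelerator itself is hardware, but the operation it speeds up, negacyclic polynomial multiplication in Z_q[x]/(x^n + 1) with Kyber's q = 3329 and n = 256, has a compact reference form. The sketch below is the schoolbook version; the NTT computes the same product in O(n log n).

```python
# Reference (schoolbook) negacyclic multiplication in Z_q[x]/(x^n + 1).
Q, N = 3329, 256

def polymul_negacyclic(a, b):
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                res[k] = (res[k] + a[i] * b[j]) % Q
            else:                       # x^N = -1: wrap with a sign flip
                res[k - N] = (res[k - N] - a[i] * b[j]) % Q
    return res

a = [0] * N; a[1] = 1                   # a = x
b = [0] * N; b[N - 1] = 1               # b = x^255
assert polymul_negacyclic(a, b)[0] == Q - 1   # x * x^255 = x^256 = -1
```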
arXiv Detail & Related papers (2023-10-06T22:57:25Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the underlying covariance matrices.
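The paper's algorithm is not reproduced here; the generic version of the trick, applying the inverse covariance as a matrix-free operator and solving with conjugate gradients instead of ever forming the covariance, looks like the following. The sparse-Bayesian covariance form and all dimensions are assumptions for illustration.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# In sparse Bayesian models the posterior covariance is
# Sigma = (Phi^T Phi / sigma^2 + diag(alpha))^{-1}. Instead of forming the
# d x d matrix, solve Sigma^{-1} v = b matrix-free with conjugate gradients.
rng = np.random.default_rng(0)
m, d, sigma2 = 64, 1024, 0.1
Phi = rng.normal(size=(m, d))
alpha = rng.uniform(0.5, 2.0, size=d)

def matvec(v):                       # applies Sigma^{-1} without storing it
    return Phi.T @ (Phi @ v) / sigma2 + alpha * v

A = LinearOperator((d, d), matvec=matvec)
b = rng.normal(size=d)
v, info = cg(A, b, maxiter=5000)     # v = Sigma @ b, Sigma never formed
print(info, np.linalg.norm(matvec(v) - b))
```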
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Generalized Activation via Multivariate Projection [46.837481855573145]
Activation functions are essential to introduce nonlinearity into neural networks.
We consider ReLU as a projection from R onto the nonnegative half-line R+.
We extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection.
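The SOC projection has a closed form that makes the generalization concrete: a point (x, t) maps to itself inside the cone, to the origin inside the polar cone, and onto the boundary otherwise. A sketch (tensor shapes are assumptions):

```python
import torch

def soc_projection(x, t):
    """Closed-form projection of (x, t) onto {(x, t) : ||x||_2 <= t}.
    x: (..., d), t: (...,). In the degenerate 1-D case (no x part) this
    reduces to ReLU(t), i.e. projection onto the nonnegative half-line."""
    nx = x.norm(dim=-1)                              # ||x||
    inside = nx <= t                                 # already in the cone
    polar = nx <= -t                                 # projects to the origin
    scale = (t + nx) / (2 * nx.clamp_min(1e-12))     # boundary case factor
    px = torch.where(inside[..., None], x,
         torch.where(polar[..., None], torch.zeros_like(x),
                     scale[..., None] * x))
    pt = torch.where(inside, t,
         torch.where(polar, torch.zeros_like(t), (t + nx) / 2))
    return px, pt

px, pt = soc_projection(torch.randn(32, 8), torch.randn(32))
```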
arXiv Detail & Related papers (2023-09-29T12:44:27Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Low-Latency Online Multiplier with Reduced Activities and Minimized Interconnect for Inner Product Arrays [0.8078491757252693]
This paper proposes a low-latency multiplier based on online (left-to-right) arithmetic.
Online arithmetic enables overlapping successive operations regardless of data dependency.
The serial nature of the online algorithm and the gradual increment/decrement of active slices minimize interconnect and signal activity.
arXiv Detail & Related papers (2023-04-06T01:22:27Z)
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Energy Efficiency Maximization in IRS-Aided Cell-Free Massive MIMO System [2.9081408997650375]
In this paper, we consider an intelligent reflecting surface (IRS)-aided cell-free massive multiple-input multiple-output system, where the beamforming at access points and the phase shifts at IRSs are jointly optimized to maximize energy efficiency (EE).
To solve the EE problem, we propose an iterative optimization algorithm using the quadratic transform and the Lagrangian dual transform to find the optimal beamforming and phase shifts.
We further propose a deep learning based approach for joint beamforming and phase shift design: a two-stage deep neural network is trained offline in an unsupervised manner and then deployed online.
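For reference, the quadratic transform invoked here is the standard fractional-programming identity, stated generically for A(x) >= 0 and B(x) > 0 (the paper applies it to the EE ratio): each ratio is replaced by a concave surrogate with an auxiliary variable y, alternating between a closed-form y-update and re-optimizing the design variables.

```latex
% Quadratic transform for a single ratio, A(x) >= 0, B(x) > 0:
\max_{x}\ \frac{A(x)}{B(x)}
\;\Longleftrightarrow\;
\max_{x,\,y}\ 2y\sqrt{A(x)} - y^{2}B(x),
\qquad y^{\star} = \frac{\sqrt{A(x)}}{B(x)}
```

Substituting the optimal y recovers A(x)/B(x), so the surrogate is exact at the inner optimum, which is what makes the alternating updates converge to a stationary point of the original ratio.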
arXiv Detail & Related papers (2022-12-24T14:58:15Z)
- Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
- Curriculum-based Asymmetric Multi-task Reinforcement Learning [14.5357225087828]
We introduce CAMRL, the first curriculum-based asymmetric multi-task learning (AMTL) algorithm for dealing with multiple reinforcement learning (RL) tasks altogether.
To mitigate the negative influence of a customized one-off training order in curriculum-based AMTL, CAMRL switches its training mode between parallel single-task RL and asymmetric multi-task RL (MTRL).
We have conducted experiments on a wide range of benchmarks in multi-task RL, covering Gym-minigrid, Meta-world, Atari video games, vision-based PyBullet tasks, and RLBench.
arXiv Detail & Related papers (2022-11-07T08:05:13Z)
- LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models [6.980277221943408]
We motivate the design of multi-dimensional networks within machine learning systems as a cost-efficient mechanism to enhance overall network bandwidth.
We introduce LIBRA, a framework specifically focused on optimizing multi-dimensional fabric architectures.
arXiv Detail & Related papers (2021-09-24T06:22:28Z)
- Machine Learning Framework for Quantum Sampling of Highly-Constrained, Continuous Optimization Problems [101.18253437732933]
We develop a generic, machine learning-based framework for mapping continuous-space inverse design problems into surrogate unconstrained binary optimization problems.
We showcase the framework's performance on two inverse design problems by optimizing (i) thermal emitter topologies for thermophotovoltaic applications and (ii) diffractive meta-gratings for highly efficient beam steering.
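A typical continuous-to-binary mapping behind such surrogates is a fixed-point bit encoding of each design variable, so the surrogate problem is an optimization over the bits, which annealers and quantum samplers can handle. The scheme below is illustrative, not necessarily the paper's exact encoding.

```python
# Fixed-point encoding: a continuous variable in [lo, hi] is represented by
# k bits, turning continuous inverse design into binary optimization.
def decode(bits, lo, hi):
    frac = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(bits))
    return lo + (hi - lo) * frac

print(decode([1, 0, 1, 1], lo=0.0, hi=2.0))   # -> 1.375
```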
arXiv Detail & Related papers (2021-05-06T02:22:23Z)
- Decomposability and Parallel Computation of Multi-Agent LQR [19.710361049812608]
We propose a parallel RL scheme for linear quadratic regulator (LQR) design in a continuous-time linear MAS.
We show that if the MAS is homogeneous then this decomposition retains closed-loop optimality.
The proposed approach can guarantee significant speed-up in learning without any loss in the cumulative value of the LQR cost.
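For reference, the LQR design being learned has a standard model-based solution via the continuous-time algebraic Riccati equation; the paper's parallel RL scheme recovers an equivalent gain per decomposed subsystem without knowing the dynamics (A, B). A sketch of the model-based baseline, with made-up system matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Model-based continuous-time LQR: minimize J = integral of x'Qx + u'Ru dt.
A = np.array([[0.0, 1.0], [0.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)     # Riccati solution
K = np.linalg.solve(R, B.T @ P)          # optimal gain: u = -K x
print(K)
```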
arXiv Detail & Related papers (2020-10-16T20:15:39Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, in which a general form of iterative-algorithm-induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
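The WMMSE unrolling is not detailed in the summary, but the deep-unfolding pattern itself, a fixed number of algorithm iterations with learnable parameters trained end-to-end, can be sketched on a simpler iteration. The example below unrolls gradient descent on a least-squares objective, not WMMSE; IAIDNN applies the same idea to the WMMSE iteration.

```python
import torch
import torch.nn as nn

class UnfoldedSolver(nn.Module):
    """Deep unfolding: T iterations of x <- x - step_t * grad f(x), with a
    learnable step size per layer, trainable end-to-end via autograd."""
    def __init__(self, T=10):
        super().__init__()
        self.steps = nn.Parameter(0.1 * torch.ones(T))

    def forward(self, H, y):             # solve min_x ||H x - y||^2, unrolled
        x = torch.zeros(H.shape[-1])
        for s in self.steps:
            grad = H.T @ (H @ x - y)
            x = x - s * grad             # one "layer" = one iteration
        return x

H, y = torch.randn(8, 4), torch.randn(8)
x_hat = UnfoldedSolver()(H, y)
```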
arXiv Detail & Related papers (2020-06-15T02:57:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and accepts no responsibility for any consequences of its use.