DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators
- URL: http://arxiv.org/abs/2503.23943v1
- Date: Mon, 31 Mar 2025 10:49:05 GMT
- Title: DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators
- Authors: Chenhao Xue, Yi Ren, Jinwei Zhou, Kezhi Li, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun
- Abstract summary: DOMAC is a novel approach that employs differentiable optimization for designing multipliers and MACs at specific technology nodes. It establishes an analogy between optimizing multi-staged parallel compressor trees and training deep neural networks; building on this insight, DOMAC reformulates the discrete optimization challenge into a continuous problem by incorporating differentiable timing and area objectives.
- Score: 25.876084896293058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multipliers and multiply-accumulators (MACs) are fundamental building blocks for compute-intensive applications such as artificial intelligence. With the diminishing returns of Moore's Law, optimizing multiplier performance now necessitates process-aware architectural innovations rather than relying solely on technology scaling. In this paper, we introduce DOMAC, a novel approach that employs differentiable optimization for designing multipliers and MACs at specific technology nodes. DOMAC establishes an analogy between optimizing multi-staged parallel compressor trees and training deep neural networks. Building on this insight, DOMAC reformulates the discrete optimization challenge into a continuous problem by incorporating differentiable timing and area objectives. This formulation enables us to utilize existing deep learning toolkits for a highly efficient implementation of the differentiable solver. Experimental results demonstrate that DOMAC achieves significant enhancements in both performance and area efficiency compared to state-of-the-art baselines and commercial IPs in multiplier and MAC designs.
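The abstract's key move, treating compressor-tree construction like network training, can be made concrete with a small sketch. The following is a heavily simplified illustration, not the authors' implementation: a per-column choice between full and half adders is relaxed to a continuous probability, and a soft timing-plus-area objective is minimized with a stock PyTorch optimizer. All constants, the cost model, and the variable names are assumptions made for illustration.

```python
import torch

n_cols = 16                                       # assumed operand width
heights = torch.randint(3, 9, (n_cols,)).float()  # partial-product bits per column

# One logit per column: a soft choice between full adders (3:2, compress a
# column faster but cost more) and half adders (2:2, cheaper but slower to
# compress), relaxed from {0, 1} to (0, 1) via a sigmoid.
logits = torch.zeros(n_cols, requires_grad=True)
FA_DELAY, HA_DELAY = 1.8, 1.0                     # assumed per-stage delays
FA_AREA, HA_AREA = 2.5, 1.5                       # assumed unit areas

opt = torch.optim.Adam([logits], lr=0.05)
for _ in range(300):
    p = torch.sigmoid(logits)                     # prob. of using full adders
    stages = heights / (1.0 + p)                  # full adders shrink a column faster
    stage_delay = p * FA_DELAY + (1 - p) * HA_DELAY
    timing = torch.logsumexp(stages * stage_delay, dim=0)  # soft critical path
    area = (p * FA_AREA + (1 - p) * HA_AREA).sum()
    loss = timing + 0.1 * area                    # differentiable timing+area objective
    opt.zero_grad()
    loss.backward()
    opt.step()

use_full_adder = torch.sigmoid(logits) > 0.5      # round back to a discrete design
```

After optimization, rounding the relaxed probabilities recovers a discrete design, the usual final step for such continuous relaxations.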
Related papers
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z)
- Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
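The summary does not spell out the Newton-based score, so the snippet below shows one common second-order (Taylor/Newton-style) saliency that training-free pruning methods of this kind build on; the paper's exact formula may differ, and all names here are illustrative.

```python
import torch

def second_order_score(weight, grad, hess_diag):
    # Per-weight saliency: estimated loss change if the weight is removed,
    # from a second-order Taylor expansion (a Newton-style view of pruning).
    return (grad * weight + 0.5 * hess_diag * weight.pow(2)).abs()

w = torch.randn(4, 8)            # a weight matrix (4 prunable rows)
g = torch.randn_like(w)          # gradient of the loss w.r.t. w (assumed given)
h = torch.rand_like(w)           # diagonal Hessian estimate (assumed given)

row_scores = second_order_score(w, g, h).sum(dim=1)  # structural: score whole rows
keep = row_scores.argsort(descending=True)[:2]       # retain the 2 most salient rows
```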
arXiv Detail & Related papers (2024-12-17T01:09:23Z)
- Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs. We develop "BayesMulti", a training strategy utilizing Bayesian optimization (BO)-guided noise injection to improve the resistance of analog DNNs to memristor imperfections. Our integrated approach enables use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
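As a loose illustration of noise-injection training (not BayesMulti itself, whose BO-guided schedule is not detailed in this summary), a layer can perturb its weights with device-like noise during training only:

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights during training only,
    emulating analog-device (e.g., memristor) variation."""
    def __init__(self, in_features, out_features, sigma=0.05, **kw):
        super().__init__(in_features, out_features, **kw)
        self.sigma = sigma  # assumed relative noise scale

    def forward(self, x):
        w = self.weight
        if self.training:
            w = w * (1 + self.sigma * torch.randn_like(w))  # multiplicative noise
        return nn.functional.linear(x, w, self.bias)

layer = NoisyLinear(16, 8)
y = layer(torch.randn(4, 16))   # noisy in train mode, clean after layer.eval()
```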
arXiv Detail & Related papers (2024-12-03T19:20:08Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE that surpasses existing parallelism schemes. Our results demonstrate up to a 52.4% improvement in prefill throughput compared to existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while obtaining over a 30% increase in inference speed.
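A minimal sketch of the general idea, carving a trained dense FFN into expert slices selected by a router, might look as follows; the slicing rule, router, and sizes are assumptions, not FactorLLM's actual procedure.

```python
import torch
import torch.nn as nn

d, h, k = 32, 128, 4            # model dim, FFN hidden dim, experts (all assumed)
ffn = nn.Sequential(nn.Linear(d, h), nn.ReLU(), nn.Linear(h, d))  # "trained" FFN
router = nn.Linear(d, k)        # picks one expert slice per token
s = h // k                      # hidden units per expert slice

def factored_forward(x):        # x: (tokens, d)
    idx = router(x).argmax(-1)  # hard routing decision per token
    w1, b1, w2, b2 = ffn[0].weight, ffn[0].bias, ffn[2].weight, ffn[2].bias
    out = torch.zeros_like(x)
    for e in range(k):          # each token only touches its slice of the FFN
        m = idx == e
        hid = torch.relu(x[m] @ w1[e*s:(e+1)*s].T + b1[e*s:(e+1)*s])
        out[m] = hid @ w2[:, e*s:(e+1)*s].T + b2
    return out

y = factored_forward(torch.randn(10, d))  # same shape as input: (10, d)
```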
arXiv Detail & Related papers (2024-08-15T16:45:16Z)
- Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations [5.847997723738113]
Modern embedded microprocessors provide very limited support for mixed-precision NNs.
We introduce a hardware-software co-design framework that enables cooperative hardware design, mixed-precision quantization, ISA extensions, and inference.
Our framework achieves, on average, a 15x energy reduction for less than 1% accuracy loss and outperforms state-of-the-art ISA-agnostic RISC-V cores.
arXiv Detail & Related papers (2024-07-19T12:54:04Z)
- RL-MUL 2.0: Multiplier Design Optimization with Parallel Deep Reinforcement Learning and Space Reduction [8.093985979285533]
We propose a multiplier design optimization framework based on reinforcement learning. We utilize matrix and tensor representations for the compressor tree of a multiplier, enabling seamless integration of convolutional neural networks as the agent network. Experiments conducted on different bit widths of multipliers demonstrate that multipliers produced by our approach outperform all baseline designs in terms of area, power, and delay.
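The matrix/tensor encoding is what makes the state CNN-friendly. A hypothetical sketch of such a policy network follows; the shapes, channels, and the four actions are assumed for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

# State: per-(stage, column) counts of full and half adders as two channels,
# so the compressor tree reads like a small 2-D "image".
state = torch.zeros(1, 2, 8, 16)   # (batch, channels, stages, columns) -- assumed sizes

policy = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 8 * 16, 4),     # 4 assumed actions, e.g. add/remove a FA/HA
)
action = policy(state).argmax(dim=-1)  # greedy action for this illustrative state
```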
arXiv Detail & Related papers (2024-03-31T10:43:33Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances, utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Structured Pruning of Neural Networks for Constraints Learning [5.689013857168641]
We show the effectiveness of pruning when applied to artificial neural networks (ANNs) prior to their integration into mixed-integer programs (MIPs).
We conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples.
Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision.
arXiv Detail & Related papers (2023-07-14T16:36:49Z)
- Heterogeneous Integration of In-Memory Analog Computing Architectures with Tensor Processing Units [0.0]
This paper introduces a novel, heterogeneous, mixed-signal, and mixed-precision architecture that integrates an IMAC unit with an edge TPU to enhance mobile CNN performance.
We propose a unified learning algorithm that incorporates mixed-precision training techniques to mitigate potential accuracy drops when deploying models on the TPU-IMAC architecture.
arXiv Detail & Related papers (2023-04-18T19:44:56Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
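A compact way to see how pruning and quantization interact during training is a fake-quantized weight combined with a magnitude mask, with gradients passed straight through the rounding step; this is a generic sketch, not the paper's specific recipe.

```python
import torch

def fake_quant(w, bits=4):
    # Uniform symmetric quantization with a straight-through gradient.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.round(w / scale).clamp(-qmax - 1, qmax)
    return w + (q * scale - w).detach()   # forward: quantized; backward: identity

w = torch.randn(64, 64, requires_grad=True)
mask = (w.abs() > w.abs().median()).float()  # magnitude-prune roughly half the weights

loss = (fake_quant(w) * mask).pow(2).sum()   # stand-in loss on sparse, quantized weights
loss.backward()                              # gradients reach only the surviving weights
```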
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- Differentiable Adaptive Computation Time for Visual Reasoning [4.7518908453572]
This paper presents DACT, a novel attention-based algorithm for achieving adaptive computation.
In particular, we study its application to the widely known MAC (Memory, Attention, Composition) architecture.
We show that by increasing the maximum number of steps used, we surpass the accuracy of even our best non-adaptive MAC model on the CLEVR dataset.
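In the spirit of adaptive computation time, the general family DACT belongs to, per-step outputs can be blended with learned halting probabilities so the effective number of steps is itself differentiable; the sketch below is generic and not DACT's exact formulation.

```python
import torch

steps, dim = 12, 32
outputs = torch.randn(steps, dim)            # stand-ins for the per-step answers
halt_logits = torch.randn(steps, requires_grad=True)

p = torch.sigmoid(halt_logits)               # prob. of halting at each step
reach = torch.cumprod(1 - p, dim=0)          # prob. of still running after step t
weight = p * torch.cat([torch.ones(1), reach[:-1]])  # prob. of halting exactly at t
answer = (weight.unsqueeze(1) * outputs).sum(dim=0)  # expected answer; differentiable
# (Any probability mass left after the last step is simply dropped in this sketch.)
```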
arXiv Detail & Related papers (2020-04-27T13:20:23Z)