Related papers: Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

URL: http://arxiv.org/abs/2407.15026v1
Date: Wed, 3 Jul 2024 03:29:23 GMT
Title: Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms
Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu
Abstract summary: ChiPBench is a benchmark designed to evaluate the effectiveness of AI-based chip placement algorithms. We have gathered 20 circuits from various domains (e.g., CPU, GPU, and microcontrollers) for evaluation. Results show that even if the intermediate metric of a single-point algorithm is dominant, the final PPA results are unsatisfactory.
Score: 77.71341200638416
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great potential of AI-based algorithms in enhancing chip placement. However, due to the lengthy workflow of chip design, the evaluations of these algorithms often focus on intermediate surrogate metrics, which are easy to compute but frequently reveal a substantial misalignment with the end-to-end performance (i.e., the final design PPA). To address this challenge, we introduce ChiPBench, which can effectively facilitate research in chip placement within the AI community. ChiPBench is a comprehensive benchmark specifically designed to evaluate the effectiveness of existing AI-based chip placement algorithms in improving final design PPA metrics. Specifically, we have gathered 20 circuits from various domains (e.g., CPU, GPU, and microcontrollers). These designs are compiled by executing the workflow from the Verilog source code, which preserves necessary physical implementation kits, enabling evaluation of the placement algorithms' impact on the final design PPA. We executed six state-of-the-art AI-based chip placement algorithms on these designs and plugged the results of each single-point algorithm into the physical implementation workflow to obtain the final PPA results. Experimental results show that even if the intermediate metric of a single-point algorithm is dominant, the final PPA results are unsatisfactory. We believe that our benchmark will serve as an effective evaluation framework to bridge the gap between academia and industry.

Related papers

Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques [4.967030650006704]
Elk is a DL compiler framework to maximize the efficiency of inter-core connected AI chips. It generates globally optimized execution plans that best overlap off-chip data loading and on-chip execution. Elk achieves 94% of the ideal performance of ICCA chips on average.
arXiv Detail & Related papers (2025-07-15T17:21:31Z)
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive [58.0729162588429]
Interactive segmentation improves annotation efficiency by segmenting target regions from user prompts. Current approaches face a critical trade-off: dense-token methods achieve superior accuracy but suffer from prohibitively slow processing on CPU devices. We propose Inter2Former to address this challenge by optimizing computation allocation in dense-token processing.
arXiv Detail & Related papers (2025-07-13T12:33:37Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
Exploration of Design Alternatives for Reducing Idle Time in Shor's Algorithm: A Study on Monolithic and Distributed Quantum Systems [4.430488261124667]
We introduce an alternating design approach to minimize idle time while preserving qubit efficiency in Shor's algorithm. We also demonstrate how task rearrangement enhances execution efficiency in the presence of multiple distribution channels. Our findings provide a structured framework for optimizing compiled quantum circuits for Shor's algorithm.
arXiv Detail & Related papers (2025-03-28T16:07:52Z)
TANGO: A Robust Qubit Mapping Algorithm via Two-Stage Search and Bidirectional Look [7.064817742048067]
Current quantum devices lack full qubit connectivity, making it difficult to directly execute logical circuits on quantum devices. We propose the TANGO algorithm, which balances the impact of qubit mapping on both mapped and unmapped nodes. We show that the algorithm achieves multi-objective co-optimization of gate count and circuit depth across various benchmarks and quantum devices.
arXiv Detail & Related papers (2025-03-10T13:44:16Z)
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment. We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence [4.063480188363124]
Extreme edge-AI systems, such as those in readout ASICs for radiation detection, must operate under stringent hardware constraints. Finding ideal solutions means identifying optimal AI and ASIC design choices from a design space that has explosively expanded.
arXiv Detail & Related papers (2024-07-18T17:58:05Z)
Synergistic Dynamical Decoupling and Circuit Design for Enhanced Algorithm Performance on Near-Term Quantum Devices [0.5261718469769447]
Dynamical decoupling (DD) is a promising technique for mitigating errors in near-term quantum devices. We analyze how hardware features and algorithm design impact the effectiveness of DD for error mitigation. The results reveal an inverse relationship between the effectiveness of DD and the inherent performance of the algorithm.
arXiv Detail & Related papers (2024-05-27T14:48:05Z)
Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions [80.78077900288868]
We decompose the design methodology of early-exit networks to its key components and survey the recent advances in each one of them. We position early-exiting against other efficient inference solutions and provide our insights on the current challenges and most promising future directions for research in the field.
arXiv Detail & Related papers (2021-06-09T12:33:02Z)
Multi-objective Optimisation of Digital Circuits based on Cell Mapping in an Industrial EDA Flow [0.2578242050187029]
A fully-automated, multi-objective (MO) EDA flow is introduced to address this issue. We have applied the proposed MOEDA framework to ISCAS-85 and EPFL benchmark circuits using a commercial 65nm standard cell library.
arXiv Detail & Related papers (2021-05-21T15:29:58Z)
Towards Large Scale Automated Algorithm Design by Integrating Modular Benchmarking Frameworks [0.9281671380673306]
We present a first proof-of-concept use-case that demonstrates the efficiency of the algorithm framework ParadisEO with the automated algorithm configuration tool irace and the experimental platform IOHprofiler. Key advantages of our pipeline are fast evaluation times, the possibility to generate rich data sets, and a standardized interface that can be used to benchmark very broad classes of sampling-based optimizations.
arXiv Detail & Related papers (2021-02-12T10:47:00Z)
Improving EEG Decoding via Clustering-based Multi-task Feature Learning [27.318646122939537]
Machine learning provides a promising technique to optimize EEG patterns toward better decoding accuracy. Existing algorithms do not effectively explore the underlying data structure capturing the true EEG sample distribution. We propose a clustering-based multi-task feature learning algorithm for improved EEG pattern decoding.
arXiv Detail & Related papers (2020-12-12T13:31:53Z)
An AI-Assisted Design Method for Topology Optimization Without Pre-Optimized Training Data [68.8204255655161]
An AI-assisted design method based on topology optimization is presented, which is able to obtain optimized designs in a direct way. Designs are provided by an artificial neural network, the predictor, on the basis of boundary conditions and degree of filling as input data.
arXiv Detail & Related papers (2020-12-11T14:33:27Z)
Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery which aims to learn an accurate process model from event log data. In this paper, we propose to use (block-) structured programs directly as target process models. We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z)
AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation. Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.