DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
- URL: http://arxiv.org/abs/2502.01681v2
- Date: Mon, 10 Feb 2025 08:28:50 GMT
- Title: DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale
- Authors: Ziyang Zheng, Shan Huang, Jianyuan Zhong, Zhengyuan Shi, Guohao Dai, Ningyi Xu, Qiang Xu
- Abstract summary: We introduce DeepGate4, a scalable and efficient graph transformer for large-scale circuits.
DeepGate4 significantly surpasses state-of-the-art methods, achieving 15.5% and 31.1% performance improvements over the next-best models.
- Score: 12.73514467497144
- Abstract: Circuit representation learning has become pivotal in electronic design automation, enabling critical tasks such as testability analysis, logic reasoning, power estimation, and SAT solving. However, existing models face significant challenges in scaling to large circuits due to limitations like over-squashing in graph neural networks and the quadratic complexity of transformer-based models. To address these issues, we introduce DeepGate4, a scalable and efficient graph transformer specifically designed for large-scale circuits. DeepGate4 incorporates several key innovations: (1) an update strategy tailored for circuit graphs, which reduces memory complexity to sub-linear and is adaptable to any graph transformer; (2) a GAT-based sparse transformer with global and local structural encodings for AIGs; and (3) an inference acceleration CUDA kernel that fully exploits the unique sparsity patterns of AIGs. Our extensive experiments on the ITC99 and EPFL benchmarks show that DeepGate4 significantly surpasses state-of-the-art methods, achieving 15.5% and 31.1% performance improvements over the next-best models. Furthermore, the Fused-DeepGate4 variant reduces runtime by 35.1% and memory usage by 46.8%, making it highly efficient for large-scale circuit analysis. These results demonstrate the potential of DeepGate4 to handle complex EDA tasks while offering superior scalability and efficiency.
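As a rough illustration of why sparse, edge-restricted attention avoids the quadratic cost mentioned in the abstract, the sketch below lets each gate attend only to itself and its fan-in gates. It is not the DeepGate4 implementation: the function name `sparse_attention`, the toy circuit, and the use of raw dot-product scores without learned projections or structural encodings are all assumptions made for illustration.

```python
import math

def sparse_attention(features, fanin):
    """Edge-restricted softmax attention (illustrative, hypothetical helper).

    features: list of equal-length float vectors, one per gate.
    fanin: dict mapping a gate index to the list of its predecessor indices.
    Returns the updated features and the number of attention scores computed.
    """
    dim = len(features[0])
    out, n_scores = [], 0
    for v, h_v in enumerate(features):
        # Each gate attends to itself and its fan-in gates only.
        neigh = [v] + list(fanin.get(v, []))
        scores = [sum(a * b for a, b in zip(h_v, features[u])) / math.sqrt(dim)
                  for u in neigh]
        n_scores += len(scores)
        # Numerically stable softmax over the neighbourhood.
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[k] / z * features[u][i] for k, u in enumerate(neigh))
                    for i in range(dim)])
    return out, n_scores

# Toy AIG (assumed): gates 0 and 1 are primary inputs,
# gate 2 = AND(0, 1), gate 3 = AND(2, 1).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fanin = {2: [0, 1], 3: [2, 1]}
updated, n_scores = sparse_attention(features, fanin)
dense_scores = len(features) ** 2  # what full self-attention would compute
print(n_scores, dense_scores)
```

Because every AND gate in an AIG has at most two fan-ins, this scheme computes at most three scores per gate, so the score count grows linearly with circuit size, whereas dense self-attention grows quadratically.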
Related papers
- GAQAT: gradient-adaptive quantization-aware training for domain generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.
Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.
Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z) - Shavette: Low Power Neural Network Acceleration via Algorithm-level Error Detection and Undervolting [0.0]
This brief introduces a simple approach for enabling reduced voltage operation of Deep Neural Network (DNN) accelerators by mere software modifications.
We demonstrate 18% to 25% energy saving with no accuracy loss of the models and negligible throughput compromise.
arXiv Detail & Related papers (2024-10-17T10:29:15Z) - Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z) - DeepGate3: Towards Scalable Circuit Representation Learning [9.910071321534682]
Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA).
Existing models, such as the DeepGate family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings.
We introduce DeepGate3, an enhanced architecture that integrates Transformer modules following the initial GNN processing.
arXiv Detail & Related papers (2024-07-15T02:44:21Z) - LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection [0.0]
We focus on design choices of neural network architectures for efficient object detection based on FLOP.
We propose several optimizations to enhance the efficiency of YOLO-based models.
This paper contributes to a new scaling paradigm for object detection and YOLO-centric models called LeYOLO.
arXiv Detail & Related papers (2024-06-20T12:08:24Z) - Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression [0.07999703756441758]
This paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression.
Our results showcase a substantial improvement in regression accuracy, as evidenced by the reduction of Root Mean Square Error (RMSE) from 55.4 to 51.8.
Without sacrificing performance, we increase the speed of this model by an order of magnitude (up to 4.32x faster).
arXiv Detail & Related papers (2024-04-02T17:01:51Z) - Graph Transformers for Large Graphs [57.19338459218758]
This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints.
A key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism.
We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-100M with a 5.9% performance improvement.
arXiv Detail & Related papers (2023-12-18T11:19:23Z) - SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks.
We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model.
We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z) - Global Vision Transformer Pruning with Hessian-Aware Saliency [93.33895899995224]
This work challenges the common design philosophy of the Vision Transformer (ViT) model with uniform dimension across all the stacked blocks in a model stage.
We derive a novel Hessian-based structural pruning criteria comparable across all layers and structures, with latency-aware regularization for direct latency reduction.
Performing iterative pruning on the DeiT-Base model leads to a new architecture family called NViT (Novel ViT), with a novel parameter that utilizes parameters more efficiently.
arXiv Detail & Related papers (2021-10-10T18:04:59Z) - FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons [19.411734658680967]
We propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine of deep neural networks (DNNs).
Our approach is centred around compression as a means for reducing the area as well as power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performances.
We show that we can achieve throughputs of 2.45 TOPS with a total power consumption of 3.6W on a Virtual Ultrascale FPGA XCVU440 device implementation, and achieve a total power efficiency of 20.17 TOPS/W on a 22nm process ASIC version.
arXiv Detail & Related papers (2020-12-17T19:10:04Z) - End-to-End Multi-speaker Speech Recognition with Transformer [88.22355110349933]
We replace the RNN-based encoder-decoder in the speech recognition model with a Transformer architecture.
We also modify the self-attention component to be restricted to a segment rather than the whole sequence in order to reduce computation.
arXiv Detail & Related papers (2020-02-10T16:29:26Z)