Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
- URL: http://arxiv.org/abs/2410.19225v1
- Date: Fri, 25 Oct 2024 00:27:53 GMT
- Title: Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
- Authors: Weikai Li, Ding Wang, Zijian Ding, Atefeh Sohrabizadeh, Zongyue Qin, Jason Cong, Yizhou Sun
- Abstract summary: High-level synthesis (HLS) is a widely used tool for designing Field Programmable Gate Arrays (FPGAs).
We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE).
In the low-level MoE, we apply MoE on three natural granularities of a program: node, basic block, and graph.
The high-level MoE learns to aggregate the three granularities for the final decision.
- Score: 43.612837464039686
- Abstract: High-level synthesis (HLS) is a widely used tool for designing Field Programmable Gate Arrays (FPGAs). HLS enables FPGA design with software programming languages by compiling the source code into an FPGA circuit. The source code includes a program (called a ``kernel'') and several pragmas that instruct hardware synthesis, such as parallelization and pipelining. While it is relatively easy for software developers to design the program, designing the pragmas relies heavily on hardware knowledge, posing a significant challenge for software developers. Recently, machine learning algorithms such as GNNs have been proposed to automate pragma design via performance prediction. However, when a trained model is applied to new kernels, the significant domain shift often leads to unsatisfactory performance. We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE) that can be flexibly adapted to any GNN model. Different expert networks can learn to handle different regions of the representation space, and they can exploit patterns shared between old and new kernels. In the low-level MoE, we apply MoE on three natural granularities of a program: node, basic block, and graph. The high-level MoE learns to aggregate the three granularities for the final decision. To stably train the hierarchical MoE, we further propose a two-stage training method. Extensive experiments verify the effectiveness of the hierarchical MoE.
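To make the architecture concrete, below is a minimal PyTorch sketch of a two-level hierarchical MoE of the kind the abstract describes. Everything in it is an assumption for illustration rather than the authors' implementation: the dense softmax gating, the pooling choices, the prediction head, and the names `MoELayer` and `HierarchicalMoE` are all hypothetical, and an arbitrary GNN backbone is assumed to supply the node embeddings.

```python
# Hedged sketch of a two-level hierarchical MoE over a program graph.
# Assumptions (illustrative, not from the paper): dense softmax gating,
# mean pooling, an MLP prediction head, and a precomputed node-to-basic-block
# assignment. Any GNN backbone can supply the node embeddings.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Softmax-weighted mixture of small expert MLPs."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, dim] -> weighted sum of expert outputs, [batch, dim]
        weights = torch.softmax(self.gate(x), dim=-1)            # [batch, E]
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # [batch, E, dim]
        return (weights.unsqueeze(-1) * outs).sum(dim=1)

class HierarchicalMoE(nn.Module):
    """Low-level MoEs at node, basic-block, and graph granularity;
    a high-level gate aggregates the three for the final prediction."""
    def __init__(self, dim: int):
        super().__init__()
        self.node_moe = MoELayer(dim)
        self.block_moe = MoELayer(dim)
        self.graph_moe = MoELayer(dim)
        self.high_gate = nn.Linear(3 * dim, 3)  # high-level MoE over granularities
        self.head = nn.Linear(dim, 1)           # e.g. design performance prediction

    def forward(self, node_emb: torch.Tensor, block_index: torch.Tensor):
        # node_emb: [num_nodes, dim] from a GNN backbone.
        # block_index: [num_nodes] long tensor mapping each node to its basic block.
        node_out = self.node_moe(node_emb).mean(dim=0)
        num_blocks = int(block_index.max()) + 1
        block_emb = node_emb.new_zeros(num_blocks, node_emb.size(-1))
        block_emb.index_add_(0, block_index, node_emb)  # sum node embeddings per block
        block_out = self.block_moe(block_emb).mean(dim=0)
        graph_out = self.graph_moe(node_emb.mean(dim=0, keepdim=True)).squeeze(0)
        stacked = torch.stack([node_out, block_out, graph_out])      # [3, dim]
        w = torch.softmax(self.high_gate(stacked.flatten()), dim=-1)  # [3]
        return self.head((w.unsqueeze(-1) * stacked).sum(dim=0))     # [1]
```

The abstract also mentions a two-stage training method that stabilizes training of the hierarchical MoE; this summary gives no detail on it, so the sketch above omits it.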
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45x-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis [45.471039079664656]
Domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving.
We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way (a generic sketch of such cross-modal interaction appears after this list).
We show that ProgSG reduces the RMSE of design performance predictions by up to 22%, and identifies designs with an average of 1.10x performance improvement.
arXiv Detail & Related papers (2024-06-13T22:34:58Z)
- MATADOR: Automated System-on-Chip Tsetlin Machine Design Generation for Edge Applications [0.2663045001864042]
This paper presents MATADOR, an automated-to-silicon tool with a GUI interface, capable of generating optimized accelerator designs for inference at the edge.
It offers automation of the full development pipeline: model training, system level design generation, design verification and deployment.
MATADOR accelerator designs are shown to be up to 13.4x faster, up to 7x more resource-frugal, and up to 2x more power-efficient when compared to state-of-the-art Quantized and Binary Deep Neural Network implementations.
arXiv Detail & Related papers (2024-03-03T10:31:46Z)
- Stochastic Configuration Machines: FPGA Implementation [4.57421617811378]
Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling.
This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to improve learning performance.
arXiv Detail & Related papers (2023-10-30T02:04:20Z)
- ProgSG: Cross-Modality Representation Learning for Programs in Electronic Design Automation [38.023395256208055]
High-level synthesis (HLS) allows a developer to compile a high-level design description written as software code in C or C++.
HLS tools still require microarchitecture decisions, expressed in terms of pragmas.
We propose ProgSG, which allows the source code sequence modality and the graph modality to interact with each other in a deep and fine-grained way.
arXiv Detail & Related papers (2023-05-18T09:44:18Z)
- LegoNN: Building Modular Encoder-Decoder Models [117.47858131603112]
State-of-the-art encoder-decoder models are constructed and trained end-to-end as an atomic unit.
No component of the model can be (re-)used without the others, making it impossible to share parts.
We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without the need for fine-tuning.
arXiv Detail & Related papers (2022-06-07T14:08:07Z)
- A Framework for Interdomain and Multioutput Gaussian Processes [22.62911488724047]
We present a mathematical and software framework for scalable approximate inference in GPs.
Our framework, implemented in GPflow, provides a unified interface for many existing multioutput models.
arXiv Detail & Related papers (2020-03-02T16:24:59Z)
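Both ProgSG entries above describe letting a source code sequence modality and a program graph modality interact in a deep and fine-grained way. As a hedged illustration of what such interaction could look like, here is a generic bidirectional cross-attention block in PyTorch; it is not ProgSG's actual architecture, and all names and dimensions are assumptions.

```python
# Illustrative sketch only: bidirectional cross-attention between code-token
# embeddings and program-graph node embeddings, one generic way to let two
# modalities interact at a fine granularity. NOT ProgSG's actual design.
import torch
import torch.nn as nn

class CrossModalBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.tok2node = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.node2tok = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_n = nn.LayerNorm(dim)

    def forward(self, tok: torch.Tensor, node: torch.Tensor):
        # tok:  [batch, seq_len, dim]   code-token embeddings (e.g. from an LM)
        # node: [batch, num_nodes, dim] graph-node embeddings (e.g. from a GNN)
        t, _ = self.tok2node(tok, node, node)  # tokens attend to graph nodes
        n, _ = self.node2tok(node, tok, tok)   # nodes attend to code tokens
        return self.norm_t(tok + t), self.norm_n(node + n)
```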
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.