AI-Driven Optimization of Hardware Overlay Configurations
- URL: http://arxiv.org/abs/2503.06351v1
- Date: Sat, 08 Mar 2025 22:34:47 GMT
- Title: AI-Driven Optimization of Hardware Overlay Configurations
- Authors: Rasha Karakchi
- Abstract summary: This paper presents an AI-driven approach to optimizing FPGA overlay configurations. By leveraging machine learning techniques, we predict the feasibility and efficiency of different configurations before hardware compilation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing and optimizing FPGA overlays is a complex and time-consuming process, often requiring multiple trial-and-error iterations to determine a suitable configuration. This paper presents an AI-driven approach to optimizing FPGA overlay configurations, specifically focusing on the NAPOLY+ automata processor implemented on the ZCU104 FPGA. By leveraging machine learning techniques, particularly Random Forest regression, we predict the feasibility and efficiency of different configurations before hardware compilation. Our method significantly reduces the number of required iterations by estimating resource utilization, including logical elements, distributed memory, and fanout, based on historical design data. Experimental results demonstrate that our model achieves high prediction accuracy, closely matching actual resource usage while accelerating the design process.
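To make the abstract concrete, here is a minimal sketch of the kind of pre-compilation resource predictor it describes: a Random Forest regressor mapping overlay configuration parameters to estimated resource usage. The feature names and synthetic data are illustrative assumptions, not the paper's dataset.

```python
# A minimal sketch (not the paper's actual pipeline): predict FPGA
# resource utilization for candidate overlay configurations with a
# Random Forest, as the abstract describes.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical historical design data: each row is an overlay
# configuration (e.g., processing elements, state width, fanout limit).
X = rng.integers(1, 64, size=(200, 3)).astype(float)
# Hypothetical targets: logic elements, distributed memory, fanout
# (synthetic stand-ins, not ZCU104 measurements).
y = np.column_stack([
    120 * X[:, 0] + 40 * X[:, 1],   # stand-in for logic elements
    16 * X[:, 1] * X[:, 2],         # stand-in for distributed memory
    2 * X[:, 2] + X[:, 0],          # stand-in for fanout
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out configs:", model.score(X_te, y_te))
# Configurations predicted to exceed the device budget can be discarded
# before any hardware compilation is attempted.
```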
Related papers
- Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training [48.13509528824236]
Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings.
ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms, such as FPGAs and ASICs.
We propose PeZO, a perturbation-efficient ZO framework that significantly reduces the demand for random number generation.
Our experiments show that PeZO reduces the LUTs and FFs required for random number generation by 48.6% and 12.7%, respectively, and saves up to 86% of power consumption.
arXiv Detail & Related papers (2025-04-28T23:58:07Z)
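For context on the entry above, this is a generic two-point zeroth-order gradient estimator; the small reused perturbation pool only gestures at PeZO's goal of cutting random-number generation and is not the paper's actual scheme.

```python
# Generic two-point ZO gradient estimate; the fixed perturbation pool
# is an illustrative assumption, not PeZO's method.
import numpy as np

def zo_grad(f, x, pool, mu=1e-3, k=8):
    """Average k two-point estimates using perturbations drawn from a
    small pre-generated pool instead of fresh Gaussians each step."""
    g = np.zeros_like(x)
    idx = np.random.randint(len(pool), size=k)
    for u in pool[idx]:
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / k

f = lambda x: np.sum(x ** 2)        # toy objective
x = np.ones(10)
pool = np.random.randn(32, 10)      # fixed pool, reused every step
for _ in range(200):
    x -= 0.05 * zo_grad(f, x, pool)
print("final loss:", f(x))
```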
- Fast Jet Tagging with MLP-Mixers on FPGAs [1.5087357654868088]
We explore the innovative use of MLP-Mixer models for real-time jet tagging and establish their feasibility on resource-constrained hardware like FPGAs.
We achieve state-of-the-art performance on datasets mimicking Large Hadron Collider conditions.
Non-permutation-invariant architectures enable smart feature prioritization and efficient FPGA deployment, setting a new benchmark for machine learning in real-time data processing at particle colliders.
arXiv Detail & Related papers (2025-03-05T01:37:47Z)
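The entry above rests on the MLP-Mixer block; a minimal numpy forward pass of one block is sketched below. Real Mixers add LayerNorm and GELU, and the sizes here are arbitrary, not the paper's model.

```python
# One MLP-Mixer block: token mixing across the token axis, then
# channel mixing across channels, each with a residual connection.
import numpy as np

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0) @ w2   # Linear -> ReLU -> Linear

T, C, H = 16, 8, 32                     # tokens (e.g., particles), channels, hidden
rng = np.random.default_rng(0)
x = rng.standard_normal((T, C))
tok_w1, tok_w2 = rng.standard_normal((T, H)), rng.standard_normal((H, T))
ch_w1, ch_w2 = rng.standard_normal((C, H)), rng.standard_normal((H, C))

x = x + mlp(x.T, tok_w1, tok_w2).T      # token-mixing MLP + residual
x = x + mlp(x, ch_w1, ch_w2)            # channel-mixing MLP + residual
print(x.shape)                          # (16, 8), shape preserved
```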
- Vector Optimization with Gaussian Process Bandits [7.049738935364297]
Learning problems in which multiple objectives must be considered simultaneously often arise in various fields, including engineering, drug design, and environmental management.
Traditional methods for dealing with multiple black-box objective functions have limitations in incorporating objective preferences and exploring the solution space accordingly.
We propose Vector Optimization with Gaussian Process (VOGP), a probably approximately correct adaptive elimination algorithm that performs black-box vector optimization using Gaussian process bandits.
arXiv Detail & Related papers (2024-12-03T14:47:46Z)
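For the entry above, a plain scalar GP-UCB bandit loop is sketched below; VOGP itself handles vector objectives with a PAC adaptive elimination rule, which this sketch omits.

```python
# Scalar Gaussian-process bandit with a UCB acquisition rule.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: -(x - 0.3) ** 2            # unknown black-box objective
X_cand = np.linspace(0, 1, 101)[:, None] # candidate designs
X, y = [[0.0]], [f(0.0)]                 # initial observation

for _ in range(15):
    gp = GaussianProcessRegressor().fit(np.array(X), np.array(y))
    mu, sd = gp.predict(X_cand, return_std=True)
    x_next = X_cand[np.argmax(mu + 2.0 * sd)]   # optimism in the face
    X.append(list(x_next))                      # of uncertainty
    y.append(f(x_next[0]))

print("best query:", X[int(np.argmax(y))])
```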
- Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z)
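The dropout-based BayesNNs above rely on Monte Carlo dropout: keeping dropout active at inference and averaging stochastic forward passes yields a predictive mean and an uncertainty estimate. The single-exit sketch below is a minimal illustration; the paper attaches multiple exits at intermediate layers.

```python
# Monte Carlo dropout in numpy: dropout stays on at test time, and
# repeated stochastic passes give mean and uncertainty.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 16)), rng.standard_normal((16, 1))

def forward(x, p=0.5):
    h = np.maximum(x @ W1, 0)
    h = h * (rng.random(h.shape) > p)   # stochastic mask at inference
    return h @ W2 / (1 - p)             # rescale to keep expectation

x = rng.standard_normal((1, 4))
samples = np.concatenate([forward(x) for _ in range(100)])
print("mean:", samples.mean(), "uncertainty (std):", samples.std())
```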
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization in distributed learning settings.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
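The conditional stochastic objective above has the nested form F(x) = E_xi[f_xi(E_eta[g_eta(x, xi)])]; below is a toy biased plug-in gradient estimator that samples an inner batch for the inner expectation. The federated, multi-client aspect is omitted, and f and g are invented.

```python
# Plug-in gradient estimator for a nested stochastic objective with
# toy choices f(u) = ||u||^2 and g_eta(x) = x + eta.
import numpy as np

rng = np.random.default_rng(0)
x = np.full(5, 3.0)

for _ in range(500):
    eta = rng.standard_normal((8, 5))   # inner batch for E_eta[...]
    u = (x + eta).mean(axis=0)          # plug-in inner expectation
    grad = 2 * u                        # chain rule: df/du = 2u, dg/dx = I
    x -= 0.05 * grad
print("x ->", np.round(x, 3))           # drifts toward the minimizer 0
```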
- AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming [2.898055875927704]
We propose a data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs.
Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data.
Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV.
arXiv Detail & Related papers (2023-09-23T18:23:54Z)
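Hypervolume, the metric in which the entry above reports a 21% gain, can be computed for a 2D Pareto front (minimization) as the area dominated by the front up to a reference point. The points below are made up for illustration.

```python
# 2D hypervolume by sweeping the front in increasing x and summing
# the horizontal slabs each point contributes.
import numpy as np

def hypervolume_2d(front, ref):
    """front: (n, 2) nondominated points (minimization); ref: (2,)."""
    pts = front[np.argsort(front[:, 0])]
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv

front = np.array([[1.0, 4.0], [2.0, 2.5], [3.0, 1.0]])  # e.g. (area, error)
print(hypervolume_2d(front, ref=np.array([5.0, 5.0])))  # 11.5
```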
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
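The entry above builds on bottleneck adapters, the generic mechanism behind adapter-based models like adapter-ALBERT: down-project, apply a nonlinearity, up-project, and add a residual. The dimensions below are illustrative assumptions.

```python
# A standard bottleneck adapter layer in numpy.
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 64                          # hidden size, adapter bottleneck
W_down = rng.standard_normal((d, r)) * 0.02
W_up = np.zeros((r, d))                 # zero init: adapter starts as identity

def adapter(h):
    # Only W_down/W_up are trained per task; the frozen backbone
    # weights (and their on-chip memory) are reused across tasks.
    return h + np.maximum(h @ W_down, 0) @ W_up

h = rng.standard_normal((4, d))         # a batch of token representations
print(adapter(h).shape)                 # (4, 768), residual shape preserved
```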
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
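The transfer-tuning idea above reduces, at its core, to a nearest-neighbor lookup in an embedding space: reuse the tuning choice of the most similar already-tuned code region. The embeddings and schedules below are fabricated stand-ins.

```python
# Similarity-based transfer tuning via cosine nearest neighbor.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

tuned = {                                 # region -> (embedding, schedule)
    "gemm":    (np.array([0.9, 0.1, 0.0]), {"tile": 64, "unroll": 8}),
    "stencil": (np.array([0.1, 0.8, 0.3]), {"tile": 32, "unroll": 4}),
}
new_region = np.array([0.85, 0.2, 0.05])  # embedding of an untuned loop nest

best = max(tuned, key=lambda k: cosine(tuned[k][0], new_region))
print("transfer schedule from:", best, "->", tuned[best][1])
```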
- Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms state-of-the-art cost models for Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z)
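A cost model like the one above, in miniature: regress measured cycle counts on task descriptors with a small neural network. The features and data below are synthetic, not Intel's low-level VPU profiles.

```python
# Learned cost model: descriptors -> predicted cycles.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical descriptors: (tensor size, kernel size, channels)
X = rng.integers(1, 128, size=(500, 3)).astype(float)
cycles = X[:, 0] * X[:, 2] * 3 + X[:, 1] ** 2 + rng.normal(0, 5, 500)

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=0).fit(X[:400], cycles[:400])
print("R^2 on unseen tasks:", model.score(X[400:], cycles[400:]))
# A compiler can query such a model instead of profiling each
# candidate schedule on hardware.
```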
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
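One graph-convolution step plus a graph-level readout, the basic ingredients of GNN-based cost prediction for HLS designs as in the entry above; the toy operation graph and weights are arbitrary.

```python
# Single GCN-style propagation over a 4-node dataflow graph.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], float)      # adjacency of the op graph
A_hat = A + np.eye(4)                    # add self-loops
D_inv = np.diag(1 / A_hat.sum(1))        # row-normalize
H = np.random.default_rng(0).standard_normal((4, 8))   # node features
W = np.random.default_rng(1).standard_normal((8, 8))

H = np.maximum(D_inv @ A_hat @ H @ W, 0) # aggregate neighbors, transform
graph_repr = H.mean(axis=0)              # readout: one vector per design
# A final linear head on graph_repr would predict latency and cost.
print(graph_repr.shape)
```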
- HALF: Holistic Auto Machine Learning for FPGAs [1.9146960682777232]
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing.
To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous number of design parameters has to be considered.
An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly.
arXiv Detail & Related papers (2021-06-28T14:45:47Z)
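The combinatorial problem above, in miniature: enumerate a few design parameters and keep the configuration that best meets a cost criterion. The parameters and the closed-form cost model below are invented for illustration; real frameworks couple such search with accuracy constraints and FPGA resource budgets.

```python
# Exhaustive design-space search over a toy FPGA configuration space.
from itertools import product

def energy_per_inference(bits, parallel, freq_mhz):
    latency = 1e6 / (parallel * freq_mhz)            # toy latency model
    power = 0.05 * parallel * bits * freq_mhz / 100  # toy power model
    return latency * power

best = min(product([4, 8, 16],        # quantization bits
                   [1, 2, 4, 8],      # parallel compute units
                   [100, 200, 300]),  # clock (MHz)
           key=lambda cfg: energy_per_inference(*cfg))
print("best (bits, parallelism, MHz):", best)
```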
- Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variational hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate-Scale Quantum devices.
We propose an optimization strategy for the parameterized ansatze used in variational quantum algorithms, which we call "Parameter-Efficient Circuit Training" (PECT).
Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
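In the spirit of the entry above, the sketch below optimizes parameters block by block rather than jointly, on a toy quadratic landscape standing in for a variational quantum cost function; it is an illustration of sequential sub-optimization, not the paper's algorithm.

```python
# Block-wise sequential optimization: freeze all parameters except one
# small block, optimize it, then move to the next block.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target = rng.standard_normal(12)
energy = lambda theta: np.sum((theta - target) ** 2)  # toy "energy"

theta = np.zeros(12)
for start in range(0, 12, 4):                         # blocks of 4 params
    block = slice(start, start + 4)
    def sub(tb, block=block):
        t = theta.copy()
        t[block] = tb
        return energy(t)
    theta[block] = minimize(sub, theta[block]).x      # optimize one block
print("final energy:", round(energy(theta), 6))
```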
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.