CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design
- URL: http://arxiv.org/abs/2506.03474v1
- Date: Wed, 04 Jun 2025 01:08:34 GMT
- Title: CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design
- Authors: Yifeng Xiao, Yurong Xu, Ning Yan, Masood Mortazavi, Pierluigi Nuzzo
- Abstract summary: CORE is a constraint-aware, one-step reinforcement learning method for simulation-guided DSE. We instantiate CORE for hardware-mapping co-design of neural network accelerators.
- Score: 3.549422886703227
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Simulation-based design space exploration (DSE) aims to efficiently optimize high-dimensional structured designs under complex constraints and expensive evaluation costs. Existing approaches, including heuristic and multi-step reinforcement learning (RL) methods, struggle to balance sampling efficiency and constraint satisfaction due to sparse, delayed feedback, and large hybrid action spaces. In this paper, we introduce CORE, a constraint-aware, one-step RL method for simulation-guided DSE. In CORE, the policy agent learns to sample design configurations by defining a structured distribution over them, incorporating dependencies via a scaling-graph-based decoder, and by reward shaping to penalize invalid designs based on the feedback obtained from simulation. CORE updates the policy using a surrogate objective that compares the rewards of designs within a sampled batch, without learning a value function. This critic-free formulation enables efficient learning by encouraging the selection of higher-reward designs. We instantiate CORE for hardware-mapping co-design of neural network accelerators, demonstrating that it significantly improves sample efficiency and achieves better accelerator configurations compared to state-of-the-art baselines. Our approach is general and applicable to a broad class of discrete-continuous constrained design problems.
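The critic-free update described in the abstract can be illustrated with a short sketch: sample a batch of designs from a factorized policy, shape the rewards with a penalty for simulator-reported invalid configurations, and use batch-relative advantages in place of a learned value function. The Python/PyTorch snippet below is a minimal sketch under our own assumptions (a purely categorical design space, a hypothetical `simulate` callable returning an objective value and a validity flag, a fixed penalty); it omits the scaling-graph-based decoder and is not the authors' implementation.

```python
# Minimal, hypothetical sketch of a critic-free, one-step policy update in the
# spirit of CORE (batch-relative advantages + reward shaping for invalid designs).
# All names, shapes, and the simulator interface are assumptions for illustration.
import torch

class DesignPolicy(torch.nn.Module):
    """Factorized categorical distribution over discrete design fields."""
    def __init__(self, num_choices_per_field):
        super().__init__()
        # One categorical head per discrete design field (e.g., PE rows, buffer level).
        self.logits = torch.nn.ParameterList(
            [torch.nn.Parameter(torch.zeros(n)) for n in num_choices_per_field]
        )

    def sample(self, batch_size):
        """Sample a batch of configurations and their total log-probabilities."""
        fields, logps = [], []
        for logit in self.logits:
            dist = torch.distributions.Categorical(logits=logit)
            idx = dist.sample((batch_size,))
            fields.append(idx)
            logps.append(dist.log_prob(idx))
        return torch.stack(fields, dim=1), torch.stack(logps, dim=1).sum(dim=1)

def shaped_reward(objective, valid, penalty=1.0):
    """Reward shaping: penalize designs the simulator reports as invalid."""
    return objective if valid else objective - penalty

def core_style_update(policy, simulate, optimizer, batch_size=32):
    """One-step, critic-free update: advantages are rewards compared within the batch."""
    configs, logps = policy.sample(batch_size)
    rewards = torch.tensor(
        [shaped_reward(*simulate(c.tolist())) for c in configs], dtype=torch.float32
    )
    # Batch-relative baseline instead of a learned value function.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    loss = -(advantages.detach() * logps).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()

# Example usage (hypothetical simulator returning (objective, is_valid)):
#   policy = DesignPolicy([4, 8, 3])          # three discrete design fields
#   opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
#   for _ in range(100):
#       core_style_update(policy, my_simulator, opt)
```

Because each sampled design is a single one-step episode, comparing rewards within the batch already provides a baseline, which is why no critic has to be trained in this formulation.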
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - Efficient Stimuli Generation using Reinforcement Learning in Design Verification [2.9652396326501864]
In this paper, a novel methodology is proposed to generate efficient stimuli with the help of Reinforcement Learning (RL) to reach the maximum code coverage of the Design Under Verification (DUV).
arXiv Detail & Related papers (2024-05-30T08:23:04Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Diffusion Generative Inverse Design [28.04683283070957]
Inverse design refers to the problem of optimizing the input of an objective function in order to enact a target outcome.
Recent developments in learned graph neural networks (GNNs) can be used for accurate, efficient, differentiable estimation of simulator dynamics.
We show how denoising diffusion models can be used to solve inverse design problems efficiently and propose a particle sampling algorithm for further improving their efficiency.
arXiv Detail & Related papers (2023-09-05T08:32:07Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly initialized network at each iteration and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and gated sub-network from scratch in a SSL setting.
The co-evolution during pre-training of both dense and gated encoder offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis (HLS) is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z) - Robust Topology Optimization Using Multi-Fidelity Variational Autoencoders [1.0124625066746595]
A robust topology optimization (RTO) problem identifies a design with the best average performance.
A neural network method is proposed that offers computational efficiency.
Numerical application of the method is shown on the robust design of an L-bracket structure with a single point load as well as multiple point loads.
arXiv Detail & Related papers (2021-07-19T20:40:51Z) - Analog Circuit Design with Dyna-Style Reinforcement Learning [12.232323973906773]
We present a learning based approach to analog circuit design, where the goal is to optimize circuit performance subject to certain design constraints.
We propose a method with two key properties. First, it learns a reward model, i.e., a surrogate model of the performance approximated by neural networks, to reduce the required number of simulations.
Second, it uses a policy generator to explore the diverse solution space satisfying constraints. The results show that, compared to the model-free method applied with 20,000 circuit simulations to train the policy, DynaOpt achieves much better performance by learning from scratch with only 500 simulations (a hedged sketch of such a loop follows this entry).
arXiv Detail & Related papers (2020-11-16T00:19:25Z)
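As a rough illustration of the Dyna-style idea summarized above (a neural surrogate of the simulated reward plus a policy trained mostly against that surrogate, with only occasional real simulations), the sketch below uses a toy Gaussian "policy generator" over an 8-dimensional design vector; all names, dimensions, and the `simulate` interface are assumptions for illustration, not DynaOpt's actual implementation.

```python
# Hypothetical Dyna-style loop: fit a neural reward surrogate on a small budget of
# real simulations, then improve the policy against the cheap surrogate.
import torch

surrogate = torch.nn.Sequential(          # reward model: design params -> predicted reward
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
policy_mean = torch.nn.Parameter(torch.zeros(8))   # toy "policy generator": Gaussian mean over designs
surr_opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
pol_opt = torch.optim.Adam([policy_mean], lr=1e-2)
real_data = []                                     # (design, simulated reward) pairs

def dyna_round(simulate, n_real=10, n_model=200):
    # 1) Spend a small budget of real simulations near the current policy.
    for _ in range(n_real):
        x = policy_mean.detach() + 0.1 * torch.randn(8)
        real_data.append((x, torch.tensor([simulate(x.tolist())])))
    # 2) Fit the surrogate reward model on all real data collected so far.
    xs = torch.stack([d[0] for d in real_data])
    ys = torch.stack([d[1] for d in real_data])
    for _ in range(100):
        surr_opt.zero_grad()
        torch.nn.functional.mse_loss(surrogate(xs), ys).backward()
        surr_opt.step()
    # 3) Improve the policy against the (cheap) surrogate instead of the simulator.
    for _ in range(n_model):
        pol_opt.zero_grad()
        (-surrogate(policy_mean.unsqueeze(0))).mean().backward()
        pol_opt.step()
```

The trade-off this sketch tries to capture is the one the summary reports: most gradient steps hit the learned surrogate, so the real simulator is queried only a few hundred times instead of tens of thousands.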
This list is automatically generated from the titles and abstracts of the papers on this site.