Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant
Fuel Optimization
- URL: http://arxiv.org/abs/2305.05812v2
- Date: Mon, 17 Jul 2023 13:56:40 GMT
- Title: Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant
Fuel Optimization
- Authors: Paul Seurin, Koroush Shirvan
- Abstract summary: This work presents a first-of-a-kind approach to utilize deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The nuclear fuel loading pattern optimization problem belongs to the class of
large-scale combinatorial optimization. It is also characterized by multiple
objectives and constraints, which makes it impossible to solve explicitly.
Stochastic optimization methodologies including Genetic Algorithms and
Simulated Annealing are used by different nuclear utilities and vendors, but
hand-designed solutions continue to be the prevalent method in the industry. To
improve the state-of-the-art, Deep Reinforcement Learning (RL), in particular,
Proximal Policy Optimization is leveraged. This work presents a first-of-a-kind
approach to utilize deep RL to solve the loading pattern problem and could be
leveraged for any engineering design optimization. This paper is also, to our
knowledge, the first to study the behavior of several hyper-parameters that
influence the RL algorithm. The algorithm depends strongly on the shape of the
objective function derived for the core design, which acts as a fudge factor
affecting the stability of learning, and on an exploration/exploitation
trade-off that manifests through parameters such as the number of loading
patterns seen by the agents per episode, the number of samples collected before
a policy update (nsteps), and an entropy factor (ent_coef) that increases the
randomness of the policy during training. We found that RL must be applied
similarly to a Gaussian Process in which the acquisition function is replaced
by a parametrized policy. Then, once an initial set of hyper-parameters is
found, reducing nsteps and ent_coef until no further learning is observed
yields the highest sample efficiency, robustly and stably. This resulted in an
economic benefit of $535,000 to $642,000 per year per plant.
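As a rough illustration of the tuning recipe described in the abstract, the sketch below exercises the two PPO knobs named there (nsteps and ent_coef) with stable-baselines3. The environment is a stand-in; the paper's actual loading-pattern environment, reward shaping, and hyper-parameter values are not reproduced here.

```python
# Minimal sketch (not the authors' code): shrink n_steps and ent_coef until
# learning stops improving, as the abstract recommends.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder for the core loading-pattern environment

# n_steps  = samples collected before each policy update
# ent_coef = entropy bonus that keeps the policy exploratory
for n_steps, ent_coef in [(2048, 0.01), (512, 0.005), (128, 0.0)]:
    model = PPO("MlpPolicy", env, n_steps=n_steps, ent_coef=ent_coef, verbose=0)
    model.learn(total_timesteps=20_000)
    # Evaluate here and keep the smallest n_steps / ent_coef that still learns.
```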
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
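A minimal sketch of the stochastic-maximization idea summarized in the entry above: instead of an argmax over all $n$ actions, the max is taken over a random subset of roughly log2(n) actions, optionally merged with a small memory of previously chosen actions (a detail that may differ from the paper's exact construction).

```python
# Rough illustration only, not the paper's implementation.
import math
import random
import numpy as np

def stoch_argmax(q_values: np.ndarray, memory: list[int]) -> int:
    """Best action among a ~log2(n)-sized random subset plus remembered actions."""
    n = len(q_values)
    k = max(1, math.ceil(math.log2(n)))            # subset size ~ O(log n)
    candidates = set(random.sample(range(n), k)) | set(memory)
    return max(candidates, key=lambda a: q_values[a])

# Example: 10,000 actions, but only ~14 Q-values are compared per decision.
q = np.random.randn(10_000)
a = stoch_argmax(q, memory=[3, 42])
```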
- Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning [0.0]
We have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization.
In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO).
PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method.
arXiv Detail & Related papers (2024-02-16T19:35:58Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
- Delayed Geometric Discounts: An Alternative Criterion for Reinforcement Learning [1.52292571922932]
Reinforcement learning (RL) provides a theoretical framework for learning optimal behaviors.
In practice, RL algorithms rely on geometric discounts to evaluate this optimality.
In this paper, we tackle these issues by generalizing the discounted problem formulation with a family of delayed objective functions.
arXiv Detail & Related papers (2022-09-26T07:49:38Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Automatic tuning of hyper-parameters of reinforcement learning algorithms using Bayesian optimization with behavioral cloning [0.0]
In reinforcement learning (RL), the information content of data gathered by the learning agent is dependent on the setting of many hyper-parameters.
In this work, a novel approach for autonomous hyper-parameter setting using Bayesian optimization is proposed.
Experiments reveal promising results compared to other manual tweaking and optimization-based approaches.
arXiv Detail & Related papers (2021-12-15T13:10:44Z)
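For illustration only, the sketch below runs a generic Bayesian-optimization-style search over the same PPO hyper-parameters discussed in the main abstract, using Optuna as a stand-in optimizer; the paper's specific combination of Bayesian optimization with behavioral cloning is not reproduced here, and the environment is again a placeholder.

```python
# Hedged sketch: model-based hyper-parameter search over n_steps and ent_coef.
import gymnasium as gym
import optuna
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    n_steps = trial.suggest_categorical("n_steps", [128, 512, 2048])
    ent_coef = trial.suggest_float("ent_coef", 1e-4, 1e-1, log=True)
    model = PPO("MlpPolicy", gym.make("CartPole-v1"),
                n_steps=n_steps, ent_coef=ent_coef, verbose=0)
    model.learn(total_timesteps=10_000)
    mean_reward, _ = evaluate_policy(model, model.get_env(), n_eval_episodes=5)
    return mean_reward  # mean episode reward as a simple tuning objective

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```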
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
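As context for the ZO-RL entry above, this sketch shows the standard random-perturbation zeroth-order (ZO) gradient estimate that such methods build on; ZO-RL would replace the random directions below with directions produced by a learned sampling policy, which is not reproduced here.

```python
# Baseline ZO estimator: two-point estimate from random perturbations.
import numpy as np

def zo_gradient(f, x: np.ndarray, mu: float = 1e-2, n_dirs: int = 16) -> np.ndarray:
    """Random-direction two-point estimate of grad f(x)."""
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = np.random.randn(*x.shape)                      # random direction
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

# Example: minimize a quadratic without using its analytic gradient.
f = lambda x: float(np.sum(x ** 2))
x = np.random.randn(5)
for _ in range(200):
    x -= 0.1 * zo_gradient(f, x)
```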