Reinforcement Learning with an Abrupt Model Change
- URL: http://arxiv.org/abs/2304.11460v1
- Date: Sat, 22 Apr 2023 18:16:01 GMT
- Title: Reinforcement Learning with an Abrupt Model Change
- Authors: Wuxia Chen, Taposh Banerjee, Jemin George, and Carl Busart
- Abstract summary: The problem of reinforcement learning is considered where the environment or the model undergoes a change.
An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward.
The algorithm is model-free and learns the optimal policy by interacting with the environment.
- Score: 15.101940747707705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of reinforcement learning is considered where the environment or
the model undergoes a change. An algorithm is proposed that an agent can apply
in such a problem to achieve the optimal long-time discounted reward. The
algorithm is model-free and learns the optimal policy by interacting with the
environment. It is shown that the proposed algorithm has strong optimality
properties. The effectiveness of the algorithm is also demonstrated using
simulation results. The proposed algorithm exploits a fundamental
reward-detection trade-off present in these problems and uses a quickest change
detection algorithm to detect the model change. Recommendations are provided
for faster detection of model changes and for smart initialization strategies.
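The abstract describes a loop: learn model-free, monitor the observed rewards with a quickest change detection (QCD) statistic, and re-learn once a change is declared. A minimal sketch of that pattern follows; it is not the authors' exact algorithm. The tabular Q-learning loop, the Gaussian pre- and post-change reward models, the threshold `h`, and the `env` interface (`reset`, `step`, `sample_action`) are all assumptions made for illustration.

```python
import numpy as np

def log_likelihood_ratio(r, mu0=0.0, mu1=1.0, sigma=1.0):
    # log f1(r)/f0(r) for two Gaussian reward models; assumed known here
    # purely for illustration.
    return ((r - mu0) ** 2 - (r - mu1) ** 2) / (2.0 * sigma ** 2)

def q_learning_with_qcd(env, n_states, n_actions, episodes=500,
                        alpha=0.1, gamma=0.95, eps=0.1, h=10.0):
    """Tabular Q-learning paired with a CUSUM-type detector on rewards.
    On an alarm, learning restarts for the post-change model."""
    Q = np.zeros((n_states, n_actions))
    W = 0.0  # CUSUM statistic
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            a = env.sample_action() if np.random.rand() < eps else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # CUSUM recursion: W_n = max(0, W_{n-1} + LLR(r_n))
            W = max(0.0, W + log_likelihood_ratio(r))
            if W >= h:
                # Change declared: restart learning. A smart initialization,
                # as the paper recommends, would replace the zero reset.
                Q = np.zeros_like(Q)
                W = 0.0
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q
```

Lowering `h` detects changes faster at the cost of more false alarms, which illustrates the kind of detection trade-off the abstract alludes to.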
Related papers
- Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution [27.607740475924448]
We propose a deep reinforcement learning-based framework for dynamic algorithm selection.
We employ a sophisticated deep neural network model to infer the optimal action, ensuring informed algorithm selections.
As a proof-of-principle study, we apply this framework to a group of Differential Evolution algorithms.
arXiv Detail & Related papers (2024-03-04T15:40:28Z)
- Frog-Snake prey-predation Relationship Optimization (FSRO): A novel nature-inspired metaheuristic algorithm for feature selection [0.0]
This study proposes the Frog-Snake prey-predation Relationship Optimization (FSRO) algorithm.
It is inspired by the prey-predation relationship between frogs and snakes for application to discrete optimization problems.
The study conducts computational experiments on feature selection using 26 machine learning datasets.
arXiv Detail & Related papers (2024-02-13T06:39:15Z)
- Efficient Training of Physics-Informed Neural Networks with Direct Grid Refinement Algorithm [0.0]
This research presents the development of an innovative algorithm tailored for the adaptive sampling of residual points within the framework of Physics-Informed Neural Networks (PINNs).
By addressing the limitations inherent in existing adaptive sampling techniques, our proposed methodology introduces a direct mesh refinement approach that effectively ensures both computational efficiency and adaptive point placement.
arXiv Detail & Related papers (2023-06-14T07:04:02Z)
- Quickest Change Detection for Unnormalized Statistical Models [36.6516991850508]
This paper develops a new variant of the classical Cumulative Sum (CUSUM) algorithm for quickest change detection.
The SCUSUM algorithm makes change detection applicable to unnormalized statistical models.
arXiv Detail & Related papers (2023-02-01T05:27:34Z)
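For context, here is a minimal sketch of the classical CUSUM recursion that SCUSUM builds on. Per the abstract above, SCUSUM keeps this recursion but replaces the log-likelihood ratio with a score-based statistic so that unnormalized pre- and post-change models can be used; the Gaussian example below is an illustrative assumption.

```python
import numpy as np

def cusum(samples, llr, h):
    # Classical CUSUM: W_n = max(0, W_{n-1} + llr(x_n)); declare a change
    # at the first n with W_n >= h. Here llr(x) = log f1(x)/f0(x).
    W = 0.0
    for n, x in enumerate(samples, start=1):
        W = max(0.0, W + llr(x))
        if W >= h:
            return n  # stopping time of the first alarm
    return None       # no change declared

# Illustrative use: a mean shift from N(0,1) to N(1,1) at time 100.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])
alarm = cusum(xs, llr=lambda x: x - 0.5, h=8.0)  # x - 0.5 is the exact LLR for this pair
```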
- Socio-cognitive Optimization of Time-delay Control Problems using Evolutionary Metaheuristics [89.24951036534168]
Metaheuristics are general-purpose optimization algorithms intended for difficult problems that classic approaches cannot solve.
In this paper we construct a novel socio-cognitive metaheuristic based on castes and apply several versions of this algorithm to the optimization of a time-delay system model.
arXiv Detail & Related papers (2022-10-23T22:21:10Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- High-dimensional Bayesian Optimization Algorithm with Recurrent Neural Network for Disease Control Models in Time Series [1.9371782627708491]
We propose a new high-dimensional Bayesian optimization algorithm that incorporates recurrent neural networks.
The proposed RNN-BO algorithm can solve optimal control problems in a lower-dimensional space.
We also discuss the impacts of different numbers of the RNN layers and training epochs on the trade-off between solution quality and related computational efforts.
arXiv Detail & Related papers (2022-01-01T08:40:17Z)
- Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
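The phrase "exploiting its differentiability" in the entry above refers to pathwise gradients: rolling the policy through a differentiable learned model and backpropagating the predicted return to the policy parameters. A minimal sketch follows; the linear model, linear policy, and quadratic reward are assumptions for illustration, not the paper's architecture.

```python
import torch

state_dim, action_dim, horizon = 4, 2, 5
model = torch.nn.Linear(state_dim + action_dim, state_dim)  # stand-in for a learned dynamics model
policy = torch.nn.Linear(state_dim, action_dim)             # differentiable policy

def reward(s, a):
    # Assumed differentiable reward, illustrative only.
    return -(s ** 2).sum() - 0.1 * (a ** 2).sum()

s = torch.zeros(state_dim)
ret = torch.tensor(0.0)
for _ in range(horizon):
    a = torch.tanh(policy(s))
    ret = ret + reward(s, a)
    s = model(torch.cat([s, a]))  # gradient flows back through the model path

(-ret).backward()  # gradients of the negative return now sit in policy.weight.grad
```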
- Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.