Entropy Regularized Reinforcement Learning Using Large Deviation Theory
- URL: http://arxiv.org/abs/2106.03931v2
- Date: Mon, 10 Apr 2023 20:22:41 GMT
- Title: Entropy Regularized Reinforcement Learning Using Large Deviation Theory
- Authors: Argenis Arriojas, Jacob Adamczyk, Stas Tiomkin and Rahul V. Kulkarni
- Abstract summary: In this paper, we establish a mapping between entropy-regularized RL and research in non-equilibrium statistical mechanics.
We apply approaches from large deviation theory to derive exact analytical results for the optimal policy and optimal dynamics.
The results lead to a novel analytical and computational framework for entropy-regularized RL which is validated by simulations.
- Score: 3.058685580689605
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reinforcement learning (RL) is an important field of research in machine
learning that is increasingly being applied to complex optimization problems in
physics. In parallel, concepts from physics have contributed to important
advances in RL with developments such as entropy-regularized RL. While these
developments have led to advances in both fields, obtaining analytical
solutions for optimization in entropy-regularized RL is currently an open
problem. In this paper, we establish a mapping between entropy-regularized RL
and research in non-equilibrium statistical mechanics focusing on Markovian
processes conditioned on rare events. In the long-time limit, we apply
approaches from large deviation theory to derive exact analytical results for
the optimal policy and optimal dynamics in Markov Decision Process (MDP) models
of reinforcement learning. The results obtained lead to a novel analytical and
computational framework for entropy-regularized RL which is validated by
simulations. The mapping established in this work connects current research in
reinforcement learning and non-equilibrium statistical mechanics, thereby
opening new avenues for the application of analytical and computational
approaches from one field to cutting-edge problems in the other.
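As background for the mapping described in the abstract, entropy-regularized RL can be posed as maximizing expected reward plus a KL/entropy term measured against a reference (prior) dynamics; schematically (illustrative notation, not a verbatim transcription of the paper's equations):

```latex
\[
\max_{u}\; \mathbb{E}_{u}\!\Big[\sum_{t} r(s_t, a_t)\Big]
  \;-\; \frac{1}{\beta}\, D_{\mathrm{KL}}\!\big(P_{u} \,\Vert\, P_{\mathrm{prior}}\big)
\quad\Longrightarrow\quad
P^{*}(\tau) \;\propto\; P_{\mathrm{prior}}(\tau)\, e^{\beta R(\tau)},
\]
```

so the optimal trajectory distribution exponentially reweights the prior by the return R(τ). In the long-time limit, the growth rate of the partition function E_prior[exp(βR)] is a scaled cumulant generating function, which by large deviation theory equals the logarithm of the dominant eigenvalue of a "tilted" transition matrix, and the optimal dynamics follow from a generalized Doob transform. A minimal numerical sketch of that eigenvalue route on a toy reference chain with state-based rewards (all numbers are illustrative; this is the standard construction for Markov chains conditioned on rare events, not code from the paper):

```python
import numpy as np

beta = 1.0
r = np.array([0.0, 0.5, 1.0])        # illustrative reward per state
p = np.array([[0.8, 0.1, 0.1],       # reference dynamics p(s'|s),
              [0.2, 0.6, 0.2],       # rows indexed by s, columns by s'
              [0.1, 0.3, 0.6]])

# Tilted matrix: T[s, s'] = exp(beta * r(s)) * p(s'|s).
T = np.exp(beta * r)[:, None] * p

# Dominant (Perron-Frobenius) eigenpair; log(rho) plays the role of the
# scaled cumulant generating function in the long-time limit.
evals, evecs = np.linalg.eig(T)
k = np.argmax(evals.real)
rho, phi = evals[k].real, np.abs(evecs[:, k].real)

# Generalized Doob transform: the optimal entropy-regularized dynamics.
# Rows sum to 1 because T @ phi = rho * phi.
u = T * phi[None, :] / (rho * phi[:, None])
print(np.round(u, 3), u.sum(axis=1))
```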
Related papers
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms that incorporate function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
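For context on the entry above, the core primitive in distributionally robust RL is a robust Bellman backup that evaluates a state-action pair under the worst transition model inside an uncertainty set around a nominal model. A hedged sketch with a total-variation ball (the set geometry, names, and numbers are illustrative; the paper's algorithms and function-approximation machinery go well beyond this):

```python
import numpy as np

def robust_backup(p_nominal, reward, v_next, eps, gamma=0.99):
    """Worst-case backup over {q : D_TV(q, p_nominal) <= eps}."""
    q = p_nominal.astype(float)
    sink = int(np.argmin(v_next))          # worst next state absorbs mass
    budget = eps
    for i in np.argsort(v_next)[::-1]:     # drain highest-value states first
        if i == sink:
            continue
        take = min(budget, q[i])
        q[i] -= take
        budget -= take
        if budget <= 0:
            break
    q[sink] += eps - budget
    return reward + gamma * float(q @ v_next)

# Pessimism grows with the uncertainty radius eps.
p = np.array([0.7, 0.2, 0.1])
v = np.array([1.0, 0.0, -1.0])
print([round(robust_backup(p, 0.0, v, e), 3) for e in (0.0, 0.1, 0.3)])
```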
- Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation [35.278669159850146]
We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework.
We develop the first provably efficient RL algorithm tailored for this setting.
These techniques are of particular interest to the theoretical study of reinforcement learning.
arXiv Detail & Related papers (2024-02-28T08:24:06Z)
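For the entry above, the risk-sensitive objective in this line of work is typically the entropic risk measure; a standard form (background notation, not necessarily the paper's exact setup):

```latex
\[
J_{\beta}(\pi) \;=\; \frac{1}{\beta}\,
  \log \mathbb{E}_{\pi}\!\left[\exp\!\Big(\beta \sum_{t=1}^{T} r_t\Big)\right],
\]
```

which is risk-seeking for β > 0, risk-averse for β < 0, and recovers the expected return as β → 0. Notably, this is the same exponential-of-return quantity whose long-time scaling the main paper analyzes with large deviation theory.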
- Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data.
Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables.
We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
- Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning [1.8175282137722093]
We address two major challenges in scientific machine learning (SciML).
We establish a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula.
Existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches.
arXiv Detail & Related papers (2023-11-13T22:55:56Z)
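For context on the entry above, the classical Hopf formula expresses the solution of a Hamilton-Jacobi PDE as a convex optimization over a dual variable; a sketch of the time-independent-Hamiltonian case with convex initial data J (the paper's contribution concerns a generalized, time-dependent version):

```latex
\[
\partial_t u + H(\nabla_x u) = 0,\quad u(x,0) = J(x)
\;\;\Longrightarrow\;\;
u(x,t) \;=\; \sup_{p}\,\big\{ \langle x, p \rangle - J^{*}(p) - t\,H(p) \big\},
\]
```

where J* denotes the convex conjugate (Legendre transform) of J. It is this kind of optimization representation that allows existing HJ solvers to be repurposed for training.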
- Reinforcement Learning-assisted Evolutionary Algorithm: A Survey and Research Opportunities [63.258517066104446]
Reinforcement learning integrated as a component of evolutionary algorithms has demonstrated superior performance in recent years.
We survey RL-EA integration methods, the RL-assisted strategies adopted by RL-EAs, and their applications in the existing literature.
In the applications section, we also demonstrate the excellent performance of RL-EA on several benchmarks and a range of public datasets.
arXiv Detail & Related papers (2023-08-25T15:06:05Z)
- Variance Control for Distributional Reinforcement Learning [22.407803118899512]
We construct a new estimator, Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from a statistical perspective.
We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks.
arXiv Detail & Related papers (2023-07-30T07:25:18Z)
- Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
arXiv Detail & Related papers (2023-05-29T15:00:09Z)
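In PbRL, pairwise trajectory feedback is standardly modeled with a Bradley-Terry likelihood over returns; for background (a common convention, not necessarily this paper's exact model):

```latex
\[
\mathbb{P}\big(\tau^{1} \succ \tau^{2}\big) \;=\;
  \frac{\exp\!\big(R(\tau^{1})\big)}{\exp\!\big(R(\tau^{1})\big) + \exp\!\big(R(\tau^{2})\big)},
\qquad R(\tau) = \sum_{t} r(s_t, a_t),
\]
```

where r is the hidden reward the agent must infer from preference labels alone; the reward-agnostic framework above collects exploratory trajectories so that this inference is accurate for any such r.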
- Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization [0.0]
This work presents a first-of-a-kind approach that uses deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization problem.
arXiv Detail & Related papers (2023-05-09T23:51:24Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that yields a non-decreasing performance guarantee for model-based RL (MBRL).
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
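Bounds linking model shift to policy performance are typically of the simulation-lemma flavor; an illustrative classical bound (generic background, not the bounds derived in the paper):

```latex
\[
\big| J_{M}(\pi) - J_{\widehat{M}}(\pi) \big| \;\le\;
  \frac{2\gamma\, R_{\max}}{(1-\gamma)^{2}}\,
  \max_{s,a}\, D_{\mathrm{TV}}\!\big(M(\cdot \mid s,a),\, \widehat{M}(\cdot \mid s,a)\big),
\]
```

so a small shift between the true model M and the learned model M̂ guarantees a small change in policy return, which is the kind of relationship underlying "when to update" criteria.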
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
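A minimal sketch of what an AMR-as-MDP interface could look like (the class, its error heuristic, and the reward shaping are all illustrative assumptions; the paper's observation and reward design differ):

```python
import numpy as np

class ToyAMREnv:
    """Toy 1-D adaptive mesh refinement posed as an MDP (illustrative).

    Observation: per-element local error estimates.
    Action: index of the element to refine (split in half).
    Reward: error reduction minus a cost per added element.
    """

    def __init__(self, n_elements=8, cost=0.01):
        self.edges = np.linspace(0.0, 1.0, n_elements + 1)
        self.cost = cost

    def _error(self):
        # Placeholder estimator: local error ~ element width squared.
        return np.diff(self.edges) ** 2

    def step(self, action):
        err_before = self._error().sum()
        mid = 0.5 * (self.edges[action] + self.edges[action + 1])
        self.edges = np.insert(self.edges, action + 1, mid)
        reward = (err_before - self._error().sum()) - self.cost
        return self._error(), reward

env = ToyAMREnv()
obs, rew = env.step(int(np.argmax(env._error())))  # refine worst element
```

Because the observation is per-element, the policy's size need not grow with the mesh, which is the scaling property the entry above highlights.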
- Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
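For the entry above, an information bottleneck regularizer typically augments the RL loss with a penalty on how much the latent representation Z retains about the state S, with the penalty weight annealed upward during training; schematically (illustrative form, not the paper's exact objective):

```latex
\[
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{RL}} \;+\; \beta\, I(S;\, Z),
\qquad \beta \ \text{annealed from } 0 \ \text{to } \beta_{\max},
\]
```

so information in S that is redundant for maximizing return is squeezed out of Z gradually rather than all at once.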
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.