Pretty darn good control: when are approximate solutions better than
approximate models
- URL: http://arxiv.org/abs/2308.13654v1
- Date: Fri, 25 Aug 2023 19:58:17 GMT
- Title: Pretty darn good control: when are approximate solutions better than
approximate models
- Authors: Felipe Montealegre-Mora, Marcus Lapeyrolerie, Melissa Chapman, Abigail
G. Keller, Carl Boettiger
- Abstract summary: We show that DRL algorithms can successfully approximate solutions in a non-linear three-variable model for a fishery.
We show that the policy obtained with DRL is both more profitable and more sustainable than any constant mortality policy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing methods for optimal control struggle to deal with the complexity
commonly encountered in real-world systems, including dimensionality, process
error, model bias and data heterogeneity. Instead of tackling these system
complexities directly, researchers have typically sought to simplify models to
fit optimal control methods. But when is the optimal solution to an
approximate, stylized model better than an approximate solution to a more
accurate model? While this question has largely gone unanswered owing to the
difficulty of finding even approximate solutions for complex models, recent
algorithmic and computational advances in deep reinforcement learning (DRL)
might finally allow us to address these questions. DRL methods have to date
been applied primarily in the context of games or robotic mechanics, which
operate under precisely known rules. Here, we demonstrate the ability for DRL
algorithms using deep neural networks to successfully approximate solutions
(the "policy function" or control rule) in a non-linear three-variable model
for a fishery without knowing or ever attempting to infer a model for the
process itself. We find that the reinforcement learning agent discovers an
effective simplification of the problem to obtain an interpretable control
rule. We show that the policy obtained with DRL is both more profitable and
more sustainable than any constant mortality policy -- the standard family of
policies considered in fishery management.
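To make the comparison in the abstract concrete, here is a minimal sketch of how a constant-mortality rule can be scored against a state-dependent control rule by simulated discounted catch. The single-stock surplus-production dynamics, parameter values, and the escapement-style rule below are illustrative assumptions, not the paper's non-linear three-variable fishery model or its learned neural-network policy.

```python
import numpy as np

# Illustrative single-stock surplus-production model with multiplicative
# process error. A stand-in for exposition, NOT the paper's three-variable model.
def step(biomass, harvest_fraction, r=0.3, K=1.0, sigma=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    catch = harvest_fraction * biomass
    biomass -= catch
    growth = r * biomass * (1.0 - biomass / K)
    noise = rng.normal(0.0, sigma) * biomass
    return max(biomass + growth + noise, 0.0), catch

def evaluate(policy, episodes=100, horizon=200, discount=0.99, seed=0):
    """Mean discounted catch of a policy mapping biomass -> harvest fraction."""
    rng = np.random.default_rng(seed)
    totals = []
    for _ in range(episodes):
        b, total = 0.5, 0.0
        for t in range(horizon):
            b, catch = step(b, policy(b), rng=rng)
            total += discount**t * catch
        totals.append(total)
    return float(np.mean(totals))

# Constant-mortality baseline: remove a fixed fraction of the stock each year.
constant_mortality = lambda b: 0.1

# A state-dependent rule of the kind a DRL agent might discover (hypothetical
# closed form used only to illustrate the comparison).
escapement_like = lambda b: max(0.0, 1.0 - 0.4 / b) if b > 0 else 0.0

print("constant mortality :", evaluate(constant_mortality))
print("state-dependent    :", evaluate(escapement_like))
```

In the paper, the state-dependent rule is a neural-network policy trained model-free with DRL; the closed-form rule above merely stands in for it so that the two policy families can be scored with the same simulator.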
Related papers
- Learning RL-Policies for Joint Beamforming Without Exploration: A Batch
Constrained Off-Policy Approach [1.0080317855851213]
We consider the problem of optimizing network parameters for joint beamforming in wireless networks.
We show that a policy can be learned and deployed in the real world from previously collected data, without online exploration.
arXiv Detail & Related papers (2023-10-12T18:36:36Z) - Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI)
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
arXiv Detail & Related papers (2022-08-10T03:47:45Z) - Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z) - An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z) - Robust Predictable Control [149.71263296079388]
We show that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck.
We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
arXiv Detail & Related papers (2021-09-07T17:29:34Z) - PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided
Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, our algorithm guarantees polynomial sample complexity and only requires access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z) - Centralized Model and Exploration Policy for Multi-Agent RL [13.661446184763117]
Reinforcement learning in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can be used to address many real-world challenges.
Current RL algorithms for Dec-POMDPs suffer from poor sample complexity.
We propose a model-based algorithm, MARCO, and evaluate it on three cooperative communication tasks, where it improves sample efficiency by up to 20x.
arXiv Detail & Related papers (2021-07-14T00:34:08Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods so that they optimize against rewards artificially penalized by the uncertainty of the dynamics (see the sketch after this list).
arXiv Detail & Related papers (2020-05-27T08:46:41Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
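The uncertainty-penalized reward described in the MOPO entry above can be illustrated in a few lines. The ensemble-disagreement estimate of uncertainty and the names below are assumptions for illustration, not MOPO's exact implementation.

```python
import numpy as np

# Sketch of the idea in the MOPO entry: penalize the model reward by an
# estimate of the dynamics model's uncertainty, discouraging the policy from
# exploiting states and actions where the learned model is unreliable.
def penalized_reward(reward_hat, next_state_preds, lam=1.0):
    # next_state_preds: (n_models, state_dim) predictions from an ensemble of
    # learned dynamics models; their disagreement serves as uncertainty u(s, a).
    u = float(np.linalg.norm(np.std(next_state_preds, axis=0)))
    return reward_hat - lam * u

# High ensemble disagreement lowers the reward used for policy optimization.
preds = np.array([[0.9, 0.1], [1.1, 0.3], [0.5, -0.2]])
print(penalized_reward(1.0, preds, lam=0.5))
```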
This list is automatically generated from the titles and abstracts of the papers in this site.