A Robust Policy Bootstrapping Algorithm for Multi-objective
Reinforcement Learning in Non-stationary Environments
- URL: http://arxiv.org/abs/2308.09734v1
- Date: Fri, 18 Aug 2023 02:15:12 GMT
- Title: A Robust Policy Bootstrapping Algorithm for Multi-objective
Reinforcement Learning in Non-stationary Environments
- Authors: Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu
- Abstract summary: Multi-objective reinforcement learning methods fuse the reinforcement learning paradigm with multi-objective optimization techniques.
One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment.
We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments.
- Score: 15.794728813746397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-objective Markov decision processes are a special kind of
multi-objective optimization problem that involves sequential decision making
while satisfying the Markov property of stochastic processes. Multi-objective
reinforcement learning methods address this problem by fusing the reinforcement
learning paradigm with multi-objective optimization techniques. One major
drawback of these methods is the lack of adaptability to non-stationary
dynamics in the environment. This is because they adopt optimization procedures
that assume stationarity to evolve a coverage set of policies that can solve
the problem. This paper introduces a developmental optimization approach that
can evolve the policy coverage set while exploring the preference space over
the defined objectives in an online manner. We propose a novel multi-objective
reinforcement learning algorithm that can robustly evolve a convex coverage set
of policies in an online manner in non-stationary environments. We compare the
proposed algorithm with two state-of-the-art multi-objective reinforcement
learning algorithms in stationary and non-stationary environments. Results
showed that the proposed algorithm significantly outperforms the existing
algorithms in non-stationary environments while achieving comparable results in
stationary environments.
Related papers
- Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Intrinsically Motivated Hierarchical Policy Learning in Multi-objective
Markov Decision Processes [15.50007257943931]
We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation.
We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement methods in a dynamic robotics environment.
arXiv Detail & Related papers (2023-08-18T02:10:45Z) - Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z) - Stepsize Learning for Policy Gradient Methods in Contextual Markov
Decision Processes [35.889129338603446]
Policy-based algorithms are among the most widely adopted techniques in model-free RL.
They tend to struggle when asked to accomplish a series of heterogeneous tasks.
We introduce a new formulation, known as meta-MDP, that can be used to solve any hyper parameter selection problem in RL.
arXiv Detail & Related papers (2023-06-13T12:58:12Z) - Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control
Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle, using new reinforcement learning methods.
For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost.
arXiv Detail & Related papers (2022-11-15T03:46:41Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
Off-policy algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Jointly Learning Environments and Control Policies with Projected
Stochastic Gradient Ascent [3.118384520557952]
We introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem.
In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation.
We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations.
arXiv Detail & Related papers (2020-06-02T16:08:07Z) - GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal
Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.