Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes
- URL: http://arxiv.org/abs/2401.01841v3
- Date: Mon, 22 Jan 2024 03:43:34 GMT
- Title: Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes
- Authors: Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay
- Abstract summary: We present a search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS).
We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic.
- Score: 5.276882857467777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental (and largely open) challenge in sequential decision-making is
dealing with non-stationary environments, where exogenous environmental
conditions change over time. Such problems are traditionally modeled as
non-stationary Markov decision processes (NSMDP). However, existing approaches
for decision-making in NSMDPs have two major shortcomings: first, they assume
that the updated environmental dynamics at the current time are known (although
future dynamics can change); and second, planning is largely pessimistic, i.e.,
the agent acts ``safely'' to account for the non-stationary evolution of the
environment. We argue that both these assumptions are invalid in practice --
updated environmental conditions are rarely known, and as the agent interacts
with the environment, it can learn about the updated dynamics and avoid being
pessimistic, at least in states whose dynamics it is confident about. We
present a heuristic search algorithm called \textit{Adaptive Monte Carlo Tree
Search (ADA-MCTS)} that addresses these challenges. We show that the agent can
learn the updated dynamics of the environment over time and then act as it
learns, i.e., if the agent is in a region of the state space about which it has
updated knowledge, it can avoid being pessimistic. To quantify ``updated
knowledge,'' we disintegrate the aleatoric and epistemic uncertainty in the
agent's updated belief and show how the agent can use these estimates for
decision-making. We compare the proposed approach with multiple
state-of-the-art approaches in decision-making across multiple well-established
open-source problems and empirically show that our approach is faster and
highly adaptive without sacrificing safety.
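To make the uncertainty disintegration mentioned above concrete, the following is a rough illustrative sketch, not the authors' implementation: it assumes a hypothetical ensemble of learned transition models (`TransitionEnsemble`), uses the standard entropy-based decomposition of predictive uncertainty into aleatoric and epistemic parts, and falls back to a pessimistic value estimate only where epistemic uncertainty is still high. All class, function, and threshold names are made up for illustration.

```python
import numpy as np

class TransitionEnsemble:
    """Hypothetical ensemble of count-based next-state models for a discrete MDP."""

    def __init__(self, n_members, n_states, n_actions, prior=1.0):
        # Dirichlet-style pseudo-counts: member x state x action x next_state
        self.counts = np.full((n_members, n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Bootstrapped update: each member observes the transition with prob. 0.5,
        # so members disagree where data are scarce (high epistemic uncertainty).
        mask = np.random.rand(self.counts.shape[0]) < 0.5
        self.counts[mask, s, a, s_next] += 1.0

    def predictive(self, s, a):
        # Per-member predictive distributions over next states, shape (members, states).
        p = self.counts[:, s, a, :]
        return p / p.sum(axis=-1, keepdims=True)


def split_uncertainty(member_probs):
    """Entropy-based split of total predictive uncertainty into aleatoric
    (irreducible stochasticity) and epistemic (model disagreement) parts."""
    mean_p = member_probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + 1e-12)).sum()
    aleatoric = -(member_probs * np.log(member_probs + 1e-12)).sum(axis=-1).mean()
    return aleatoric, total - aleatoric


def select_action(q_pessimistic, q_mean, member_probs, epistemic_threshold=0.1):
    """Act pessimistically only where the updated dynamics are still uncertain."""
    _, epistemic = split_uncertainty(member_probs)
    q = q_pessimistic if epistemic > epistemic_threshold else q_mean
    return int(np.argmax(q))
```

Under such a split, a state-action pair on which the ensemble members agree corresponds to "updated knowledge": the agent can plan there with its mean model instead of the cautious one, while remaining pessimistic elsewhere.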
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters.
We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z)
- Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution [110.99891169486366]
We propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model.
Our method endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems.
Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions.
arXiv Detail & Related papers (2024-02-13T11:22:59Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Search [9.000981144624507]
We introduce Policy-Augmented Monte Carlo Tree Search (PA-MCTS).
It combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. (A minimal illustrative sketch of this combination appears after this list.)
arXiv Detail & Related papers (2024-01-06T11:51:50Z)
- Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning [2.627046865670577]
We address both planning and reinforcement learning approaches to sequential decision-making.
In many real-world domains, it is impossible to construct a perfectly accurate model or simulator, so decisions must remain robust to model error.
We make a number of contributions towards this goal, with a focus on model-based algorithms.
arXiv Detail & Related papers (2023-04-02T16:44:14Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search [2.20439695290991]
Decision-making under uncertainty (DMU) is present in many important problems.
An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time.
We present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses.
arXiv Detail & Related papers (2022-02-25T22:31:37Z)
- Automated Curriculum Learning for Embodied Agents: A Neuroevolutionary Approach [0.0]
We demonstrate how an evolutionary algorithm can be extended with a curriculum learning process that automatically selects the environmental conditions in which the evolving agents are evaluated.
The results collected on two benchmark problems, which require solving a task under significantly varying environmental conditions, demonstrate that the proposed method outperforms conventional algorithms and generates solutions that are robust to variations.
arXiv Detail & Related papers (2021-02-17T16:19:17Z)
- Dynamic Regret of Policy Optimization in Non-stationary Environments [120.01408308460095]
We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret.
We show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction.
To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments.
arXiv Detail & Related papers (2020-06-30T23:34:37Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
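For the closely related PA-MCTS entry above, here is a minimal sketch of the described combination of a stale policy's action values with an online search, under the assumption that it can be expressed as a convex blend of the two estimates; the weight `alpha` and the function name are hypothetical, not the PA-MCTS authors' code.

```python
def pa_mcts_action(stale_q, search_q, alpha=0.5):
    """Pick the action maximising a weighted blend of two value estimates.

    stale_q  : dict action -> value from the out-of-date policy
    search_q : dict action -> value from online search with the up-to-date model
    alpha    : weight on the online search estimate (0 trusts only the old policy)
    """
    combined = {a: alpha * search_q[a] + (1.0 - alpha) * stale_q[a] for a in stale_q}
    return max(combined, key=combined.get)


# The old policy prefers action 0, but search with the updated model favours action 1;
# with enough weight on the search, the blended estimate switches to action 1.
print(pa_mcts_action({0: 1.0, 1: 0.2}, {0: 0.1, 1: 0.9}, alpha=0.7))  # -> 1
```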
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.