Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes
- URL: http://arxiv.org/abs/2401.01841v3
- Date: Mon, 22 Jan 2024 03:43:34 GMT
- Title: Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes
- Authors: Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay
- Abstract summary: We present a search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS).
We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic.
- Score: 5.276882857467777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental (and largely open) challenge in sequential decision-making is
dealing with non-stationary environments, where exogenous environmental
conditions change over time. Such problems are traditionally modeled as
non-stationary Markov decision processes (NSMDP). However, existing approaches
for decision-making in NSMDPs have two major shortcomings: first, they assume
that the updated environmental dynamics at the current time are known (although
future dynamics can change); and second, planning is largely pessimistic, i.e.,
the agent acts ``safely'' to account for the non-stationary evolution of the
environment. We argue that both these assumptions are invalid in practice --
updated environmental conditions are rarely known, and as the agent interacts
with the environment, it can learn about the updated dynamics and avoid being
pessimistic, at least in states whose dynamics it is confident about. We
present a heuristic search algorithm called \textit{Adaptive Monte Carlo Tree
Search (ADA-MCTS)} that addresses these challenges. We show that the agent can
learn the updated dynamics of the environment over time and then act as it
learns, i.e., if the agent is in a region of the state space about which it has
updated knowledge, it can avoid being pessimistic. To quantify ``updated
knowledge,'' we disintegrate the aleatoric and epistemic uncertainty in the
agent's updated belief and show how the agent can use these estimates for
decision-making. We compare the proposed approach with multiple
state-of-the-art approaches in decision-making across multiple well-established
open-source problems and empirically show that our approach is faster and
highly adaptive without sacrificing safety.
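To make the uncertainty disintegration mentioned above concrete, the following is a rough illustrative sketch, not the authors' implementation: it assumes a hypothetical ensemble of learned transition models (`TransitionEnsemble`), uses the standard entropy-based decomposition of predictive uncertainty into aleatoric and epistemic parts, and falls back to a pessimistic value estimate only where epistemic uncertainty is still high. All class, function, and threshold names are made up for illustration.

```python
import numpy as np

class TransitionEnsemble:
    """Hypothetical ensemble of count-based next-state models for a discrete MDP."""

    def __init__(self, n_members, n_states, n_actions, prior=1.0):
        # Dirichlet-style pseudo-counts: member x state x action x next_state
        self.counts = np.full((n_members, n_states, n_actions, n_states), prior)

    def update(self, s, a, s_next):
        # Bootstrapped update: each member observes the transition with prob. 0.5,
        # so members disagree where data are scarce (high epistemic uncertainty).
        mask = np.random.rand(self.counts.shape[0]) < 0.5
        self.counts[mask, s, a, s_next] += 1.0

    def predictive(self, s, a):
        # Per-member predictive distributions over next states, shape (members, states).
        p = self.counts[:, s, a, :]
        return p / p.sum(axis=-1, keepdims=True)


def split_uncertainty(member_probs):
    """Entropy-based split of total predictive uncertainty into aleatoric
    (irreducible stochasticity) and epistemic (model disagreement) parts."""
    mean_p = member_probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + 1e-12)).sum()
    aleatoric = -(member_probs * np.log(member_probs + 1e-12)).sum(axis=-1).mean()
    return aleatoric, total - aleatoric


def select_action(q_pessimistic, q_mean, member_probs, epistemic_threshold=0.1):
    """Act pessimistically only where the updated dynamics are still uncertain."""
    _, epistemic = split_uncertainty(member_probs)
    q = q_pessimistic if epistemic > epistemic_threshold else q_mean
    return int(np.argmax(q))
```

Under such a split, a state-action pair on which the ensemble members agree corresponds to "updated knowledge": the agent can plan there with its mean model instead of the cautious one, while remaining pessimistic elsewhere.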
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters.
We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z)
- Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution [110.99891169486366]
We propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model.
Our method endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems.
Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions.
arXiv Detail & Related papers (2024-02-13T11:22:59Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Search [9.000981144624507]
We introduce Policy-Augmented Monte Carlo Tree Search (PA-MCTS).
It combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment.
We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. (A minimal illustrative sketch of this combination appears after this list.)
arXiv Detail & Related papers (2024-01-06T11:51:50Z)
- Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning [2.627046865670577]
We address both planning and reinforcement learning approaches to sequential decision-making.
In many real-world domains, it is impossible to construct a perfectly accurate model or simulator, so decisions must remain robust to model error.
We make a number of contributions towards this goal, with a focus on model-based algorithms.
arXiv Detail & Related papers (2023-04-02T16:44:14Z)
- Decision Making in Non-Stationary Environments with Policy-Augmented Monte Carlo Tree Search [2.20439695290991]
Decision-making under uncertainty (DMU) is present in many important problems.
An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time.
We present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses.
arXiv Detail & Related papers (2022-02-25T22:31:37Z)
- Automated Curriculum Learning for Embodied Agents: A Neuroevolutionary Approach [0.0]
We demonstrate how an evolutionary algorithm can be extended with a curriculum learning process that automatically selects the environmental conditions in which the evolving agents are evaluated.
The results collected on two benchmark problems, which require solving a task under significantly varying environmental conditions, demonstrate that the proposed method outperforms conventional algorithms and generates solutions that are robust to variations.
arXiv Detail & Related papers (2021-02-17T16:19:17Z)
- Dynamic Regret of Policy Optimization in Non-stationary Environments [120.01408308460095]
We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret.
We show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction.
To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments.
arXiv Detail & Related papers (2020-06-30T23:34:37Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
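For the closely related PA-MCTS entry above, here is a minimal sketch of the described combination of a stale policy's action values with an online search, under the assumption that it can be expressed as a convex blend of the two estimates; the weight `alpha` and the function name are hypothetical, not the PA-MCTS authors' code.

```python
def pa_mcts_action(stale_q, search_q, alpha=0.5):
    """Pick the action maximising a weighted blend of two value estimates.

    stale_q  : dict action -> value from the out-of-date policy
    search_q : dict action -> value from online search with the up-to-date model
    alpha    : weight on the online search estimate (0 trusts only the old policy)
    """
    combined = {a: alpha * search_q[a] + (1.0 - alpha) * stale_q[a] for a in stale_q}
    return max(combined, key=combined.get)


# The old policy prefers action 0, but search with the updated model favours action 1;
# with enough weight on the search, the blended estimate switches to action 1.
print(pa_mcts_action({0: 1.0, 1: 0.2}, {0: 0.1, 1: 0.9}, alpha=0.7))  # -> 1
```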
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.