Metareasoning in uncertain environments: a meta-BAMDP framework
- URL: http://arxiv.org/abs/2408.01253v2
- Date: Mon, 03 Feb 2025 15:11:31 GMT
- Title: Metareasoning in uncertain environments: a meta-BAMDP framework
- Authors: Prakhar Godara, Tilman Diego Aléman, Angela J. Yu
- Abstract summary: Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$.
This paper proposes a meta Bayes-Adaptive MDP framework to handle metareasoning in environments with unknown reward/transition distributions.
- Score: 1.0923877073891441
- License:
- Abstract: \textit{Reasoning} may be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, as well as providing experimentally testable predictions about human behavior in Bernoulli Bandit tasks.
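The core idea above — deciding how much to reason before acting when reasoning itself carries a cost — can be made concrete with a small sketch. The code below is emphatically not the paper's meta-BAMDP algorithm; it is a minimal toy, under assumptions of my own choosing, in which a Bayesian Bernoulli-bandit agent with Beta posteriors deliberates via posterior-sampling rollouts and stops once the estimated improvement no longer covers a hypothetical per-rollout cost. The names `posterior_sampling_value`, `metareason_and_act`, `cost_per_rollout`, and the cost values are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_sampling_value(alpha, beta, n_rollouts):
    """Estimate each arm's value by posterior sampling.
    More rollouts -> lower-variance estimate (a crude stand-in for 'reasoning')."""
    draws = rng.beta(alpha, beta, size=(n_rollouts, len(alpha)))
    return draws.mean(axis=0)

def metareason_and_act(alpha, beta, cost_per_rollout=0.002, batch=32, max_rollouts=512):
    """Deliberate in batches of rollouts; stop when the estimated gain from
    further deliberation no longer exceeds its marginal cost."""
    estimates = posterior_sampling_value(alpha, beta, batch)
    n = batch
    while n < max_rollouts:
        refined = posterior_sampling_value(alpha, beta, batch)
        new_estimates = (estimates * n + refined * batch) / (n + batch)
        # Change in the perceived best-arm value: a crude proxy for the
        # value of further computation.
        gain = new_estimates.max() - estimates.max()
        estimates, n = new_estimates, n + batch
        if gain < cost_per_rollout * batch:
            break
    return int(np.argmax(estimates)), cost_per_rollout * n

# One simulated Bernoulli bandit episode with unknown reward probabilities.
true_p = np.array([0.35, 0.55, 0.6])
alpha, beta = np.ones(3), np.ones(3)            # Beta(1,1) priors
total, horizon = 0.0, 200
for _ in range(horizon):
    arm, think_cost = metareason_and_act(alpha, beta)
    r = rng.random() < true_p[arm]
    alpha[arm] += r
    beta[arm] += 1 - r                           # Bayesian posterior update
    total += r - think_cost                      # reward net of reasoning cost
print(f"net return over {horizon} steps: {total:.1f}")
```

The stopping rule plays the role of a rough value-of-computation test: deliberation continues only while its estimated marginal benefit exceeds its marginal cost, which is the trade-off that the meta-level optimization over reasoning processes $P$ formalizes.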
Related papers
- Pareto Optimal Algorithmic Recourse in Multi-cost Function [0.44938884406455726]
Algorithmic recourse aims to identify minimal-cost actions that alter an individual's features, thereby obtaining a desired outcome.
Most current recourse mechanisms use gradient-based methods that assume cost functions are differentiable, often not applicable in real-world scenarios.
This work proposes an algorithmic recourse framework that handles nondifferentiable and discrete multi-cost functions.
arXiv Detail & Related papers (2025-02-11T03:16:08Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $\tilde{\mathcal{O}}(\mathrm{poly}(d, \mathrm{sp}(V^*)) \sqrt{T\beta})$ regret, where $d$ and $\beta$ correspond to the AGEC and the log-covering number of the hypothesis class, respectively.
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - Towards Unified Alignment Between Agents, Humans, and Environment [24.731978646069]
We introduce the principles of $\mathbf{UA}^2$, which advocate for the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints.
We conduct proof-of-concept studies by introducing realistic features to WebShop, including user profiles to demonstrate intentions, personalized reranking for complex environmental dynamics, and runtime cost statistics to reflect self-constraints.
arXiv Detail & Related papers (2024-02-12T16:14:22Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point within a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the Peak Constrained Markov Decision Process (PCMDP), where the agent chooses a policy to maximize total reward over a finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP into an unconstrained problem, to which a Q-learning based approach is applied (an illustrative penalty-based sketch of this conversion is given after this list).
arXiv Detail & Related papers (2020-03-11T23:23:29Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new \emph{upper confidence primal-dual} algorithm, which only requires trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
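As a loose illustration of the constraint-to-penalty conversion flagged in the PCMDP entry above, here is a hedged toy sketch. It is not that paper's model-free algorithm (which guarantees constraint satisfaction with probability 1); it only shows the general idea of folding a per-step constraint indicator into the reward as a large penalty and running plain tabular Q-learning on the resulting unconstrained problem. The dynamics, the `PENALTY` weight, and the `violates` indicator are made-up toy quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, horizon = 5, 2, 20
PENALTY = 10.0                                   # hypothetical penalty weight

# Random toy dynamics and rewards; action 1 is "risky": higher reward but
# it violates the per-step constraint in some states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
R[:, 1] += 0.3
violates = rng.random((n_states, n_actions)) < 0.25   # constraint indicator

def shaped_reward(s, a):
    """Unconstrained surrogate reward: original reward minus a penalty on violation."""
    return R[s, a] - PENALTY * violates[s, a]

Q = np.zeros((n_states, n_actions))
lr, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    s = rng.integers(n_states)
    for _ in range(horizon):
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = rng.choice(n_states, p=P[s, a])
        target = shaped_reward(s, a) + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])             # standard Q-learning update
        s = s_next

greedy = Q.argmax(axis=1)
print("greedy policy:", greedy)
print("violation rate of greedy policy:", violates[np.arange(n_states), greedy].mean())
```

With a sufficiently large penalty the greedy policy learned on the surrogate reward avoids constraint-violating actions wherever a feasible alternative exists; the referenced papers provide the analysis needed to turn such a heuristic into algorithms with formal guarantees.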
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.