Belief-State Query Policies for Planning With Preferences Under Partial Observability
- URL: http://arxiv.org/abs/2405.15907v1
- Date: Fri, 24 May 2024 20:04:51 GMT
- Title: Belief-State Query Policies for Planning With Preferences Under Partial Observability
- Authors: Daniel Bramblett, Siddharth Srivastava,
- Abstract summary: Planning in real-world settings often entails addressing partial observability while aligning with users' preferences.
We present a novel framework for expressing users' preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) preferences.
We show that BSQ preferences provide a computationally feasible approach for planning with preferences in partially observable settings.
- Score: 18.821166966365315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning in real-world settings often entails addressing partial observability while aligning with users' preferences. We present a novel framework for expressing users' preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) preferences in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such preferences and prove that while the expected value of a BSQ preference is not a convex function w.r.t its parameters, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior while guaranteeing user preference compliance. Theoretical analysis proves that our algorithms converge to the optimal preference-compliant behavior in the limit. Empirical results show that BSQ preferences provide a computationally feasible approach for planning with preferences in partially observable settings.
Related papers
- Robust Bayesian Optimization via Localized Online Conformal Prediction [37.549297668783254]
We introduce localized online conformal prediction-based Bayesian optimization (LOCBO)
LOCBO calibrates the GP model through localized online conformal prediction (CP)
We provide theoretical performance guarantees for LOCBO's iterates that hold for the unobserved objective function.
arXiv Detail & Related papers (2024-11-26T12:45:54Z) - Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $mathcalO(sqrtT log(T)$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Variational Inference with Coverage Guarantees in Simulation-Based Inference [18.818573945984873]
We propose Conformalized Amortized Neural Variational Inference (CANVI)
CANVI constructs conformalized predictors based on each candidate, compares the predictors using a metric known as predictive efficiency, and returns the most efficient predictor.
We prove lower bounds on the predictive efficiency of the regions produced by CANVI and explore how the quality of a posterior approximation relates to the predictive efficiency of prediction regions based on that approximation.
arXiv Detail & Related papers (2023-05-23T17:24:04Z) - The Choice of Noninformative Priors for Thompson Sampling in
Multiparameter Bandit Models [56.31310344616837]
Thompson sampling (TS) has been known for its outstanding empirical performance supported by theoretical guarantees across various reward models.
This study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding.
arXiv Detail & Related papers (2023-02-28T08:42:42Z) - Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic
Reparameterization [29.178417789839102]
optimizing black-box functions of discrete (and potentially continuous) design parameters is a ubiquitous problem in scientific and engineering applications.
We propose using probabilistic re parameterization (PR) to maximize the expectation of the acquisition function (AF) over a probability distribution.
PR is complementary to (and benefits) recent work and naturally generalizes to settings with multiple objectives and black-box constraints.
arXiv Detail & Related papers (2022-10-18T22:41:00Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Probabilistic Planning with Partially Ordered Preferences over Temporal
Goals [22.77805882908817]
We study planning in Markov decision processes (MDPs) with preferences over temporally extended goals.
We introduce a variant of deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals.
We prove that a weak-stochastic nondominated policy given the preference specification is optimal in the constructed multi-objective MDP.
arXiv Detail & Related papers (2022-09-25T17:13:24Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm
Configuration [32.055812915031666]
We show how to compute optimal parameter portfolios of a given size.
We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values.
We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
arXiv Detail & Related papers (2022-02-07T15:00:30Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.