On the Uniqueness of Solution for the Bellman Equation of LTL Objectives
- URL: http://arxiv.org/abs/2404.05074v1
- Date: Sun, 7 Apr 2024 21:06:52 GMT
- Title: On the Uniqueness of Solution for the Bellman Equation of LTL Objectives
- Authors: Zetong Xuan, Alper Kamil Bozkurt, Miroslav Pajic, Yu Wang
- Abstract summary: The uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed in prior work.
We propose a condition under which the Bellman equation has the expected return as its unique solution.
- Score: 12.918524838804016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surrogate rewards for linear temporal logic (LTL) objectives are commonly utilized in planning problems for LTL objectives. In a widely-adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective. The expected return can then be estimated by methods using Bellman updates, such as reinforcement learning. However, the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We demonstrate with an example that when one of the discount factors is set to one, as allowed in many previous works, the Bellman equation may have multiple solutions, leading to inaccurate evaluation of the expected return. We then propose a condition for the Bellman equation to have the expected return as the unique solution, requiring the solutions for states inside a rejecting bottom strongly connected component (BSCC) to be 0. We prove this condition is sufficient by showing that the solutions for the states with discounting can be separated from those for the states without discounting under this condition.
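To make the non-uniqueness issue concrete, here is a minimal sketch (not code from the paper). It assumes one common form of the two-discount surrogate reward, in which accepting states pay reward 1 - gamma_B and discount by gamma_B while all other states pay 0 and discount by gamma, here deliberately set to 1. The three-state chain, the state names, and the specific numbers are illustrative assumptions; the rejecting BSCC is the absorbing sink s2.

```python
# Hypothetical 3-state Markov chain: from s0, move to the accepting self-loop s1
# with probability 0.5 or to the rejecting BSCC (absorbing sink) s2 otherwise.
import numpy as np

P = np.array([[0.0, 0.5, 0.5],   # rows/columns ordered s0, s1, s2
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
accepting = np.array([False, True, False])
gamma_B, gamma = 0.9, 1.0        # gamma = 1 is the problematic choice

reward = np.where(accepting, 1.0 - gamma_B, 0.0)   # assumed surrogate reward
discount = np.where(accepting, gamma_B, gamma)     # state-dependent discount

def bellman(V):
    # V(s) = r(s) + Gamma(s) * sum_{s'} P(s' | s) V(s')
    return reward + discount * (P @ V)

# Two distinct fixed points of the same Bellman equation:
V_a = np.array([0.5, 1.0, 0.0])    # the intended solution: satisfaction probabilities
V_b = np.array([0.75, 1.0, 0.5])   # also a fixed point, because gamma = 1 on s2
print(np.allclose(bellman(V_a), V_a), np.allclose(bellman(V_b), V_b))  # True True

# The condition proposed in the paper: force the solution to be 0 on
# rejecting-BSCC states, here s2.
V = np.zeros(3)
for _ in range(1000):
    V = bellman(V)
    V[2] = 0.0                     # pin the rejecting BSCC to 0
print(V)                           # approx [0.5, 1.0, 0.0]
```

With the rejecting-BSCC value pinned to 0, the remaining equations have a unique solution and the iteration recovers the satisfaction probability 0.5 at s0; without the pin, any constant value on s2 yields a different fixed point.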
Related papers
- To bootstrap or to rollout? An optimal and adaptive interpolation [4.755935781862859]
We introduce a class of Bellman operators that interpolate between bootstrapping and rollout methods.
Our estimator combines the strengths of the bootstrapping-based temporal difference (TD) estimator and the rollout-based Monte Carlo (MC) methods.
arXiv Detail & Related papers (2024-11-14T19:00:00Z) - Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise [51.87307904567702]
Quantile regression is a leading approach for obtaining such prediction intervals via the empirical estimation of quantiles in the distribution of outputs.
We propose Relaxed Quantile Regression (RQR), a direct alternative to quantile-regression-based interval construction that removes this arbitrary constraint.
We demonstrate that this added flexibility results in intervals with an improvement in desirable qualities.
arXiv Detail & Related papers (2024-06-05T13:36:38Z) - Parameterized Projected Bellman Operator [64.129598593852]
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL).
We propose a novel alternative approach based on learning an approximate version of the Bellman operator.
We formulate an optimization problem to learn the parameterized projected Bellman operator (PBO) for generic sequential decision-making problems.
arXiv Detail & Related papers (2023-12-20T09:33:16Z) - LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning [1.5734309088976395]
This study investigates the distribution of the Bellman approximation error through iterative exploration of the Bellman equation.
We propose the utilization of the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss) that assumes a Normal distribution for Bellman errors.
arXiv Detail & Related papers (2023-07-05T15:00:29Z) - Minimax Instrumental Variable Regression and $L_2$ Convergence
Guarantees without Identification or Closedness [71.42652863687117]
We study nonparametric estimation of instrumental variable (IV) regressions.
We propose a new penalized minimax estimator that can converge to a fixed IV solution.
We derive a strong $L_2$ error rate for our estimator under lax conditions.
arXiv Detail & Related papers (2023-02-10T18:08:49Z) - Reinforcement Learning with Non-Exponential Discounting [28.092095671829508]
We propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions.
Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
arXiv Detail & Related papers (2022-09-27T14:13:16Z) - On solutions of the distributional Bellman equation [0.0]
We consider general distributional Bellman equations and study existence and uniqueness of their solutions as well as tail properties of return distributions.
We show that any solution of a distributional Bellman equation can be obtained as the vector of marginal laws of a solution to a multivariate affine distributional equation.
arXiv Detail & Related papers (2022-01-31T20:36:59Z) - Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement
for Value Error [83.10489974736404]
We study the use of the Bellman equation as a surrogate objective for value prediction accuracy.
We find that the Bellman error is a poor proxy for the accuracy of the value function.
arXiv Detail & Related papers (2022-01-28T21:03:59Z) - Reinforcement Learning in Linear MDPs: Constant Regret and
Representation Selection [136.4014229319618]
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
We first derive a necessary condition on the representation, called universally spanning optimal features (UNISOFT), to achieve constant regret in any MDP with linear reward function.
arXiv Detail & Related papers (2021-10-27T22:07:08Z) - Bayesian Bellman Operators [55.959376449737405]
We introduce a novel perspective on Bayesian reinforcement learning (RL).
Our framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions.
arXiv Detail & Related papers (2021-06-09T12:20:46Z) - Necessary and Sufficient Conditions for Inverse Reinforcement Learning
of Bayesian Stopping Time Problems [22.498689292081156]
This paper presents an inverse reinforcement learning (IRL) framework for Bayesian stopping time problems.
By observing the actions of a Bayesian decision maker, we provide a necessary and sufficient condition to identify if these actions are consistent with optimizing a cost function.
Our IRL algorithm identifies optimality and then constructs set-valued estimates of the cost function.
arXiv Detail & Related papers (2020-07-07T14:14:12Z)