On the continuity and smoothness of the value function in reinforcement learning and optimal control
- URL: http://arxiv.org/abs/2403.14432v1
- Date: Thu, 21 Mar 2024 14:39:28 GMT
- Title: On the continuity and smoothness of the value function in reinforcement learning and optimal control
- Authors: Hans Harder, Sebastian Peitz
- Abstract summary: We show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system.
We also show that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
- Score: 1.534667887016089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The value function plays a crucial role as a measure for the cumulative future reward an agent receives in both reinforcement learning and optimal control. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
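For reference, the two regularity notions named in the abstract can be written out in their standard form; the formulation below is generic (V denotes the value function, (S, d) a metric state space) and is not quoted from the paper.

```latex
% Modulus of continuity of the value function V on a metric state space (S, d):
\[
  \omega_V(r) \;=\; \sup\bigl\{\, |V(x) - V(y)| \;:\; x, y \in S,\ d(x, y) \le r \,\bigr\}.
\]
% V is \alpha-H\"older continuous (with constant C > 0 and exponent 0 < \alpha \le 1) if
\[
  |V(x) - V(y)| \;\le\; C\, d(x, y)^{\alpha} \quad \text{for all } x, y \in S,
  \qquad\text{equivalently}\qquad \omega_V(r) \;\le\; C\, r^{\alpha}.
\]
```

The exponent α = 1 recovers Lipschitz continuity.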
Related papers
- A Test-Function Approach to Incremental Stability [33.44344966171865]
The regularity of value functions, and their connection to incremental stability, can be understood in a way that is distinct from the traditional Lyapunov-based approach to certifying stability in control theory.
arXiv Detail & Related papers (2025-07-01T11:46:52Z)
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results. (A generic sketch of this estimator is given after the list below.)
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided Bounds on the Value Function [4.48890356952206]
We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
arXiv Detail & Related papers (2023-02-19T21:47:24Z)
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning [86.59173545987984]
We propose a new form of Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability.
We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence.
arXiv Detail & Related papers (2022-12-08T23:56:47Z)
- Inference on Strongly Identified Functionals of Weakly Identified Functions [71.42652863687117]
We study a novel condition for the functional to be strongly identified even when the nuisance function is not.
We propose penalized minimax estimators for both the primary and debiasing nuisance functions.
arXiv Detail & Related papers (2022-08-17T13:38:31Z)
- Recurrent networks, hidden states and beliefs in partially observable environments [3.4066110654930473]
Reinforcement learning aims to learn optimal policies from interaction with environments whose dynamics are unknown.
We show that in its hidden states, a recurrent neural network approximating the Q-function of a partially observable environment reproduces a sufficient statistic from the history that is correlated to the relevant part of the belief for taking optimal actions.
arXiv Detail & Related papers (2022-08-06T13:56:16Z)
- Threading the Needle of On and Off-Manifold Value Functions for Shapley Explanations [40.95261379462059]
We formalize the desiderata of value functions that respect both the model and the data manifold in a set of axioms.
We show that there exists a unique value function that satisfies these axioms.
arXiv Detail & Related papers (2022-02-24T06:22:34Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. (A sketch of the standard, non-robust value-iteration baseline is also given after the list below.)
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- On the coercivity condition in the learning of interacting particle systems [7.089219223012485]
The coercivity condition is equivalent to the strict positive definiteness of an integral kernel arising in the learning problem.
We show that for a class of interaction functions such that the system is ergodic, the integral kernel is strictly positive definite, and hence the coercivity condition holds true.
arXiv Detail & Related papers (2020-11-20T16:20:06Z)
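For readers who want to see the object analyzed in the entry on "Statistical Inference for Temporal Difference Learning with Linear Function Approximation", here is a minimal, generic sketch of TD(0) with linear function approximation and Polyak-Ruppert (iterate) averaging. The environment/policy/feature interfaces, step-size schedule, and episode handling are illustrative assumptions, not details taken from that paper.

```python
import numpy as np

def td0_polyak_ruppert(env, policy, features, dim, gamma=0.99,
                       step_size=lambda t: 1.0 / (t + 1) ** 0.75,
                       num_steps=10_000, seed=0):
    """Generic TD(0) policy evaluation with a linear value model
    V(s) ~ features(s) @ theta, plus Polyak-Ruppert averaging of the iterates.
    `env`, `policy`, and `features` are assumed interfaces, not from the paper:
    env.reset() -> state, env.step(action) -> (next_state, reward, done),
    policy(state, rng) -> action, features(state) -> np.ndarray of shape (dim,)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)       # last TD(0) iterate
    theta_bar = np.zeros(dim)   # Polyak-Ruppert (running) average of the iterates
    s = env.reset()
    for t in range(num_steps):
        a = policy(s, rng)
        s_next, r, done = env.step(a)
        phi, phi_next = features(s), features(s_next)
        target = r + (0.0 if done else gamma * phi_next @ theta)
        td_error = target - phi @ theta
        theta = theta + step_size(t) * td_error * phi   # TD(0) update
        theta_bar += (theta - theta_bar) / (t + 1)      # running average of iterates
        s = env.reset() if done else s_next
    return theta_bar
```

Statistical analyses of this scheme typically concern the averaged iterate theta_bar rather than the last iterate theta.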
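Similarly, the entry on "Robust Value Iteration for Continuous Control Tasks" builds on standard value iteration over a discretized, compact state domain. The sketch below shows only that non-robust dynamic-programming baseline with a placeholder tabular model; the paper's robust (worst-case over dynamics variations) backup is not reproduced here.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Plain (non-robust) value iteration by dynamic programming.
    P: array of shape (A, S, S) with transition probabilities on a discretized
       state grid; R: array of shape (A, S) with expected immediate rewards.
    Both are placeholder inputs, not the model used in the paper."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:   # sup-norm stopping criterion
            V = V_new
            break
        V = V_new
    return V, (R + gamma * (P @ V)).argmax(axis=0)  # value function, greedy policy
```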
This list is automatically generated from the titles and abstracts of the papers on this site.