Q-Learning for Stochastic Control under General Information Structures
and Non-Markovian Environments
- URL: http://arxiv.org/abs/2311.00123v2
- Date: Mon, 4 Mar 2024 15:59:43 GMT
- Authors: Ali Devran Kara and Serdar Yuksel
- Abstract summary: We present a convergence theorem for stochastic iterations, and in particular Q-learning iterates, under a general, possibly non-Markovian, environment.
We discuss the implications and applications of this theorem to a variety of control problems with non-Markovian environments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a primary contribution, we present a convergence theorem for stochastic
iterations, and in particular, Q-learning iterates, under a general, possibly
non-Markovian, stochastic environment. Our conditions for convergence involve
an ergodicity and a positivity criterion. We provide a precise characterization
on the limit of the iterates and conditions on the environment and
initializations for convergence. As our second contribution, we discuss the
implications and applications of this theorem to a variety of stochastic
control problems with non-Markovian environments involving (i) quantized
approximations of fully observed Markov Decision Processes (MDPs) with
continuous spaces (where quantization breaks down the Markovian structure), (ii)
quantized approximations of belief-MDP reduced partially observable MDPs
(POMDPs) with weak Feller continuity and a mild version of filter stability
(which requires the knowledge of the model by the controller), (iii) finite
window approximations of POMDPs under a uniform controlled filter stability
(which does not require the knowledge of the model), and (iv) multi-agent
models, where convergence of learning dynamics to a new class of equilibria,
subjective Q-learning equilibria, is studied. Beyond the convergence theorem
itself, some of these implications are new to the literature and others are
interpreted as applications of the theorem. Some open problems are noted.
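Setting (i) above, where a continuous-state MDP is quantized and the induced process on bins is no longer Markovian, can be illustrated with a minimal tabular Q-learning sketch. The dynamics, reward, bin count, and step-size schedule below are all illustrative assumptions, not the paper's model:

```python
import numpy as np

# Toy 1-D continuous-state MDP: state x in [0, 1], two actions. Quantizing x
# into bins breaks the Markov property of the induced process, which is the
# situation in setting (i); everything here is made up for the example.

def quantize(x, n_bins=10):
    """Map a continuous state in [0, 1] to a bin index."""
    return min(int(x * n_bins), n_bins - 1)

def step(x, a, rng):
    """Drift toward 0 or 1 depending on the action, plus Gaussian noise."""
    target = 0.0 if a == 0 else 1.0
    x_next = float(np.clip(0.8 * x + 0.2 * target
                           + 0.05 * rng.standard_normal(), 0.0, 1.0))
    reward = -abs(x_next - 0.5)   # nonpositive reward: stay near the middle
    return x_next, reward

def q_learning(n_steps=20_000, n_bins=10, gamma=0.9, seed=0):
    """Tabular Q-learning on quantized states under a uniformly exploring
    policy (mirroring the positivity criterion) with 1/visit-count steps."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_bins, 2))
    visits = np.zeros((n_bins, 2))
    x = rng.uniform()
    for _ in range(n_steps):
        s = quantize(x, n_bins)
        a = int(rng.integers(2))
        x_next, r = step(x, a, rng)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]
        Q[s, a] += alpha * (r + gamma * Q[quantize(x_next, n_bins)].max()
                            - Q[s, a])
        x = x_next
    return Q

Q = q_learning()
```

The exploring random policy keeps every quantized state-action pair recurrent, which is the kind of condition the paper's positivity criterion formalizes.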
Related papers
- Transition of $\alpha$-mixing in Random Iterations with Applications in Queuing Theory [0.0]
We show the transfer of mixing properties from the exogenous regressor to the response via coupling arguments.
We also study Markov chains in random environments with drift and minorization conditions, even under non-stationary environments.
arXiv Detail & Related papers (2024-10-07T14:13:37Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm performs a distributed Bayesian filtering task for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
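As a single-agent point of reference for the distributed filtering task above, here is the standard forward Bayes filter for a finite-state HMM; the transition and emission matrices are made up for the example:

```python
import numpy as np

# Recursive Bayes filter for a finite-state HMM: predict with the transition
# matrix, correct with the observation likelihood, then normalize. The
# matrices below are illustrative, not from the paper.

def hmm_filter(prior, T, O, observations):
    """Return the posterior over hidden states after the observations."""
    belief = np.asarray(prior, dtype=float)
    for y in observations:
        belief = T.T @ belief            # prediction step
        belief = belief * O[:, y]        # correction step: P(y | x)
        belief = belief / belief.sum()   # normalization
    return belief

T = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # transition probabilities P(x' | x)
O = np.array([[0.7, 0.3],
              [0.1, 0.9]])              # emission probabilities P(y | x)
posterior = hmm_filter([0.5, 0.5], T, O, observations=[0, 0, 1])
```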
arXiv Detail & Related papers (2022-12-05T19:40:17Z) - Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article, we derive new sufficient and necessary conditions to ensure (i) and (ii).
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
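A concrete instance of such a kernel discrepancy is the empirical MMD with a Gaussian kernel. The following biased V-statistic estimate, with an assumed bandwidth and synthetic data, sketches the quantity in question:

```python
import numpy as np

# Biased (V-statistic) estimate of MMD^2 between two samples under a
# Gaussian kernel; the bandwidth and data are illustrative assumptions.

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def mmd2(X, Y, bandwidth=1.0):
    """Squared norm of the difference of empirical kernel mean embeddings."""
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 1)), rng.normal(size=(200, 1)))
diff = mmd2(rng.normal(size=(200, 1)), rng.normal(3.0, size=(200, 1)))
```

When the two samples come from the same distribution the estimate is near zero; a mean shift drives it up, which is the separation property (i) above.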
arXiv Detail & Related papers (2022-09-26T16:41:16Z) - A Convergence Theory for Over-parameterized Variational Quantum
Eigensolvers [21.72347971869391]
The Variational Quantum Eigensolver (VQE) is a promising candidate for quantum applications on near-term Noisy Intermediate-Scale Quantum (NISQ) computers.
We provide the first rigorous analysis of the convergence of VQEs in the over-parameterization regime.
arXiv Detail & Related papers (2022-05-25T04:06:50Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
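The setting above can be mimicked in a toy example: stochastic approximation of the fixed point of a contractive affine map on R^2 from noisy evaluations only. The matrix, noise level, and step sizes are illustrative assumptions:

```python
import numpy as np

# Robbins-Monro iteration for the fixed point of F(v) = A v + b, where A is
# a contraction (spectral norm < 1) and only noisy evaluations of F are
# available; a finite-dimensional stand-in for the Banach-space setting.

def stochastic_fixed_point(A, b, n_iters=20_000, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros(len(b))
    for t in range(1, n_iters + 1):
        noisy = A @ v + b + 0.1 * rng.standard_normal(len(b))  # noisy F(v)
        alpha = 1.0 / t                                        # 1/t step size
        v = (1 - alpha) * v + alpha * noisy
    return v

A = np.array([[0.5, 0.1],
              [0.0, 0.4]])                  # contraction
b = np.array([1.0, 1.0])
v_hat = stochastic_fixed_point(A, b)
v_star = np.linalg.solve(np.eye(2) - A, b)  # exact fixed point of F
```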
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Q-Learning for MDPs with General Spaces: Convergence and Near Optimality
via Quantization under Weak Continuity [2.685668802278156]
We show that Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit.
Our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
arXiv Detail & Related papers (2021-11-12T15:47:10Z) - On the Convergence of Continuous Constrained Optimization for Structure
Learning [30.279796192573805]
We show the convergence of the augmented Lagrangian method (ALM) and the quadratic penalty method (QPM) for structure learning in the linear, nonlinear, and confounded cases.
We further establish the convergence guarantee of QPM to a DAG solution, under mild conditions.
arXiv Detail & Related papers (2020-11-23T00:29:37Z) - Near Optimality of Finite Memory Feedback Policies in Partially Observed
Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model is assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
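The finite window construction above can be sketched as a hashable information state built from the last N observation-action pairs, where the window size trades off against the approximation error; the class name and window contents are hypothetical:

```python
from collections import deque

# A finite-window information state for a POMDP: the agent treats the last
# N observation-action pairs as its "state", which can then serve as a key
# into a tabular Q-function or policy.

class FiniteWindowState:
    def __init__(self, window_size):
        self.window_size = window_size
        self.buffer = deque(maxlen=window_size)  # oldest pair auto-evicted

    def update(self, observation, action):
        self.buffer.append((observation, action))

    def state(self):
        """Hashable finite-window state usable as a Q-table key."""
        return tuple(self.buffer)

w = FiniteWindowState(window_size=2)
w.update(observation=1, action=0)
w.update(observation=0, action=1)
w.update(observation=1, action=1)   # evicts the oldest pair (1, 0)
```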
arXiv Detail & Related papers (2020-10-15T00:37:51Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to deep learning applications (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.