Dynamic mean field programming
- URL: http://arxiv.org/abs/2206.05200v2
- Date: Wed, 12 Jul 2023 05:57:42 GMT
- Title: Dynamic mean field programming
- Authors: George Stamatescu
- Abstract summary: A dynamic mean field theory is developed for finite state and action reinforcement learning in the large state space limit.
Under certain assumptions, the state-action values are statistically independent across state-action pairs in the large state space limit.
The results hold in the finite and discounted infinite horizon settings, for both value iteration and policy evaluation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A dynamic mean field theory is developed for finite state and action Bayesian
reinforcement learning in the large state space limit. In an analogy with
statistical physics, the Bellman equation is studied as a disordered dynamical
system; the Markov decision process transition probabilities are interpreted as
couplings and the value functions as deterministic spins that evolve
dynamically. Thus, the mean rewards and transition probabilities are considered
to be quenched random variables. The theory reveals that, under certain
assumptions, the state-action values are statistically independent across
state-action pairs in the asymptotic state space limit, and provides the form
of the distribution exactly. The results hold in the finite and discounted
infinite horizon settings, for both value iteration and policy evaluation. The
state-action value statistics can be computed from a set of mean field
equations, which we call dynamic mean field programming (DMFP). For policy
evaluation the equations are exact. For value iteration, approximate equations
are obtained by appealing to extreme value theory or bounds. The result
provides analytic insight into the statistical structure of tabular
reinforcement learning, for example revealing the conditions under which
reinforcement learning is equivalent to a set of independent multi-armed bandit
problems.
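To make the setup concrete, the following is a minimal, hedged sketch of the quenched-disorder experiment the abstract describes: mean rewards and transition probabilities are drawn at random once (quenched), policy evaluation is run by iterating the Bellman expectation operator, and value statistics are inspected as the state space grows. This is an illustrative experiment only, not the paper's DMFP mean field equations; the helper names (`sample_mdp`, `policy_evaluation`) and all constants are assumptions.

```python
# Illustrative sketch (not the paper's DMFP equations): sample an MDP with
# quenched random mean rewards and transitions, run policy evaluation by
# repeated Bellman backups, and inspect value statistics across states.
import numpy as np

def sample_mdp(n_states, rng):
    """Quenched disorder: random mean rewards and a random row-stochastic
    transition matrix, both fixed under a given policy."""
    r = rng.normal(0.0, 1.0, size=n_states)              # mean rewards
    P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
    return r, P

def policy_evaluation(r, P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Fixed point of the Bellman expectation operator V = r + gamma * P V."""
    V = np.zeros_like(r)
    for _ in range(max_iter):
        V_new = r + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

rng = np.random.default_rng(0)
for n in (10, 100, 1000):  # growing state space
    r, P = sample_mdp(n, rng)
    V = policy_evaluation(r, P)
    # In the large-n limit the theory predicts the values decouple across
    # states; empirically, the mean and spread of V stabilise as n grows.
    print(n, V.mean(), V.std())
```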
Related papers
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z) - Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces [17.96094201655567]
We explore the control of systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$.
Here, $X$, $A$, and $W$ represent the state, action, and random noise processes, respectively, with $f$ denoting a known function that describes state transitions.
This paper introduces a distributionally robust control paradigm that accommodates possibly adversarial perturbations to the noise distribution within a prescribed ambiguity set (a minimal simulation sketch of this setting appears after the list below).
arXiv Detail & Related papers (2024-06-17T07:37:36Z) - Logistic-beta processes for dependent random probabilities with beta marginals [58.91121576998588]
We propose a novel process called the logistic-beta process, whose logistic transformation yields a process with common beta marginals.
It can model dependence on both discrete and continuous domains, such as space or time, and has a flexible dependence structure through correlation kernels.
We illustrate the benefits through nonparametric binary regression and conditional density estimation examples, both in simulation studies and in a pregnancy outcome application.
arXiv Detail & Related papers (2024-02-10T21:41:32Z) - Asymptotic behavior of continuous weak measurement and its application
to real-time parameter estimation [4.329298109272387]
The quantum trajectory of continuous weak measurement for a magnetometer is investigated.
We find that the behavior is insensitive to the initial state in the following sense: given one realization, quantum trajectories starting from arbitrary initial states asymptotically converge to the same realization-specific pure state.
arXiv Detail & Related papers (2023-11-03T17:50:45Z) - An information field theory approach to Bayesian state and parameter
estimation in dynamical systems [0.0]
This paper develops a scalable Bayesian approach to state and parameter estimation suitable for continuous-time, deterministic dynamical systems.
We construct a physics-informed prior probability measure on the function space of system responses so that functions that satisfy the physics are more likely.
arXiv Detail & Related papers (2023-06-03T16:36:43Z) - Correspondence between open bosonic systems and stochastic differential
equations [77.34726150561087]
We show that there can also be an exact correspondence at finite $n$ when the bosonic system is generalized to include interactions with the environment.
A particular system with the form of a discrete nonlinear Schrödinger equation is analyzed in more detail.
arXiv Detail & Related papers (2023-02-03T19:17:37Z) - Discrete Lagrangian Neural Networks with Automatic Symmetry Discovery [3.06483729892265]
We introduce a framework to learn a discrete Lagrangian along with its symmetry group from discrete observations of motions.
The learning process does not restrict the form of the Lagrangian, does not require velocity or momentum observations or predictions and incorporates a cost term.
arXiv Detail & Related papers (2022-11-20T00:46:33Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
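As referenced in the distributionally robust control entry above, here is a hedged Python sketch of the quoted setting: dynamics $X_{t+1} = f(X_t, A_t, W_t)$ with the noise law chosen adversarially from a small finite ambiguity set. The dynamics `f`, the quadratic cost, the policy, and the ambiguity set are all illustrative assumptions, not details from that paper.

```python
# Hedged sketch: evaluate a fixed policy on X_{t+1} = f(X_t, A_t, W_t)
# under the worst-case noise distribution in a finite ambiguity set.
import numpy as np

def f(x, a, w):
    """Known transition function (toy linear choice for illustration)."""
    return 0.8 * x + a + w

def rollout_cost(policy, noise_sampler, x0=1.0, horizon=50, n_paths=2000, seed=0):
    """Average quadratic cost of a policy under one candidate noise law."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    total = np.zeros(n_paths)
    for _ in range(horizon):
        a = policy(x)
        w = noise_sampler(rng, n_paths)
        total += x**2 + 0.1 * a**2
        x = f(x, a, w)
    return total.mean()

policy = lambda x: -0.5 * x  # fixed linear feedback (assumed)
ambiguity_set = {            # candidate noise distributions (assumed)
    "nominal": lambda rng, n: rng.normal(0.0, 0.1, n),
    "shifted": lambda rng, n: rng.normal(0.05, 0.1, n),
    "heavy":   lambda rng, n: 0.1 * rng.standard_t(3, n),
}
# Distributionally robust evaluation: worst case over the ambiguity set.
worst = max(rollout_cost(policy, s) for s in ambiguity_set.values())
print("robust (worst-case) cost:", worst)
```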