Dynamic mean field programming
- URL: http://arxiv.org/abs/2206.05200v2
- Date: Wed, 12 Jul 2023 05:57:42 GMT
- Title: Dynamic mean field programming
- Authors: George Stamatescu
- Abstract summary: A dynamic mean field theory is developed for finite state and action reinforcement learning in the large state space limit.
Under certain assumptions, the state-action values are statistically independent across state-action pairs in the large state space limit.
The results hold in the finite and discounted infinite horizon settings, for both value iteration and policy evaluation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A dynamic mean field theory is developed for finite state and action Bayesian
reinforcement learning in the large state space limit. In an analogy with
statistical physics, the Bellman equation is studied as a disordered dynamical
system; the Markov decision process transition probabilities are interpreted as
couplings and the value functions as deterministic spins that evolve
dynamically. Thus, the mean rewards and transition probabilities are considered
to be quenched random variables. The theory reveals that, under certain
assumptions, the state-action values are statistically independent across
state-action pairs in the asymptotic state space limit, and provides the form
of the distribution exactly. The results hold in the finite and discounted
infinite horizon settings, for both value iteration and policy evaluation. The
state-action value statistics can be computed from a set of mean field
equations, which we call dynamic mean field programming (DMFP). For policy
evaluation the equations are exact. For value iteration, approximate equations
are obtained by appealing to extreme value theory or bounds. The result
provides analytic insight into the statistical structure of tabular
reinforcement learning, for example revealing the conditions under which
reinforcement learning is equivalent to a set of independent multi-armed bandit
problems.
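To make the setup concrete, the following is a minimal, hedged sketch of the quenched-disorder experiment the abstract describes: mean rewards and transition probabilities are drawn at random once (quenched), policy evaluation is run by iterating the Bellman expectation operator, and value statistics are inspected as the state space grows. This is an illustrative experiment only, not the paper's DMFP mean field equations; the helper names (`sample_mdp`, `policy_evaluation`) and all constants are assumptions.

```python
# Illustrative sketch (not the paper's DMFP equations): sample an MDP with
# quenched random mean rewards and transitions, run policy evaluation by
# repeated Bellman backups, and inspect value statistics across states.
import numpy as np

def sample_mdp(n_states, rng):
    """Quenched disorder: random mean rewards and a random row-stochastic
    transition matrix, both fixed under a given policy."""
    r = rng.normal(0.0, 1.0, size=n_states)              # mean rewards
    P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
    return r, P

def policy_evaluation(r, P, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Fixed point of the Bellman expectation operator V = r + gamma * P V."""
    V = np.zeros_like(r)
    for _ in range(max_iter):
        V_new = r + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

rng = np.random.default_rng(0)
for n in (10, 100, 1000):  # growing state space
    r, P = sample_mdp(n, rng)
    V = policy_evaluation(r, P)
    # In the large-n limit the theory predicts the values decouple across
    # states; empirically, the mean and spread of V stabilise as n grows.
    print(n, V.mean(), V.std())
```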
Related papers
- Learning Controlled Stochastic Differential Equations [61.82896036131116]
This work proposes a novel method for estimating both drift and diffusion coefficients of continuous, multidimensional, nonlinear controlled differential equations with non-uniform diffusion.
We provide strong theoretical guarantees, including finite-sample bounds for $L^2$, $L^\infty$, and risk metrics, with learning rates adaptive to coefficients' regularity.
Our method is available as an open-source Python library.
arXiv Detail & Related papers (2024-11-04T11:09:58Z) - Statistical Learning of Distributionally Robust Stochastic Control in Continuous State Spaces [17.96094201655567]
We explore the control of systems with potentially continuous state and action spaces, characterized by the state dynamics $X_{t+1} = f(X_t, A_t, W_t)$.
Here, $X$, $A$, and $W$ represent the state, action, and random noise processes, respectively, with $f$ denoting a known function that describes state transitions.
This paper introduces a distributionally robust control paradigm that accommodates possibly adversarial perturbations to the noise distribution within a prescribed ambiguity set (a minimal simulation sketch of this setting appears after the list below).
arXiv Detail & Related papers (2024-06-17T07:37:36Z) - Logistic-beta processes for dependent random probabilities with beta marginals [58.91121576998588]
We propose a novel process called the logistic-beta process, whose logistic transformation yields a process with common beta marginals.
It can model dependence on both discrete and continuous domains, such as space or time, and has a flexible dependence structure through correlation kernels.
We illustrate the benefits through nonparametric binary regression and conditional density estimation examples, both in simulation studies and in a pregnancy outcome application.
arXiv Detail & Related papers (2024-02-10T21:41:32Z) - Asymptotic behavior of continuous weak measurement and its application
to real-time parameter estimation [4.329298109272387]
The quantum trajectory of continuous weak measurement for a magnetometer is investigated.
We find that the behavior is insensitive to the initial state in the following sense: given one realization, quantum trajectories starting from arbitrary initial states asymptotically converge to the same realization-specific pure state.
arXiv Detail & Related papers (2023-11-03T17:50:45Z) - An information field theory approach to Bayesian state and parameter
estimation in dynamical systems [0.0]
This paper develops a scalable Bayesian approach to state and parameter estimation suitable for continuous-time, deterministic dynamical systems.
We construct a physics-informed prior probability measure on the function space of system responses so that functions that satisfy the physics are more likely.
arXiv Detail & Related papers (2023-06-03T16:36:43Z) - Correspondence between open bosonic systems and stochastic differential
equations [77.34726150561087]
We show that there can also be an exact correspondence at finite $n$ when the bosonic system is generalized to include interactions with the environment.
A particular system with the form of a discrete nonlinear Schrödinger equation is analyzed in more detail.
arXiv Detail & Related papers (2023-02-03T19:17:37Z) - Discrete Lagrangian Neural Networks with Automatic Symmetry Discovery [3.06483729892265]
We introduce a framework to learn a discrete Lagrangian along with its symmetry group from discrete observations of motions.
The learning process does not restrict the form of the Lagrangian, does not require velocity or momentum observations or predictions and incorporates a cost term.
arXiv Detail & Related papers (2022-11-20T00:46:33Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Value Iteration in Continuous Actions, States and Time [99.00362538261972]
We propose a continuous fitted value iteration (cFVI) algorithm for continuous states and actions.
The optimal policy can be derived for non-linear control-affine dynamics.
Videos of the physical system are available at https://sites.google.com/view/value-iteration.
arXiv Detail & Related papers (2021-05-10T21:40:56Z)
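As referenced in the distributionally robust control entry above, here is a hedged Python sketch of the quoted setting: dynamics $X_{t+1} = f(X_t, A_t, W_t)$ with the noise law chosen adversarially from a small finite ambiguity set. The dynamics `f`, the quadratic cost, the policy, and the ambiguity set are all illustrative assumptions, not details from that paper.

```python
# Hedged sketch: evaluate a fixed policy on X_{t+1} = f(X_t, A_t, W_t)
# under the worst-case noise distribution in a finite ambiguity set.
import numpy as np

def f(x, a, w):
    """Known transition function (toy linear choice for illustration)."""
    return 0.8 * x + a + w

def rollout_cost(policy, noise_sampler, x0=1.0, horizon=50, n_paths=2000, seed=0):
    """Average quadratic cost of a policy under one candidate noise law."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    total = np.zeros(n_paths)
    for _ in range(horizon):
        a = policy(x)
        w = noise_sampler(rng, n_paths)
        total += x**2 + 0.1 * a**2
        x = f(x, a, w)
    return total.mean()

policy = lambda x: -0.5 * x  # fixed linear feedback (assumed)
ambiguity_set = {            # candidate noise distributions (assumed)
    "nominal": lambda rng, n: rng.normal(0.0, 0.1, n),
    "shifted": lambda rng, n: rng.normal(0.05, 0.1, n),
    "heavy":   lambda rng, n: 0.1 * rng.standard_t(3, n),
}
# Distributionally robust evaluation: worst case over the ambiguity set.
worst = max(rollout_cost(policy, s) for s in ambiguity_set.values())
print("robust (worst-case) cost:", worst)
```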