Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy
- URL: http://arxiv.org/abs/2407.03888v1
- Date: Thu, 4 Jul 2024 12:26:31 GMT
- Title: Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy
- Authors: Lijun Bo, Yijie Huang, Xiang Yu, Tingting Zhang
- Abstract summary: We study continuous-time reinforcement learning for controlled jump-diffusion models by featuring the q-function and the q-learning algorithms under the Tsallis entropy regularization.
In response, we establish the martingale characterization of the q-function under Tsallis entropy and devise two q-learning algorithms depending on whether the Lagrange multiplier can be derived explicitly or not.
- Score: 8.924830900790713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies continuous-time reinforcement learning for controlled jump-diffusion models, featuring the q-function (the continuous-time counterpart of the Q-function) and q-learning algorithms under Tsallis entropy regularization. Contrary to the conventional Shannon entropy, the general form of Tsallis entropy renders the optimal policy not necessarily a Gibbs measure, and a Lagrange multiplier and a KKT multiplier naturally arise from the constraints that ensure the learnt policy is a probability distribution. As a consequence, the relationship between the optimal policy and the q-function also involves the Lagrange multiplier. In response, we establish the martingale characterization of the q-function under Tsallis entropy and devise two q-learning algorithms depending on whether the Lagrange multiplier can be derived explicitly or not. In the latter case, we need to consider different parameterizations of the q-function and the policy and update them alternately. Finally, we examine two financial applications, namely an optimal portfolio liquidation problem and a non-LQ control problem. Interestingly, the optimal policies under Tsallis entropy regularization can be characterized explicitly in both cases as distributions concentrated on compact supports. The satisfactory performance of our q-learning algorithm is illustrated in both examples.
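For orientation, here is a minimal sketch (in our notation, not necessarily the paper's) of why the multipliers arise. The Tsallis entropy of a density $\pi$ on the action space $\mathcal{A}$ with entropic index $q > 1$ is
$$ S_q(\pi) = \frac{1}{q-1}\Big(1 - \int_{\mathcal{A}} \pi(a)^q \,\mathrm{d}a\Big), $$
which recovers the Shannon entropy as $q \to 1$. Freezing the state, writing $H(a)$ for the quantity the policy trades off against (the q-function in the paper's setting; we write $H$ to avoid clashing with the entropic index $q$) and $\gamma > 0$ for the temperature, the regularized policy solves
$$ \max_{\pi}\ \int_{\mathcal{A}} \pi(a) H(a)\,\mathrm{d}a + \gamma S_q(\pi) \quad \text{s.t.}\quad \int_{\mathcal{A}} \pi(a)\,\mathrm{d}a = 1,\ \pi \ge 0. $$
The KKT conditions then yield
$$ \pi^*(a) = \Big[\tfrac{q-1}{\gamma q}\big(H(a) - \lambda\big)\Big]_+^{\frac{1}{q-1}}, $$
where the Lagrange multiplier $\lambda$ enforces normalization and the KKT multiplier is absorbed into the positive part $[\cdot]_+$. The support $\{a : H(a) > \lambda\}$ can be a compact, strict subset of $\mathcal{A}$, consistent with the compactly supported optimal policies reported above; $q = 2$ gives a sparsemax-type density, and only as $q \to 1$ is the Gibbs form $\pi^* \propto e^{H/\gamma}$ recovered. For comparison, the Shannon-entropy martingale characterization of Jia and Zhou (2023) requires, roughly, that $e^{-\beta t}\,\widehat{J}(t, X_t) + \int_0^t e^{-\beta s}\big[r(s, X_s, a_s) - \widehat{q}(s, X_s, a_s)\big]\,\mathrm{d}s$ be a martingale under any admissible control, plus a normalization condition on $\widehat{q}$; under Tsallis entropy that normalization is replaced by the constraint pinning down $\lambda$, which is why the paper's two algorithms split on whether $\lambda$ is explicit.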
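When $\lambda$ admits no closed form, it can be found numerically. Below is a hypothetical, self-contained sketch (not the paper's algorithm) on a finite action grid: it bisects on $\lambda$, using the fact that the total mass $\lambda \mapsto \sum_i \pi_\lambda(a_i)$ is decreasing; all names and parameter values are ours.

```python
import numpy as np

def tsallis_policy(q_vals, gamma=1.0, q=2.0, iters=80):
    """Tsallis-regularized policy over a finite action grid (illustrative only).

    Maximizes sum_i pi[i] * q_vals[i] + gamma * (1 - sum_i pi[i] ** q) / (q - 1)
    over the probability simplex. Stationarity gives
        pi[i] = max((q - 1) * (q_vals[i] - lam) / (gamma * q), 0) ** (1 / (q - 1)),
    with the Lagrange multiplier lam pinned down by sum_i pi[i] = 1.
    For q = 2, lam admits a closed form (the sparsemax projection); for
    general q > 1 we bisect, since total mass is decreasing in lam.
    """
    def mass(lam):
        z = np.maximum((q - 1.0) * (q_vals - lam) / (gamma * q), 0.0)
        return z ** (1.0 / (q - 1.0))

    hi = q_vals.max()                     # mass(hi) is identically 0, sum < 1
    lo = hi - 1.0
    while mass(lo).sum() < 1.0:           # expand downward until root bracketed
        lo = hi - 2.0 * (hi - lo)
    for _ in range(iters):                # bisection on sum(pi(lam)) = 1
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid).sum() >= 1.0 else (lo, mid)
    pi = mass(lo)
    return pi / pi.sum()                  # remove residual numerical error

# Toy usage: a concave q-value profile; the resulting policy has compact
# support -- actions with q-values below lam receive exactly zero probability.
actions = np.linspace(-2.0, 2.0, 201)
pi = tsallis_policy(-actions ** 2, gamma=0.5, q=1.5)
assert pi.min() == 0.0 and abs(pi.sum() - 1.0) < 1e-12
```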
Related papers
- Unified continuous-time q-learning for mean-field game and mean-field control problems [4.416317245952636]
We introduce the integrated q-function in decoupled form (decoupled Iq-function) and establish its martingale characterization together with the value function.
We devise a unified q-learning algorithm for both mean-field game (MFG) and mean-field control (MFC) problems.
For several examples in the jump-diffusion setting, within and beyond the LQ framework, we can obtain the exact parameterization of the decoupled Iq-functions and the value functions.
arXiv Detail & Related papers (2024-07-05T14:06:59Z)
- Value-Distributional Model-Based Reinforcement Learning [63.32053223422317]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Stability of Q-Learning Through Design and Optimism [0.0]
This paper is in part a tutorial on stochastic approximation and Q-learning.
It provides details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy, France, in June 2023.
The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms.
arXiv Detail & Related papers (2023-07-05T20:04:26Z)
- Continuous-time q-learning for mean-field control problems [5.164412742802911]
We study q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems.
We show that two q-functions are related via an integral representation under all test policies.
Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised.
arXiv Detail & Related papers (2023-06-28T13:43:46Z)
- Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage [100.8180383245813]
We propose value-based algorithms for offline reinforcement learning (RL).
We show an analogous result for vanilla Q-functions under a soft margin condition.
Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
arXiv Detail & Related papers (2023-02-05T14:22:41Z)
- q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a "q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z)
- Optimization Induced Equilibrium Networks [76.05825996887573]
Implicit equilibrium models, i.e., deep neural networks (DNNs) defined by implicit equations, have recently attracted growing attention.
We show that deep OptEq outperforms previous implicit models even with fewer parameters.
arXiv Detail & Related papers (2021-05-27T15:17:41Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum (a minimal sketch of the tabular update rule appears after this list).
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time [109.06623773924737]
We study the policy gradient method for linear-quadratic mean-field control and game problems.
We show that it converges to the optimal solution at a linear rate, which is verified by a synthetic simulation.
arXiv Detail & Related papers (2020-08-16T06:34:11Z)
- Exponentially Weighted l_2 Regularization Strategy in Constructing Reinforced Second-order Fuzzy Rule-based Model [72.57056258027336]
In the conventional Takagi-Sugeno-Kang (TSK)-type fuzzy models, constant or linear functions are usually utilized as the consequent parts of the fuzzy rules.
We introduce an exponential weight approach inspired by the weight function theory encountered in harmonic analysis.
arXiv Detail & Related papers (2020-07-02T15:42:15Z)
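As referenced in the double Q-learning entry above, here is a minimal tabular sketch of the update rule studied in that finite-time analysis (standard double Q-learning in the style of van Hasselt (2010); the problem sizes, step size, and discount below are placeholder values of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 4     # placeholder problem size
alpha, discount = 0.1, 0.99     # step size and discount factor
QA = np.zeros((n_states, n_actions))
QB = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next):
    """One asynchronous double Q-learning update on transition (s, a, r, s_next).

    A fair coin chooses which table to update; the chosen table picks the
    greedy next action, while the *other* table evaluates it, which removes
    the overestimation bias of the single-table max.
    """
    if rng.random() < 0.5:
        a_star = int(QA[s_next].argmax())
        QA[s, a] += alpha * (r + discount * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = int(QB[s_next].argmax())
        QB[s, a] += alpha * (r + discount * QA[s_next, a_star] - QB[s, a])
```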