Learning Merton's Strategies in an Incomplete Market: Recursive Entropy
Regularization and Biased Gaussian Exploration
- URL: http://arxiv.org/abs/2312.11797v1
- Date: Tue, 19 Dec 2023 02:14:13 GMT
- Title: Learning Merton's Strategies in an Incomplete Market: Recursive Entropy
Regularization and Biased Gaussian Exploration
- Authors: Min Dai, Yuchao Dong, Yanwei Jia, and Xun Yu Zhou
- Abstract summary: We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market.
We present an analysis of the resulting errors to show how the level of exploration affects the learned policies.
- Score: 11.774563966512709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study Merton's expected utility maximization problem in an incomplete
market, characterized by a factor process in addition to the stock price
process, where all the model primitives are unknown. We take the reinforcement
learning (RL) approach to learn optimal portfolio policies directly by
exploring the unknown market, without attempting to estimate the model
parameters. Based on the entropy-regularization framework for general
continuous-time RL formulated in Wang et al. (2020), we propose a recursive
weighting scheme on exploration that endogenously discounts the current
exploration reward by the past accumulative amount of exploration. Such a
recursive regularization restores the optimality of Gaussian exploration.
However, contrary to the existing results, the optimal Gaussian policy turns
out to be biased in general, due to the intertwined needs for hedging and for
exploration. We present an asymptotic analysis of the resulting errors to show
how the level of exploration affects the learned policies. Furthermore, we
establish a policy improvement theorem and design several RL algorithms to
learn Merton's optimal strategies. Finally, we carry out both simulation and
empirical studies with a stochastic volatility environment to demonstrate the
efficiency and robustness of the RL algorithms in comparison to the
conventional plug-in method.
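For orientation only, here is a rough sketch of the kind of entropy-regularized objective the abstract refers to, following the Wang et al. (2020) framework it cites; the recursive weighting is the paper's own contribution and its exact form is not given in the abstract, so the constant temperature $\lambda$ below is a stand-in:
$$
\max_{\boldsymbol{\pi}}\ \mathbb{E}\left[ U\big(X_T^{\boldsymbol{\pi}}\big) + \lambda \int_0^T \mathcal{H}(\boldsymbol{\pi}_t)\, dt \right],
\qquad
\mathcal{H}(\boldsymbol{\pi}_t) = -\int \boldsymbol{\pi}_t(u)\,\ln \boldsymbol{\pi}_t(u)\, du,
$$
where $X^{\boldsymbol{\pi}}$ is the wealth process under the randomized (exploratory) portfolio policy $\boldsymbol{\pi}$, $U$ is the utility function, and $\mathcal{H}$ is the differential entropy rewarding exploration. Per the abstract, replacing the constant $\lambda$ with a weight endogenously discounted by the exploration accumulated so far restores the optimality of Gaussian exploration, although the resulting Gaussian policy is biased in general.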
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log T})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We directly sample the Q function from its posterior distribution using Langevin Monte Carlo (a minimal sketch of a generic Langevin update appears after this list).
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
- Truncating Trajectories in Monte Carlo Reinforcement Learning [48.97155920826079]
In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal.
We propose an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths.
We show that an appropriate truncation of the trajectories can succeed in improving performance.
arXiv Detail & Related papers (2023-05-07T19:41:57Z)
- One-Step Distributional Reinforcement Learning [10.64435582017292]
We present the simpler one-step distributional reinforcement learning (OS-DistrRL) framework.
We show that our approach comes with a unified theory for both policy evaluation and control.
We propose two OS-DistrRL algorithms for which we provide an almost sure convergence analysis.
arXiv Detail & Related papers (2023-04-27T06:57:00Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- A Nonparametric Off-Policy Policy Gradient [32.35604597324448]
Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes.
We build on the general sample efficiency of off-policy algorithms.
We show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
arXiv Detail & Related papers (2020-01-08T10:13:08Z)
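The Langevin Monte Carlo entry above explores by sampling Q-function parameters from a posterior rather than optimizing a point estimate. Below is a minimal, generic sketch of the unadjusted Langevin update such methods build on; the function names and the toy Gaussian log-posterior are illustrative assumptions, not that paper's implementation.
```python
import numpy as np

def langevin_step(theta, grad_log_post, step_size, rng):
    """One unadjusted Langevin update: a gradient step on the log-posterior
    plus scaled Gaussian noise, so the iterates approximately sample the
    posterior rather than converging to its mode."""
    noise = rng.normal(size=theta.shape)
    return theta + step_size * grad_log_post(theta) + np.sqrt(2.0 * step_size) * noise

# Toy usage (illustrative): draw approximate samples from a standard normal
# posterior over a 3-dimensional parameter, e.g. weights of a tiny Q function.
rng = np.random.default_rng(0)
theta = np.zeros(3)
grad_log_post = lambda t: -t  # gradient of log N(0, I)
for _ in range(1000):
    theta = langevin_step(theta, grad_log_post, step_size=1e-2, rng=rng)
```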