Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence
- URL: http://arxiv.org/abs/2507.05994v1
- Date: Tue, 08 Jul 2025 13:54:14 GMT
- Title: Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence
- Authors: Duy Khanh Lam
- Abstract summary: Existing learning algorithms generate strategies that yield significantly poorer cumulative wealth compared to the best constant rebalancing portfolio in hindsight. This paper proposes an algorithm that learns the serial dependence of assets' returns using only gradually revealed data, without any assumption on their distribution, to form a strategy that eventually exceeds the cumulative wealth of the best constant rebalancing portfolio.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the online portfolio optimization framework, existing learning algorithms generate strategies that yield significantly poorer cumulative wealth compared to the best constant rebalancing portfolio in hindsight, despite being consistent in asymptotic growth rate. While this unappealing performance can be improved by incorporating more side information, doing so raises difficulties in feature selection and high-dimensional settings. Instead, the inherent serial dependence of assets' returns, such as day-of-the-week and other calendar effects, can be leveraged. Although latent serial dependence patterns are commonly detected using large training datasets, this paper proposes an algorithm that learns such dependence using only gradually revealed data, without any assumption on their distribution, to form a strategy that eventually exceeds the cumulative wealth of the best constant rebalancing portfolio. Moreover, the classical Kelly criterion, which requires independent assets' returns, is generalized to accommodate serial dependence in a market modeled as an independent and identically distributed process of random matrices. In such a stochastic market, where existing learning algorithms designed for stationary processes fail to apply, the proposed learning algorithm still generates a strategy that asymptotically grows to the highest rate among all strategies, matching that of the optimal strategy constructed under the generalized Kelly criterion. Experimental results with real market data confirm the algorithm's theoretical guarantees and its expected performance as long as serial dependence is significant, regardless of the validity of the generalized Kelly criterion in the experimental market. This further affirms the broad applicability of the algorithm in general contexts.
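To make the paper's benchmark concrete, here is a minimal sketch of the best constant rebalancing portfolio (BCRP) in hindsight that the proposed strategy aims to exceed: the fixed weight vector on the simplex maximizing the realized log-growth. The function name, the use of scipy, and the toy alternating-return market are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import minimize

def best_crp_in_hindsight(X):
    """X: (T, n) array of gross returns (price relatives) per period."""
    T, n = X.shape
    neg_log_growth = lambda b: -np.sum(np.log(X @ b))
    constraints = ({"type": "eq", "fun": lambda b: np.sum(b) - 1.0},)
    res = minimize(neg_log_growth, np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n, constraints=constraints)
    return res.x, -res.fun / T  # optimal weights, per-period log-growth

# Toy market with deterministic serial dependence (alternating returns):
# a dependence-aware strategy can exploit the pattern, while the BCRP
# below is the best any constant strategy can do in hindsight.
X = np.array([[1.05, 0.97], [0.96, 1.06]] * 50)
weights, growth_rate = best_crp_in_hindsight(X)
```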
Related papers
- Comparing Normalization Methods for Portfolio Optimization with Reinforcement Learning [2.186901738997926]
Recently, reinforcement learning has achieved remarkable results in various domains, including robotics, games, natural language processing, and finance. This paper explores two of the most widely used normalization methods across three different markets and compares them with the standard practice of normalizing data before training. The results indicate that, in this specific domain, state normalization can indeed degrade the agent's performance.
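For reference, "state normalization" here plausibly means standardizing observations online with running statistics, as opposed to normalizing the whole dataset once before training; the summary does not name the exact methods, so the Welford-style sketch below and its class name are illustrative assumptions.

```python
import numpy as np

class RunningStateNormalizer:
    """Standardizes each observation with running mean/variance updated
    as data arrives, in contrast to one-shot pre-training normalization."""
    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = eps
    def normalize(self, obs):
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)
```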
arXiv Detail & Related papers (2025-08-05T20:51:13Z)
- Sequential Portfolio Selection under Latent Side Information-Dependence Structure: Optimality and Universal Learning Algorithms [0.0]
We show that a dynamic strategy, which forms a portfolio based on perfect knowledge of the dependence structure and full market information over time, may not grow at a higher rate infinitely often than a constant strategy. We also show that a random optimal constant strategy almost surely exists, even when a limiting growth rate for the dynamic strategy does not.
arXiv Detail & Related papers (2025-01-12T03:49:47Z)
- Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study [10.404992912881601]
We study continuous-time mean--variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors. We present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. The results demonstrate that the continuous-time RL strategies are consistently among the best, especially in a volatile bear market.
arXiv Detail & Related papers (2024-12-08T15:31:10Z)
- Mean-Variance Portfolio Selection in Long-Term Investments with Unknown Distribution: Online Estimation, Risk Aversion under Ambiguity, and Universality of Algorithms [0.0]
This paper adopts a perspective where data are gradually and continuously revealed over time. The original model is recast into an online learning framework, which is free from any statistical assumptions. When the distribution of future data follows a normal shape, the growth rate of wealth is shown to increase by lifting the portfolio along the efficient frontier through the calibration of risk aversion.
arXiv Detail & Related papers (2024-06-19T12:11:42Z)
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log n)$.
The presented value-based RL methods include, among others, Stochastic Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
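A minimal sketch of the sampled-max idea just described: both the bootstrap target and action selection take a max over a random subset of roughly log(n) actions instead of all n. The tabular setting and names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def stoch_max_action(Q, state, n_actions):
    # Max over a sampled subset of size O(log n) instead of all n actions.
    k = max(1, math.ceil(math.log(n_actions)))
    sampled = random.sample(range(n_actions), k)
    return max(sampled, key=lambda a: Q[state][a])

def stoch_q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    a_star = stoch_max_action(Q, s_next, n_actions)  # sampled, not full, max
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_star] - Q[s][a])
```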
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
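For the KL uncertainty set mentioned above, the worst-case expected next-state value has a well-known one-dimensional dual, max over lam > 0 of [-lam * log E_{p0}[exp(-V/lam)] - lam * delta], which the sketch below evaluates; the Gaussian Process model learning and the maximum variance reduction algorithm from the paper are not reproduced, and names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_robust_value(p0, V, delta):
    """Worst-case E[V] over distributions within KL radius delta of p0."""
    def neg_dual(lam):
        z = -V / lam
        m = z.max()  # log-sum-exp stabilization
        return lam * (m + np.log(p0 @ np.exp(z - m))) + lam * delta
    res = minimize_scalar(neg_dual, bounds=(1e-6, 1e6), method="bounded")
    return -res.fun
```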
arXiv Detail & Related papers (2023-09-05T13:42:11Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
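A minimal sketch of the underlying pessimism principle: subtract a count-based confidence width from the bootstrap target so that scarcely covered state-action pairs are valued conservatively. The paper's variance-reduction component is omitted, and the step size, bonus constant, and data structures are illustrative assumptions.

```python
import math
from collections import defaultdict

Q = defaultdict(float)  # pessimistic value estimates
N = defaultdict(int)    # state-action visit counts

def pessimistic_q_update(s, a, r, s_next, actions, gamma=0.99, c=1.0):
    N[(s, a)] += 1
    n = N[(s, a)]
    bonus = c / math.sqrt(n)  # confidence width, shrinks with coverage
    target = r + gamma * max(Q[(s_next, b)] for b in actions) - bonus
    Q[(s, a)] += (target - Q[(s, a)]) / n  # averaging step size 1/n
```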
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
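In the special case where the Bregman divergence is the KL divergence and there is no regularizer, one policy mirror descent step reduces to the closed-form multiplicative update pi_new(a|s) proportional to pi_old(a|s) * exp(eta * Q(s, a)); the sketch below shows only this special case, not the paper's full GPMD framework with general convex regularizers.

```python
import numpy as np

def pmd_step(pi_s, q_s, eta):
    """One mirror descent step at a single state.
    pi_s: current action probabilities; q_s: current Q-values there."""
    logits = np.log(pi_s) + eta * q_s
    logits -= logits.max()  # numerical stabilization
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()
```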
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
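A minimal sketch of the conservative regularizer just described: the critic's Bellman loss is augmented with a term that pushes Q-values down on model-rollout samples (which may be out-of-support) and up on offline-dataset samples. The torch interface and argument names are illustrative assumptions.

```python
import torch

def conservative_penalty(q_net, model_states, model_actions,
                         data_states, data_actions, beta=1.0):
    q_model = q_net(model_states, model_actions)  # model-rollout samples
    q_data = q_net(data_states, data_actions)     # offline-dataset samples
    # Added to the standard Bellman error when training the critic.
    return beta * (q_model.mean() - q_data.mean())
```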
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
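A minimal sketch of one such iteration: the server samples a random subset of agents, each runs a local gradient step on its own data from the current global model, and the participating models are averaged. The agent interface and the participation fraction are illustrative assumptions.

```python
import random
import numpy as np

def federated_round(global_w, agents, frac=0.2, lr=0.01):
    chosen = random.sample(agents, max(1, int(frac * len(agents))))
    local_models = []
    for agent in chosen:
        grad = agent.stochastic_grad(global_w)  # gradient on the agent's own data
        local_models.append(global_w - lr * grad)
    return np.mean(local_models, axis=0)        # server-side aggregation
```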
arXiv Detail & Related papers (2020-02-20T15:00:54Z)