Related papers: Improving Portfolio Optimization Results with Bandit Networks

Improving Portfolio Optimization Results with Bandit Networks

URL: http://arxiv.org/abs/2410.04217v2
Date: Tue, 8 Oct 2024 07:48:03 GMT
Title: Improving Portfolio Optimization Results with Bandit Networks
Authors: Gustavo de Freitas Fonseca, Lucas Coelho e Silva, Paulo André Lima de Castro,
Abstract summary: We introduce and evaluate novel Bandit algorithms designed for non-stationary environments. First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm. We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In Reinforcement Learning (RL), multi-armed Bandit (MAB) problems have found applications across diverse domains such as recommender systems, healthcare, and finance. Traditional MAB algorithms typically assume stationary reward distributions, which limits their effectiveness in real-world scenarios characterized by non-stationary dynamics. This paper addresses this limitation by introducing and evaluating novel Bandit algorithms designed for non-stationary environments. First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm, which enhances adaptability through relaxed discounting and sliding window mechanisms to better respond to changes in reward distributions. We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm, which addresses computational challenges within Combinatorial Bandits and improves dynamic asset allocation. Additionally, we propose a novel architecture called Bandit Networks, which integrates the outputs of ADTS and CADTS, thereby mitigating computational limitations in stock selection. Through extensive experiments using real financial market data, we demonstrate the potential of these algorithms and architectures in adapting to dynamic environments and optimizing decision-making processes. For instance, the proposed bandit network instances present superior performance when compared to classic portfolio optimization approaches, such as capital asset pricing model, equal weights, risk parity, and Markovitz, with the best network presenting an out-of-sample Sharpe Ratio 20% higher than the best performing classical model.

Related papers

Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
A RankNet-Inspired Surrogate-Assisted Hybrid Metaheuristic for Expensive Coverage Optimization [5.757318591302855]
We propose a RankNet-Inspired Surrogate-assisted Hybrid Metaheuristic to handle large-scale coverage optimization tasks. Our algorithm consistently outperforms state-of-the-art algorithms for EMVOPs.
arXiv Detail & Related papers (2025-01-13T14:49:05Z)
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found in a wide range of machine learning tasks, such as in-variant learning tasks, AUPRC, andAML. This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs. We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
Model-based Causal Bayesian Optimization [74.78486244786083]
We introduce the first algorithm for Causal Bayesian Optimization with Multiplicative Weights (CBO-MW) We derive regret bounds for CBO-MW that naturally depend on graph-related quantities. Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system.
arXiv Detail & Related papers (2023-07-31T13:02:36Z)
Diversifying Investments and Maximizing Sharpe Ratio: a novel QUBO formulation [0.0]
We propose a new QUBO formulation for the task described and provide the mathematical details and required assumptions. We derive results via the available QUBO solvers, as well as discussing the behaviour of Hybrid approaches to tackle large scale problems in the term.
arXiv Detail & Related papers (2023-02-23T19:15:44Z)
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation [8.089950414444115]
Two novel online pricing mechanisms are proposed for heterogeneous edge resource allocation. The mechanisms operate in real-time and do not require prior knowledge of demand distribution. The proposed posted pricing schemes allow users to select and pay for their preferred resources, with the platform dynamically adjusting resource prices based on observed historical data.
arXiv Detail & Related papers (2023-02-14T10:21:14Z)
Slimmable Domain Adaptation [112.19652651687402]
We introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank. Our framework surpasses other competing approaches by a very large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-06-14T06:28:04Z)
Comparing Classical-Quantum Portfolio Optimization with Enhanced Constraints [0.0]
We show how to add fundamental analysis to the portfolio optimization problem, adding in asset-specific and global constraints based on chosen balance sheet metrics. We analyze the current state-of-the-art algorithms for solving such a problem using D-Wave's Quantum Processor and compare the quality of the solutions obtained to commercially-available optimization software.
arXiv Detail & Related papers (2022-03-09T17:46:32Z)
Correlated Bandits for Dynamic Pricing via the ARC algorithm [2.7564955518050693]
Asymptotic Randomised Control provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits. It allows a decision maker to observe signals in addition to their rewards, to incorporate correlations between the outcomes of different choices, and to have nontrivial dynamics for their estimates.
arXiv Detail & Related papers (2021-02-08T14:54:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.