Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by a Linear Dynamical System
- URL: http://arxiv.org/abs/2204.05782v1
- Date: Wed, 6 Apr 2022 19:22:33 GMT
- Title: Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by a Linear Dynamical System
- Authors: Jonathan Gornet, Mehdi Hosseinzadeh, Bruno Sinopoli
- Abstract summary: We propose a variant of the multi-armed bandit where the rewards are sampled from a linear dynamical system.
The proposed strategy for this multi-armed variant is to learn a model of the dynamical system while choosing the optimal action based on the learned model.
This strategy is applied to quantitative finance as a high-frequency trading strategy, where the goal is to maximize returns within a time period.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The stochastic multi-armed bandit has provided a framework for studying
decision-making in unknown environments. We propose a variant of the stochastic
multi-armed bandit where the rewards are sampled from a stochastic linear
dynamical system. The proposed strategy for this stochastic multi-armed bandit
variant is to learn a model of the dynamical system while choosing the optimal
action based on the learned model. Motivated by mathematical finance areas such
as Intertemporal Capital Asset Pricing Model proposed by Merton and Stochastic
Portfolio Theory proposed by Fernholz that both model asset returns with
stochastic differential equations, this strategy is applied to quantitative
finance as a high-frequency trading strategy, where the goal is to maximize
returns within a time period.
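To make the setting concrete, the following Python sketch simulates a bandit whose arm rewards are linear readouts of a shared latent state driven by a stochastic linear dynamical system. It is illustrative only: the matrices `A` and `C`, the noise scales, and the epsilon-greedy rule with exponential smoothing are assumptions for the sketch, and the smoothing update is a far simpler stand-in for the paper's actual model-learning step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: rewards for each arm are linear readouts of a shared
# latent state x_t that evolves as a stochastic linear dynamical system.
A = np.array([[0.9, 0.1], [0.0, 0.8]])              # state transition (assumed stable)
C = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # one readout row per arm
n_arms, T = C.shape[0], 500

x = np.zeros(2)
est = np.zeros(n_arms)   # running per-arm reward estimates (crude model proxy)
alpha = 0.2              # smoothing rate for the estimates
total = 0.0

for t in range(T):
    x = A @ x + 0.1 * rng.standard_normal(2)         # latent dynamics step
    rewards = C @ x + 0.05 * rng.standard_normal(n_arms)
    # epsilon-greedy on the learned estimates: explore to keep estimates fresh
    arm = rng.integers(n_arms) if rng.random() < 0.1 else int(np.argmax(est))
    est[arm] += alpha * (rewards[arm] - est[arm])    # update estimate for chosen arm
    total += rewards[arm]
```

Because the latent state is non-stationary, the best arm can change over time, which is exactly what distinguishes this variant from a standard stochastic bandit.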
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
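The core trick can be sketched in a few lines: replace the exact argmax over all $n$ actions with a max over a random subset of roughly $\log_2 n$ of them. This is a simplified illustration, not the paper's exact algorithm (which also reuses previously selected actions in the candidate set).

```python
import math
import random

def stoch_argmax(q_values, rng=random):
    """Approximate argmax over n actions by scanning only ~O(log n) of them."""
    n = len(q_values)
    k = max(1, math.ceil(math.log2(n)))      # sampled subset size, ~log2(n)
    candidates = rng.sample(range(n), k)     # uniform subset of action indices
    return max(candidates, key=lambda a: q_values[a])

# 1024 discrete actions: an exact argmax scans all 1024, this scans ~10.
q = [0.1 * a for a in range(1024)]
best = stoch_argmax(q, random.Random(0))
```

The returned action is only approximately greedy, but the per-step cost drops from $\mathcal{O}(n)$ to $\mathcal{O}(\log n)$, which is the point of the method.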
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
- Beyond Expectations: Learning with Stochastic Dominance Made Practical [88.06211893690964]
Stochastic dominance models risk-averse preferences for decision making with uncertain outcomes.
Despite being theoretically appealing, the application of stochastic dominance in machine learning has been scarce.
We first generalize the dominance concept to enable feasible comparisons between any arbitrary pair of random variables.
We then develop a simple and efficient approach for finding the optimal solution in terms of dominance.
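For intuition, first-order stochastic dominance between two samples can be checked by comparing their empirical CDFs pointwise; the helper below is a minimal sketch of that comparison, not the paper's relaxed or generalized criterion.

```python
import numpy as np

def fsd(a, b):
    """True if sample a first-order stochastically dominates sample b,
    i.e. F_a(t) <= F_b(t) at every threshold t (empirical CDFs)."""
    grid = np.union1d(a, b)                 # all thresholds where a CDF can jump
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return bool(np.all(Fa <= Fb))

rng = np.random.default_rng(1)
hi = rng.normal(1.0, 1.0, 1000)
lo = hi - 2.0    # shifted-down copy: dominated by construction
```

Here `fsd(hi, lo)` holds because shifting every outcome down can only raise the CDF, which is why dominance captures a preference stronger than a comparison of means.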
arXiv Detail & Related papers (2024-02-05T03:21:23Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration.
$\texttt{COPlanner}$ is a planning-driven framework for model-based methods to address the problem of inaccurately learned dynamics models.
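The two-phase Dyna structure mentioned above can be sketched with a tabular toy example: each real environment step is followed by several planning updates replayed from a learned (here, memorized) transition model. The toy MDP, step sizes, and epsilon-greedy rule are assumptions for illustration; this is plain Dyna-Q, not COPlanner itself.

```python
import random

rng = random.Random(0)
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
model = {}                                  # learned model: (s, a) -> (reward, next_state)

def real_step(s, a):
    """Toy chain environment: action 1 moves right; reward 1 at the last state."""
    s2 = min(s + a, n_states - 1)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

s = 0
for t in range(200):
    # phase 1: real environment exploration (epsilon-greedy)
    a = rng.randrange(n_actions) if rng.random() < 0.2 else \
        max(range(n_actions), key=lambda a: Q[s][a])
    r, s2 = real_step(s, a)
    model[(s, a)] = (r, s2)                                  # record the transition
    Q[s][a] += 0.1 * (r + 0.9 * max(Q[s2]) - Q[s][a])        # real-experience update
    # phase 2: model rollouts for policy learning (planning)
    for _ in range(5):
        (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
        Q[ps][pa] += 0.1 * (pr + 0.9 * max(Q[ps2]) - Q[ps][pa])
    s = s2 if s2 != n_states - 1 else 0                      # reset at the terminal state
```

The planning loop is where an inaccurate model would inject errors into the value function, which is the failure mode COPlanner targets.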
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
- SDYN-GANs: Adversarial Learning Methods for Multistep Generative Models for General Order Stochastic Dynamics [20.292913470013744]
We build on Generative Adversarial Networks (GANs) with generative model classes based on stable $m$-step numerical trajectories.
We show how our approaches can be used for modeling physical systems to learn force-laws, damping coefficients, and noise-related parameters.
arXiv Detail & Related papers (2023-02-07T18:28:09Z)
- Maximum entropy exploration in contextual bandits with neural networks and energy based models [63.872634680339644]
We present two classes of models, one with neural networks as reward estimators, and the other with energy-based models.
We show that both techniques outperform well-known standard algorithms, with energy-based models having the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
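A common building block for maximum-entropy exploration is a Boltzmann (softmax) policy over reward estimates, where the temperature trades off entropy against greediness. The sketch below assumes generic scalar estimates; it illustrates the sampling rule only, not the paper's neural-network or energy-based estimators.

```python
import numpy as np

def boltzmann_policy(q_estimates, temperature=1.0, rng=None):
    """Sample an arm with probability proportional to exp(Q / T).
    High T -> near-uniform (maximum-entropy) exploration; low T -> near-greedy."""
    rng = rng or np.random.default_rng()
    z = np.asarray(q_estimates, dtype=float) / temperature
    z -= z.max()                  # subtract the max for numerical stability
    p = np.exp(z)
    p /= p.sum()                  # normalized softmax probabilities
    return int(rng.choice(len(p), p=p)), p

arm, probs = boltzmann_policy([1.0, 2.0, 0.5], temperature=0.5,
                              rng=np.random.default_rng(0))
```

Annealing the temperature over time shifts the policy smoothly from exploration toward exploitation.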
arXiv Detail & Related papers (2022-10-12T15:09:45Z)
- Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits [7.05949591248206]
The multi-armed bandit (MAB) model is one of the most popular models to study decision-making in an uncertain environment.
In this paper, we employ techniques in statistical physics to analyze the MAB model.
arXiv Detail & Related papers (2022-08-11T09:32:03Z)
- A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors [60.489902135153415]
This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors.
The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement.
arXiv Detail & Related papers (2021-11-26T06:33:29Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Rebounding Bandits for Modeling Satiation Effects [22.92512152419544]
We introduce rebounding bandits, a multi-armed bandit setup, where satiation dynamics are modeled as time-invariant linear dynamical systems.
We characterize the planning problem, showing that the greedy policy is optimal when arms exhibit identical dynamics.
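The satiation idea can be sketched with an assumed first-order linear model: each arm carries a satiation state $s_{t+1} = \gamma s_t + u_t$, where $u_t = 1$ if the arm was pulled, and the realized reward is the arm's base reward minus its current satiation. The parameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

gamma = 0.8                      # satiation decay (rebound rate when rested)
base = np.array([1.0, 0.9])      # base reward of each arm
s = np.zeros(2)                  # satiation state per arm
history = []

for t in range(20):
    rewards = base - s                       # satiation-adjusted rewards
    arm = int(np.argmax(rewards))            # greedy policy on adjusted rewards
    history.append(arm)
    pull = np.zeros(2)
    pull[arm] = 1.0
    s = gamma * s + pull                     # time-invariant linear satiation dynamics
```

With these dynamics the greedy policy ends up rotating between arms: pulling an arm builds satiation that suppresses its reward, while rested arms rebound, which is the behavior the rebounding-bandit model is designed to capture.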
arXiv Detail & Related papers (2020-11-13T03:17:29Z)
- Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning [35.34574502348672]
We propose a parsimonious quantile regression framework to learn the dynamic tail behaviors of financial asset returns.
Our model captures well both the time-varying characteristic and the asymmetrical heavy-tail property of financial time series.
arXiv Detail & Related papers (2020-10-16T09:35:52Z) - Improving the Robustness of Trading Strategy Backtesting with Boltzmann
Machines and Generative Adversarial Networks [0.0]
This article explores the use of machine learning models to build a market generator.
The underlying idea is to simulate artificial multi-dimensional financial time series, whose statistical properties are the same as those observed in the financial markets.
The article then proposes a new approach for estimating the probability distribution of backtest statistics.
arXiv Detail & Related papers (2020-07-09T14:37:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.