Statistical Learning with Sublinear Regret of Propagator Models
- URL: http://arxiv.org/abs/2301.05157v1
- Date: Thu, 12 Jan 2023 17:16:27 GMT
- Title: Statistical Learning with Sublinear Regret of Propagator Models
- Authors: Eyal Neuman, Yufei Zhang
- Abstract summary: We consider a class of learning problems in which an agent liquidates a risky asset while creating both transient price impact driven by an unknown convolution propagator and linear temporary price impact with an unknown parameter.
We present a trading algorithm that alternates between exploration and exploitation phases and achieves sublinear regret with high probability.
- Score: 2.9628715114493502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a class of learning problems in which an agent liquidates a risky
asset while creating both transient price impact driven by an unknown
convolution propagator and linear temporary price impact with an unknown
parameter. We characterize the trader's performance as maximization of a
revenue-risk functional, where the trader also exploits available information
on a price predicting signal. We present a trading algorithm that alternates
between exploration and exploitation phases and achieves sublinear regrets with
high probability. For the exploration phase we propose a novel approach for
non-parametric estimation of the price impact kernel by observing only the
visible price process and derive sharp bounds on the convergence rate, which
are characterised by the singularity of the propagator. These kernel estimation
methods extend existing methods from the area of Tikhonov regularisation for
inverse problems and are of independent interest. The bound on the regret in
the exploitation phase is obtained by deriving stability results for the
optimizer and value function of the associated class of infinite-dimensional
stochastic control problems. As a complementary result we propose a
regression-based algorithm to estimate the conditional expectation of
non-Markovian signals and derive its convergence rate.
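The nonparametric kernel-estimation step described in the abstract can be illustrated with a generic Tikhonov-regularised deconvolution. The power-law kernel, the synthetic trading rates, the noise level, and the regularisation weight `alpha` below are illustrative assumptions, not the paper's actual estimator or data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 200, 0.01
t = np.arange(1, n + 1) * dt          # time grid t_1, ..., t_n

# Hypothetical singular power-law propagator and synthetic trading rates
g_true = t ** -0.25
u = rng.standard_normal(n)

# Discretised convolution: y_i = sum_{k<=i} u_{i-k} * g_k * dt, i.e. y = U @ g
U = np.zeros((n, n))
for i in range(n):
    U[i, : i + 1] = u[i::-1] * dt

# Observed (noisy) transient price impact
y = U @ g_true + 1e-3 * rng.standard_normal(n)

# Tikhonov-regularised least squares: argmin_g ||U g - y||^2 + alpha ||g||^2
alpha = 1e-4
g_hat = np.linalg.solve(U.T @ U + alpha * np.eye(n), U.T @ y)

rel_err = np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)
```

The singularity of the kernel at zero is what makes the inverse problem ill-posed and motivates the regularisation; here it is handled crudely by starting the grid at `dt`.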
Related papers
- Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty [5.710971447109951]
This paper studies continuous-time risk-sensitive reinforcement learning (RL).
I highlight that the conventional policy gradient representation is inadequate for risk-sensitive problems due to the nonlinear nature of quadratic variation.
I prove the convergence of the proposed algorithm for Merton's investment problem and quantify the impact of temperature parameter on the behavior of the learning procedure.
arXiv Detail & Related papers (2024-04-19T03:05:41Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- An Offline Learning Approach to Propagator Models [3.1755820123640612]
We consider an offline learning problem for an agent who first estimates an unknown price impact kernel from a static dataset.
We propose a novel approach for a nonparametric estimation of the propagator from a dataset containing correlated price trajectories, trading signals and metaorders.
We show that a trader who tries to minimise her execution costs by using a greedy strategy purely based on the estimated propagator will encounter suboptimality.
arXiv Detail & Related papers (2023-09-06T13:36:43Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Nonparametric Linear Feature Learning in Regression Through Regularisation [0.0]
We propose a novel method for joint linear feature learning and non-parametric function estimation.
By using alternating minimisation, we iteratively rotate the data to improve alignment with leading directions.
We establish that the expected risk of our method converges to the minimal risk under minimal assumptions and with explicit rates.
arXiv Detail & Related papers (2023-07-24T12:52:55Z)
- A Tale of Sampling and Estimation in Discounted Reinforcement Learning [50.43256303670011]
We present a minimax lower bound on the discounted mean estimation problem.
We show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties.
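The sampling idea in this entry can be sketched generically: drawing a horizon from the geometric distribution with parameter 1 - gamma and keeping the reward observed at that step gives an unbiased Monte-Carlo estimate of the normalised discounted mean (1 - gamma) * E[sum_t gamma^t r_t]. The `step` interface and the estimator below are a generic illustration of this principle, not the paper's algorithm:

```python
import random

def discounted_mean_estimate(step, s0, gamma, n_samples, rng):
    """Estimate (1 - gamma) * E[sum_t gamma^t r_t] by sampling the horizon
    from the discounted (geometric) kernel: after each transition, stop with
    probability 1 - gamma and keep the last reward seen."""
    total = 0.0
    for _ in range(n_samples):
        s, r = s0, 0.0
        while True:
            s, r = step(s, rng)           # step: (state, rng) -> (state', reward)
            if rng.random() > gamma:      # stop with probability 1 - gamma
                break
        total += r
    return total / n_samples
```

With a constant reward of 1 this returns exactly 1 for any gamma, matching (1 - gamma) * sum_t gamma^t = 1.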
arXiv Detail & Related papers (2023-04-11T09:13:17Z)
- Model-Based Uncertainty in Value Functions [89.31922008981735]
We focus on characterizing the variance over values induced by a distribution over MDPs.
Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation.
We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values.
arXiv Detail & Related papers (2023-02-24T09:18:27Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Exploration-exploitation trade-off for continuous-time episodic reinforcement learning with linear-convex models [2.503869683354711]
We study finite-time horizon control problems with linear dynamics but unknown coefficients and convex, but possibly irregular, objective function.
We identify conditions under which this performance gap is quadratic, improving the linear performance gap in recent work.
Next, we propose a phase-based learning algorithm for which we show how to optimise exploration-exploitation trade-off and achieve sublinear regrets.
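The phase-based pattern described in this entry (and in the main paper's algorithm) can be sketched schematically: exploration phases grow linearly while exploitation phases grow quadratically, so the fraction of time spent exploring vanishes and the exploration cost stays sublinear in the horizon. The specific lengths `k` and `k * k` are a hypothetical scaling chosen for illustration, not the schedule from either paper:

```python
def phase_schedule(total_rounds):
    """Illustrative phase-based schedule: in phase k, explore for ~k rounds,
    then exploit for ~k^2 rounds, so exploration occupies a vanishing
    fraction of the horizon (hypothetical scaling)."""
    schedule = []
    t, k = 0, 1
    while t < total_rounds:
        schedule.append(("explore", k))       # estimate unknown model parameters
        schedule.append(("exploit", k * k))   # act greedily on current estimates
        t += k + k * k
        k += 1
    return schedule
```

For a horizon of 10,000 rounds this schedule spends under 5% of the time exploring, while still letting the parameter estimates refine in every phase.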
arXiv Detail & Related papers (2021-12-19T21:47:04Z)
- Orthogonal Statistical Learning [49.55515683387805]
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk depends on an unknown nuisance parameter.
We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order.
arXiv Detail & Related papers (2019-01-25T02:21:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.