Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
- URL: http://arxiv.org/abs/2208.02389v2
- Date: Tue, 23 Jan 2024 22:32:18 GMT
- Title: Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
- Authors: Jingwei Ji, Renyuan Xu, Ruihao Zhu
- Abstract summary: We consider risk-aware bandit optimization with applications in smart order routing (SOR).
Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm.
- Score: 10.69955834942979
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Motivated by practical considerations in machine learning for financial
decision-making, such as risk aversion and large action space, we consider
risk-aware bandit optimization with applications in smart order routing (SOR).
Specifically, based on preliminary observations of linear price impacts made
from the NASDAQ ITCH dataset, we initiate the study of risk-aware linear
bandits. In this setting, we aim to minimize regret, which measures our
performance deficit relative to the optimum, under the mean-variance metric
when facing a set of actions whose rewards are linear functions of (initially)
unknown parameters. Driven by the variance-minimizing globally-optimal
(G-optimal) design, we propose the novel instance-independent Risk-Aware
Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware
Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their
near-optimal regret upper bounds to show that, by leveraging the linear
structure, our algorithms can dramatically reduce the regret when compared to
existing methods. Finally, we demonstrate the performance of the algorithms by
conducting extensive numerical experiments in the SOR setup using both
synthetic datasets and the NASDAQ ITCH dataset. Our results reveal that 1) the
linear structure assumption is indeed well supported by the NASDAQ dataset;
and, more importantly, 2) both RISE and RISE++ can significantly outperform the
competing methods, in terms of regret, especially in complex decision-making
scenarios.
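To make the setup concrete, below is a minimal, self-contained sketch of a risk-aware explore-then-commit loop on a toy linear bandit. It is not the paper's RISE algorithm: exploration is spread uniformly over the arms instead of following a G-optimal design, the noise model (Gaussian with an arm-dependent scale), the mean-variance utility (mean minus rho times variance), and all names (`explore_then_commit`, `reward_fn`, `rho`, the toy instance) are assumptions made purely for illustration.

```python
import numpy as np

def mean_variance(mean, var, rho):
    """Mean-variance utility of a reward distribution: mean - rho * variance."""
    return mean - rho * var

def explore_then_commit(arms, reward_fn, horizon, n_explore, rho):
    """Stylized risk-aware explore-then-commit for a toy linear bandit.

    `arms` is a (K, d) array of feature vectors; the mean reward of arm a is
    <arms[a], theta_star> for an unknown theta_star, and each arm may have its
    own noise level.  Exploration is spread round-robin over the arms here;
    the paper's RISE algorithm instead allocates exploration via a G-optimal
    design (minimizing worst-case prediction variance), which this sketch
    does not reproduce.
    """
    K, d = arms.shape
    X, y, pulled = [], [], [[] for _ in range(K)]

    # --- Exploration phase: round-robin over the arms ---
    for t in range(n_explore):
        a = t % K
        r = reward_fn(a)
        X.append(arms[a]); y.append(r); pulled[a].append(r)

    # Least-squares estimate of theta_star from the exploration data.
    X, y = np.array(X), np.array(y)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Per-arm variance estimates from the exploration samples.
    var_hat = np.array([np.var(rs) if len(rs) > 1 else 0.0 for rs in pulled])

    # --- Commit phase: play the arm with the best estimated mean-variance ---
    scores = mean_variance(arms @ theta_hat, var_hat, rho)
    best = int(np.argmax(scores))
    total = float(y.sum())
    for _ in range(horizon - n_explore):
        total += reward_fn(best)
    return best, total

# Toy instance: 5 arms in R^3, Gaussian rewards with arm-dependent noise.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.3])
arms = rng.normal(size=(5, 3))
noise = rng.uniform(0.1, 1.0, size=5)
reward_fn = lambda a: arms[a] @ theta_star + noise[a] * rng.normal()

best, total = explore_then_commit(arms, reward_fn, horizon=5_000,
                                  n_explore=500, rho=1.0)
print("committed arm:", best, "cumulative reward:", round(total, 1))
```

Replacing the round-robin exploration with a (rounded) G-optimal design over the arm features, and choosing the exploration length as a function of the horizon and the number of arms, is precisely where the regret analysis in the paper comes in.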
Related papers
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems (a short numerical sketch of this measure appears after this list).
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
- Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits [3.5502600490147196]
We consider the Sharpe Ratio (SR) as a critical parameter in characterizing financial time series.
We propose a novel algorithm for optimizing the SR, called UCB-RSSR, for Regret Minimization (RM) and Best Arm Identification (BAI).
We demonstrate that UCB-RSSR outperforms the only other known SR-optimizing bandit algorithm, U-UCB of Cassel et al. (2023).
arXiv Detail & Related papers (2024-05-28T14:24:36Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative rewards.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
- Empirical Risk Minimization for Losses without Variance [26.30435936379624]
This paper considers an empirical risk problem under heavy-tailed settings, where the data does not have finite variance but only has a finite $p$-th moment with $p \in (1,2)$.
Instead of using an estimation procedure based on truncated observed data, we choose the minimizer by minimizing the risk value.
These risk values can be robustly estimated using Catoni's method (Catoni, 2012).
arXiv Detail & Related papers (2023-09-07T16:14:00Z)
- Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach [5.371816551086118]
We develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream.
Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters.
arXiv Detail & Related papers (2023-05-11T05:44:36Z)
- Algorithmic Foundations of Empirical X-risk Minimization [51.58884973792057]
This manuscript introduces a new optimization framework for machine learning and AI, named empirical X-risk minimization (EXM).
X-risk is a term introduced to represent a family of compositional measures or objectives.
arXiv Detail & Related papers (2022-06-01T12:22:56Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
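As referenced in the "Pessimism Meets Risk" entry above, the entropic risk measure it builds on can be written down in a few lines. The snippet below is a generic sketch under one common convention, rho_beta(X) = (1/beta) log E[exp(beta X)]; it is not taken from that paper (sign conventions differ across works), and the function name and toy Gaussian example are invented for illustration.

```python
import numpy as np

def entropic_risk(samples, beta):
    """Entropic risk of a reward distribution from Monte Carlo samples.

    Computes (1/beta) * log E[exp(beta * X)] for beta != 0.  Under this
    convention, beta < 0 penalizes variability (risk aversion for rewards)
    and beta -> 0 recovers the plain expectation.  Treat this as one common
    convention, not necessarily the one used in the cited paper.
    """
    samples = np.asarray(samples, dtype=float)
    shifted = beta * samples
    m = shifted.max()  # shift for numerical stability of log-mean-exp
    return (m + np.log(np.mean(np.exp(shifted - m)))) / beta

rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=2.0, size=100_000)
# For Gaussian rewards, (1/beta) log E[exp(beta X)] = mu + beta * sigma^2 / 2.
print(entropic_risk(rewards, beta=-0.5))  # risk-averse: below the mean of 1.0
print(entropic_risk(rewards, beta=+0.5))  # risk-seeking: above the mean
```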