Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
- URL: http://arxiv.org/abs/2208.02389v2
- Date: Tue, 23 Jan 2024 22:32:18 GMT
- Title: Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing
- Authors: Jingwei Ji, Renyuan Xu, Ruihao Zhu
- Abstract summary: We consider risk-aware bandit optimization with applications in smart order routing (SOR).
Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm.
- Score: 10.69955834942979
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Motivated by practical considerations in machine learning for financial
decision-making, such as risk aversion and large action space, we consider
risk-aware bandit optimization with applications in smart order routing (SOR).
Specifically, based on preliminary observations of linear price impacts made
from the NASDAQ ITCH dataset, we initiate the study of risk-aware linear
bandits. In this setting, we aim to minimize regret, which measures our
performance deficit relative to the optimum, under the mean-variance metric
when facing a set of actions whose rewards are linear functions of (initially)
unknown parameters. Driven by the variance-minimizing globally-optimal
(G-optimal) design, we propose the novel instance-independent Risk-Aware
Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware
Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their
near-optimal regret upper bounds to show that, by leveraging the linear
structure, our algorithms can dramatically reduce the regret when compared to
existing methods. Finally, we demonstrate the performance of the algorithms by
conducting extensive numerical experiments in the SOR setup using both
synthetic datasets and the NASDAQ ITCH dataset. Our results reveal that 1) the
linear structure assumption is indeed well supported by the NASDAQ dataset;
and, more importantly, 2) both RISE and RISE++ can significantly outperform the
competing methods, in terms of regret, especially in complex decision-making
scenarios.
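To make the setup concrete, below is a minimal, self-contained sketch of a risk-aware explore-then-commit loop on a toy linear bandit. It is not the paper's RISE algorithm: exploration is spread uniformly over the arms instead of following a G-optimal design, the noise model (Gaussian with an arm-dependent scale), the mean-variance utility (mean minus rho times variance), and all names (`explore_then_commit`, `reward_fn`, `rho`, the toy instance) are assumptions made purely for illustration.

```python
import numpy as np

def mean_variance(mean, var, rho):
    """Mean-variance utility of a reward distribution: mean - rho * variance."""
    return mean - rho * var

def explore_then_commit(arms, reward_fn, horizon, n_explore, rho):
    """Stylized risk-aware explore-then-commit for a toy linear bandit.

    `arms` is a (K, d) array of feature vectors; the mean reward of arm a is
    <arms[a], theta_star> for an unknown theta_star, and each arm may have its
    own noise level.  Exploration is spread round-robin over the arms here;
    the paper's RISE algorithm instead allocates exploration via a G-optimal
    design (minimizing worst-case prediction variance), which this sketch
    does not reproduce.
    """
    K, d = arms.shape
    X, y, pulled = [], [], [[] for _ in range(K)]

    # --- Exploration phase: round-robin over the arms ---
    for t in range(n_explore):
        a = t % K
        r = reward_fn(a)
        X.append(arms[a]); y.append(r); pulled[a].append(r)

    # Least-squares estimate of theta_star from the exploration data.
    X, y = np.array(X), np.array(y)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Per-arm variance estimates from the exploration samples.
    var_hat = np.array([np.var(rs) if len(rs) > 1 else 0.0 for rs in pulled])

    # --- Commit phase: play the arm with the best estimated mean-variance ---
    scores = mean_variance(arms @ theta_hat, var_hat, rho)
    best = int(np.argmax(scores))
    total = float(y.sum())
    for _ in range(horizon - n_explore):
        total += reward_fn(best)
    return best, total

# Toy instance: 5 arms in R^3, Gaussian rewards with arm-dependent noise.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5, 0.3])
arms = rng.normal(size=(5, 3))
noise = rng.uniform(0.1, 1.0, size=5)
reward_fn = lambda a: arms[a] @ theta_star + noise[a] * rng.normal()

best, total = explore_then_commit(arms, reward_fn, horizon=5_000,
                                  n_explore=500, rho=1.0)
print("committed arm:", best, "cumulative reward:", round(total, 1))
```

Replacing the round-robin exploration with a (rounded) G-optimal design over the arm features, and choosing the exploration length as a function of the horizon and the number of arms, is precisely where the regret analysis in the paper comes in.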
Related papers
- Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning [19.292214425524303]
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes.
Our work focuses on applying the entropic risk measure to RL problems (a short numerical sketch of this measure appears after this list).
We center on the linear Markov Decision Process (MDP) setting, a well-regarded theoretical framework that has yet to be examined from a risk-sensitive standpoint.
arXiv Detail & Related papers (2024-07-10T13:09:52Z)
- Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits [3.5502600490147196]
We consider the Sharpe Ratio (SR) as a critical parameter in characterizing financial time series.
We propose a novel algorithm for optimizing the SR, called UCB-RSSR, for Regret Minimization (RM) and Best Arm Identification (BAI).
We demonstrate that UCB-RSSR outperforms the only other known SR-optimizing bandit algorithm, U-UCB of Cassel et al. (2023).
arXiv Detail & Related papers (2024-05-28T14:24:36Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- Risk-sensitive Markov Decision Process and Learning under General Utility Functions [3.6260136172126667]
Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations.
We propose a modified value iteration algorithm that employs an epsilon-covering over the space of cumulative rewards.
In the absence of a simulator, our algorithm, designed with an upper-confidence-bound exploration approach, identifies a near-optimal policy.
arXiv Detail & Related papers (2023-11-22T18:50:06Z)
- Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
- Empirical Risk Minimization for Losses without Variance [26.30435936379624]
This paper considers an empirical risk problem under heavy-tailed settings, where the data does not have finite variance but only has a finite $p$-th moment with $p \in (1,2)$.
Instead of using an estimation procedure based on truncated observed data, we choose the minimizer by minimizing the risk value.
These risk values can be robustly estimated using Catoni's method (Catoni, 2012).
arXiv Detail & Related papers (2023-09-07T16:14:00Z)
- Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach [5.371816551086118]
We develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream.
Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters.
arXiv Detail & Related papers (2023-05-11T05:44:36Z)
- Algorithmic Foundations of Empirical X-risk Minimization [51.58884973792057]
This manuscript introduces a new optimization framework for machine learning and AI, named empirical X-risk minimization (EXM).
X-risk is a term introduced to represent a family of compositional measures or objectives.
arXiv Detail & Related papers (2022-06-01T12:22:56Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB_safe(psi,phi), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances psi and phi.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-04T16:49:29Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
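As referenced in the "Pessimism Meets Risk" entry above, the entropic risk measure it builds on can be written down in a few lines. The snippet below is a generic sketch under one common convention, rho_beta(X) = (1/beta) log E[exp(beta X)]; it is not taken from that paper (sign conventions differ across works), and the function name and toy Gaussian example are invented for illustration.

```python
import numpy as np

def entropic_risk(samples, beta):
    """Entropic risk of a reward distribution from Monte Carlo samples.

    Computes (1/beta) * log E[exp(beta * X)] for beta != 0.  Under this
    convention, beta < 0 penalizes variability (risk aversion for rewards)
    and beta -> 0 recovers the plain expectation.  Treat this as one common
    convention, not necessarily the one used in the cited paper.
    """
    samples = np.asarray(samples, dtype=float)
    shifted = beta * samples
    m = shifted.max()  # shift for numerical stability of log-mean-exp
    return (m + np.log(np.mean(np.exp(shifted - m)))) / beta

rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, scale=2.0, size=100_000)
# For Gaussian rewards, (1/beta) log E[exp(beta X)] = mu + beta * sigma^2 / 2.
print(entropic_risk(rewards, beta=-0.5))  # risk-averse: below the mean of 1.0
print(entropic_risk(rewards, beta=+0.5))  # risk-seeking: above the mean
```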