Doubly High-Dimensional Contextual Bandits: An Interpretable Model for
Joint Assortment-Pricing
- URL: http://arxiv.org/abs/2309.08634v1
- Date: Thu, 14 Sep 2023 00:45:36 GMT
- Title: Doubly High-Dimensional Contextual Bandits: An Interpretable Model for
Joint Assortment-Pricing
- Authors: Junhui Cai, Ran Chen, Martin J. Wainwright, Linda Zhao
- Abstract summary: Key challenges in running a retail business include how to select products to present to consumers, and how to price products to maximize revenue or profit.
We propose a joint approach to assortment-pricing based on contextual bandits.
We show at least three-fold gains in revenue or profit by our bandit method, as well as the interpretability of the latent factor models that are learned.
- Score: 24.80305303473745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Key challenges in running a retail business include how to select products to
present to consumers (the assortment problem), and how to price products (the
pricing problem) to maximize revenue or profit. Instead of considering these
problems in isolation, we propose a joint approach to assortment-pricing based
on contextual bandits. Our model is doubly high-dimensional, in that both
context vectors and actions are allowed to take values in high-dimensional
spaces. In order to circumvent the curse of dimensionality, we propose a simple
yet flexible model that captures the interactions between covariates and
actions via a (near) low-rank representation matrix. The resulting class of
models is reasonably expressive while remaining interpretable through latent
factors, and includes various structured linear bandit and pricing models as
particular cases. We propose a computationally tractable procedure that
combines an exploration/exploitation protocol with an efficient low-rank matrix
estimator, and we prove bounds on its regret. Simulation results show that this
method has lower regret than state-of-the-art methods applied to various
standard bandit and pricing models. Real-world case studies on the
assortment-pricing problem, from an industry-leading instant noodles company to
an emerging beauty start-up, underscore the gains achievable using our method.
In each case, we show at least three-fold gains in revenue or profit by our
bandit method, as well as the interpretability of the latent factor models that
are learned.
Related papers
- Batch Ensemble for Variance Dependent Regret in Stochastic Bandits [41.95653110232677]
Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL)
Inspired by practical ensemble methods, in this work we propose a simple and novel batch ensemble scheme that achieves near-optimal regret for Multi-Armed Bandits (MAB)
Our algorithm has just a single parameter namely the number of batches, and its value does not depend on distributional properties such as the scale and variance of the losses.
arXiv Detail & Related papers (2024-09-13T06:40:56Z) - Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints [10.057344315478709]
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers.
We first show that the optimal regret upper bound is of order $sqrtdT$, up to a logarithmic factor.
A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
arXiv Detail & Related papers (2024-06-04T15:44:10Z) - Anytime Model Selection in Linear Bandits [61.97047189786905]
We develop ALEXP, which has an exponentially improved dependence on $M$ for its regret.
Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
arXiv Detail & Related papers (2023-07-24T15:44:30Z) - UniMatch: A Unified User-Item Matching Framework for the Multi-purpose
Merchant Marketing [27.459774494479227]
We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model.
Our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.
arXiv Detail & Related papers (2023-07-19T13:49:35Z) - Universal and data-adaptive algorithms for model selection in linear
contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $mathcalO(dalpha T1- alpha)$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z) - Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
arXiv Detail & Related papers (2021-09-22T10:47:51Z) - A Twin Neural Model for Uplift [59.38563723706796]
Uplift is a particular case of conditional treatment effect modeling.
We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk.
We show our proposed method is competitive with the state-of-the-art in simulation setting and on real data from large scale randomized experiments.
arXiv Detail & Related papers (2021-05-11T16:02:39Z) - Deep Hedging: Learning Risk-Neutral Implied Volatility Dynamics [0.0]
numerically efficient approach for learning a risk-neutral measure for paths of simulated spot and option prices.
We show that market dynamics are free from "statistical arbitrage" in the absence of transaction costs if and only if they follow a risk-neutral measure.
arXiv Detail & Related papers (2021-03-22T15:38:25Z) - Characterizing Fairness Over the Set of Good Models Under Selective
Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z) - Categorized Bandits [17.865068872754293]
We introduce a new multi-armed bandit setting where arms are grouped inside ordered'' categories.
The motivating example comes from e-commerce, where a customer typically has a greater appetence for items of a specific well-identified but unknown category than any other one.
arXiv Detail & Related papers (2020-05-04T17:09:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.