From Contextual Data to Newsvendor Decisions: On the Actual Performance
of Data-Driven Algorithms
- URL: http://arxiv.org/abs/2302.08424v3
- Date: Thu, 27 Jul 2023 15:56:14 GMT
- Title: From Contextual Data to Newsvendor Decisions: On the Actual Performance
of Data-Driven Algorithms
- Authors: Omar Besbes, Will Ma, Omar Mouchtaki
- Abstract summary: We study how the relevance and quantity of past data affects the performance of a data-driven policy.
We consider a setting in which past demands observed under ``close by'' contexts come from close by distributions.
- Score: 2.9603743540540357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explore a framework for contextual decision-making to study
how the relevance and quantity of past data affects the performance of a
data-driven policy. We analyze a contextual Newsvendor problem in which a
decision-maker needs to trade off underage and overage costs in
the face of uncertain demand. We consider a setting in which past demands
observed under ``close by'' contexts come from close by distributions and
analyze the performance of data-driven algorithms through a notion of
context-dependent worst-case expected regret. We analyze the broad class of
Weighted Empirical Risk Minimization (WERM) policies which weigh past data
according to their similarity in the contextual space. This class includes
classical policies such as ERM, k-Nearest Neighbors and kernel-based policies.
Our main methodological contribution is to characterize exactly the worst-case
regret of any WERM policy on any given configuration of contexts. To the best
of our knowledge, this provides the first understanding of tight performance
guarantees in any contextual decision-making problem, with past literature
focusing on upper bounds via concentration inequalities. We instead take an
optimization approach, and isolate a structure in the Newsvendor loss function
that allows us to reduce the infinite-dimensional optimization problem over
worst-case distributions to a simple line search.
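To make the setting concrete, here is a minimal Python sketch of the newsvendor loss and of a WERM policy as a weighted quantile; the function and variable names are ours, and the weight choices in the comments are illustrative instances rather than the paper's implementation.

```python
import numpy as np

def werm_newsvendor(past_demands, weights, b, h):
    """Weighted ERM order quantity for the newsvendor loss
    l(q, d) = b * max(d - q, 0) + h * max(q - d, 0),
    where b is the underage cost and h the overage cost.
    The weighted empirical loss is minimized by a weighted
    b/(b+h)-quantile of the past demands.
    """
    d = np.asarray(past_demands, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(d)
    d, w = d[order], w[order]
    cdf = np.cumsum(w) / np.sum(w)      # weighted empirical CDF
    critical_ratio = b / (b + h)        # optimal service level
    idx = min(np.searchsorted(cdf, critical_ratio), len(d) - 1)
    return d[idx]

# Illustrative weight choices for a query context x0 (all WERM instances):
#   ERM:    w_i = 1 for every past sample
#   k-NN:   w_i = 1 if x_i is among the k contexts closest to x0, else 0
#   kernel: w_i = K(dist(x_i, x0) / bandwidth) for a kernel K
```

The only ingredient that changes across ERM, k-Nearest Neighbors, and kernel-based policies is the weight vector, which is what makes a unified analysis of the whole class possible.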
This reduction to a line search in turn lets us unveil fundamental insights
that were obscured by previous general-purpose bounds. We characterize the
actual guaranteed performance as a function of the contexts and derive
granular insights into the learning curve of these algorithms.
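For concreteness, one way to write down the context-dependent worst-case expected regret described above is sketched below; the notation (the nearby-distribution class F, the query context x_0, and the newsvendor loss l) is ours and may differ from the paper's exact definitions.

```latex
% Newsvendor loss with underage cost b and overage cost h:
%   \ell(q, d) = b\,(d - q)^{+} + h\,(q - d)^{+}
\[
\mathcal{R}\big(\pi;\, x_{1:n}, x_0\big)
  \;=\;
  \sup_{(D_x)_x \in \mathcal{F}}
  \Big\{
    \mathbb{E}\big[\ell\big(\pi(x_{1:n}, d_{1:n};\, x_0),\, D_{x_0}\big)\big]
    \;-\;
    \min_{q \ge 0}\, \mathbb{E}\big[\ell(q,\, D_{x_0})\big]
  \Big\},
\]
```

where F constrains demand distributions at nearby contexts to be close, so the guarantee depends on the observed configuration of contexts.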
Related papers
- Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems [61.580419063416734]
A recent stream of structured learning approaches has improved the practical state of the art for a range of optimization problems.
The key idea is to exploit the statistical distribution over instances instead of dealing with instances separately.
In this article, we investigate methods that smooth the risk by perturbing the policy, which eases optimization and improves generalization.
arXiv Detail & Related papers (2024-07-24T12:00:30Z)
- Importance-Weighted Offline Learning Done Right [16.4989952150404]
We study the problem of offline policy optimization in contextual bandit problems.
The goal is to learn a near-optimal policy based on a dataset of decision data collected by a suboptimal behavior policy.
We show that a simple alternative approach based on the "implicit exploration" estimator of Neu (2015) yields performance guarantees that are superior in nearly all possible terms to all previous results.
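For background, the "implicit exploration" (IX) estimator smooths each importance weight's denominator by a constant gamma > 0; a minimal sketch of that standard form (names are ours, not the paper's code):

```python
import numpy as np

def ix_value_estimate(pi_target, pi_behavior, rewards, gamma):
    """Implicit-exploration (IX) off-policy value estimate.

    pi_target, pi_behavior: probabilities that the target and behavior
    policies assign to the logged actions; rewards: observed rewards.
    Adding gamma to the denominator trades a small, controlled bias
    for much lighter tails than vanilla importance weighting.
    """
    pi_target = np.asarray(pi_target, dtype=float)
    pi_behavior = np.asarray(pi_behavior, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    weights = pi_target / (pi_behavior + gamma)  # IX-smoothed weights
    return float(np.mean(weights * rewards))
```

The smoothing bounds every weight by 1/gamma, which is what enables the tighter concentration behind the improved guarantees.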
arXiv Detail & Related papers (2023-09-27T16:42:10Z)
- Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improve classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
- Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where, at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z)
- Robust Data-driven Prescriptiveness Optimization [4.792851066169871]
This paper introduces a distributionally robust contextual optimization model in which the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective.
We evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
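As background on the objective being swapped in: one common definition of the coefficient of prescriptiveness (due to Bertsimas and Kallus) is sketched below; the paper's distributionally robust variant may differ in detail, and the names are ours.

```python
def coefficient_of_prescriptiveness(cost_policy, cost_saa, cost_full_info):
    """One common form:
    P = 1 - (cost_policy - cost_full_info) / (cost_saa - cost_full_info).
    P near 1: the policy extracts most of the value of the context;
    P near 0: no better than the context-free SAA benchmark.
    """
    return 1.0 - (cost_policy - cost_full_info) / (cost_saa - cost_full_info)
```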
arXiv Detail & Related papers (2023-06-09T14:56:06Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which constructs the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- Contextual Model Aggregation for Fast and Robust Federated Learning in Edge Computing [88.76112371510999]
Federated learning is a prime candidate for distributed machine learning at the network edge.
Existing algorithms face issues with slow convergence and/or robustness of performance.
We propose a contextual aggregation scheme that achieves the optimal context-dependent bound on loss reduction.
arXiv Detail & Related papers (2022-03-23T21:42:31Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
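A minimal sketch of the importance-weighted risk being minimized, assuming the adaptive policy's action probabilities (propensities) are logged; names are ours, not the paper's code:

```python
import numpy as np

def iw_erm_risk(losses, propensities):
    """Importance-weighted empirical risk on adaptively collected data.

    losses: per-sample losses of a candidate hypothesis;
    propensities: probabilities with which the (known) adaptive
    data-collection policy selected each logged action.
    Inverse-propensity weighting undoes the adaptive sampling bias
    in expectation.
    """
    losses = np.asarray(losses, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    return float(np.mean(losses / propensities))
```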
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)