Asymptotically Optimal Regret for Black-Box Predict-then-Optimize
- URL: http://arxiv.org/abs/2406.07866v1
- Date: Wed, 12 Jun 2024 04:46:23 GMT
- Title: Asymptotically Optimal Regret for Black-Box Predict-then-Optimize
- Authors: Samuel Tan, Peter I. Frazier
- Abstract summary: We study new black-box predict-then-optimize problems that lack special structure and where we only observe the reward from the action taken.
We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training.
We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare.
- Score: 7.412445894287709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes that rewards are observed for all actions for all historical contexts, which is possible only in problems with special structure. Motivated by problems from ads targeting and recommender systems, we study new black-box predict-then-optimize problems that lack this special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training compared to classical accuracy-based metrics like mean-squared error. This loss function targets the regret achieved when taking a suboptimal decision; because the regret is generally not differentiable, we propose a differentiable "soft" regret term that allows the use of neural networks and other flexible machine learning models dependent on gradient-based training. In the particular case of paired data, we show theoretically that optimizing our loss function yields asymptotically optimal regret within the class of supervised learning models. We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare compared to benchmark methods from contextual bandits and conditional average treatment effect estimation.
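The contrast between the non-differentiable regret and a differentiable "soft" regret can be sketched in a few lines of Python. This is an illustrative sketch only, not necessarily the paper's exact ESR definition. It assumes binary actions, assumes for simplicity that the true rewards `r0` and `r1` of both actions are known for a context (a simplification of the partial-feedback setting described above), and uses a sigmoid with a hypothetical sharpness parameter `k` as the smoothing. All function names are invented for this example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def decide(pred0: float, pred1: float) -> int:
    """Predict-then-optimize step: pick the action with the higher predicted reward."""
    return 1 if pred1 > pred0 else 0


def hard_regret(pred0: float, pred1: float, r0: float, r1: float) -> float:
    """Reward lost by acting on the predictions (piecewise constant in the predictions)."""
    chosen = r1 if decide(pred0, pred1) == 1 else r0
    return max(r0, r1) - chosen


def soft_regret(pred0: float, pred1: float, r0: float, r1: float, k: float = 5.0) -> float:
    """Assumed smoothing: the reward gap weighted by a soft 'wrong choice' indicator."""
    gap = r1 - r0
    sign = 1.0 if gap >= 0 else -1.0
    # The weight approaches 1 when the predicted ordering contradicts the true ordering,
    # and approaches 0 when the two orderings agree.
    return abs(gap) * sigmoid(-k * sign * (pred1 - pred0))


# The model prefers action 1, but action 0 has the higher true reward.
print(hard_regret(0.2, 0.6, 1.0, 0.0))          # 1.0
print(soft_regret(0.2, 0.6, 1.0, 0.0, k=5.0))   # about 0.88
print(soft_regret(0.2, 0.6, 1.0, 0.0, k=100.0)) # close to 1.0
```

As `k` grows, the soft value approaches the hard regret, while for moderate `k` it changes smoothly with the predictions, which is what makes gradient-based training possible.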
Related papers
- Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks [4.202961704179733]
In many real-world settings, some parameters of an optimization problem are unknown or uncertain.
Recent research focuses on predicting the value of unknown parameters using available contextual features.
We propose a novel framework that models uncertainty using Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver.
arXiv Detail & Related papers (2024-06-05T09:11:46Z) - Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z) - Robust Losses for Decision-Focused Learning [3.3326409357902245]
Decision-focused learning approaches are proposed to minimize regret in suboptimal decisions.
In this paper, we evaluate the effect of aleatoric uncertainty on the accuracy of empirical regret as a surrogate.
arXiv Detail & Related papers (2023-10-06T15:45:10Z) - End-to-End Learning for Stochastic Optimization: A Bayesian Perspective [9.356870107137093]
We develop a principled approach to end-to-end learning in optimization.
We show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map.
We then propose new end-to-end learning algorithms for training decision maps.
arXiv Detail & Related papers (2023-06-07T05:55:45Z) - In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z) - Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints.
We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints.
We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user clicks on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from user behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Fast Rates for Contextual Linear Optimization [52.39202699484225]
We show that a naive plug-in approach achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance.
Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well.
arXiv Detail & Related papers (2020-11-05T18:43:59Z) - Robust priors for regularized regression [12.945710636153537]
Penalized regression approaches like ridge regression shrink toward zero, but zero weights are usually not a sensible prior.
Inspired by simple and robust decisions humans use, we constructed non-zero priors for penalized regression models.
Models with robust priors had excellent worst-case performance.
arXiv Detail & Related papers (2020-10-06T10:43:14Z) - Persistent Neurons [4.061135251278187]
We propose a trajectory-based strategy that optimizes the learning task using information from previous solutions.
Persistent neurons can be regarded as a method with gradient informed bias where individual updates are corrupted by deterministic error terms.
We evaluate the full and partial persistent model and show it can be used to boost the performance on a range of NN structures.
arXiv Detail & Related papers (2020-07-02T22:36:49Z)