Related papers: Asymptotically Optimal Regret for Black-Box Predict-then-Optimize

URL: http://arxiv.org/abs/2406.07866v1
Date: Wed, 12 Jun 2024 04:46:23 GMT
Title: Asymptotically Optimal Regret for Black-Box Predict-then-Optimize
Authors: Samuel Tan, Peter I. Frazier
Abstract summary: We study new black-box predict-then-optimize problems that lack special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training. We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare.
Score: 7.412445894287709
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the predict-then-optimize paradigm for decision-making in which a practitioner (1) trains a supervised learning model on historical data of decisions, contexts, and rewards, and then (2) uses the resulting model to make future binary decisions for new contexts by finding the decision that maximizes the model's predicted reward. This approach is common in industry. Past analysis assumes that rewards are observed for all actions for all historical contexts, which is possible only in problems with special structure. Motivated by problems from ads targeting and recommender systems, we study new black-box predict-then-optimize problems that lack this special structure and where we only observe the reward from the action taken. We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training compared to classical accuracy-based metrics like mean-squared error. This loss function targets the regret achieved when taking a suboptimal decision; because the regret is generally not differentiable, we propose a differentiable "soft" regret term that allows the use of neural networks and other flexible machine learning models dependent on gradient-based training. In the particular case of paired data, we show theoretically that optimizing our loss function yields asymptotically optimal regret within the class of supervised learning models. We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare compared to benchmark methods from contextual bandits and conditional average treatment effect estimation.
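To illustrate why a "soft" regret term helps with gradient-based training, here is a minimal sketch in Python. It is not the paper's ESR definition, which this page does not give. It assumes paired data in which both binary actions (called a and b here) have an observed reward for the same context, and it replaces the hard argmax decision with a sigmoid of the predicted-reward gap. The function names and the temperature parameter k are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_regret(pred_a, pred_b, reward_a, reward_b):
    """Mean regret when we take the action with the higher predicted reward.

    Piecewise constant in the predictions, so its gradient is zero
    almost everywhere and it cannot drive gradient-based training.
    """
    chose_a = pred_a >= pred_b
    obtained = np.where(chose_a, reward_a, reward_b)
    best = np.maximum(reward_a, reward_b)
    return float(np.mean(best - obtained))

def soft_regret(pred_a, pred_b, reward_a, reward_b, k=5.0):
    """Smooth surrogate (an assumption, not the paper's exact loss).

    The hard choice is replaced by a probability of choosing action a,
    sigmoid(k * (pred_a - pred_b)). Larger k approaches the hard regret.
    """
    p_a = sigmoid(k * (pred_a - pred_b))
    obtained = p_a * reward_a + (1.0 - p_a) * reward_b
    best = np.maximum(reward_a, reward_b)
    return float(np.mean(best - obtained))

# Toy paired data: three contexts, rewards observed for both actions.
reward_a = np.array([1.0, 0.0, 2.0])
reward_b = np.array([0.0, 1.0, 1.0])
pred_a = np.array([0.9, 0.2, 0.5])
pred_b = np.array([0.1, 0.8, 1.5])

print(hard_regret(pred_a, pred_b, reward_a, reward_b))
print(soft_regret(pred_a, pred_b, reward_a, reward_b, k=5.0))
```

In this sketch the soft version varies smoothly with the predictions, so a model producing pred_a and pred_b could be trained on it with any automatic-differentiation library; k trades off smoothness against fidelity to the hard regret.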

Related papers

Treatment Effect Estimation for Optimal Decision-Making [65.30942348196443]
We study optimal decision-making based on two-stage CATE estimators. We propose a novel two-stage learning objective that retargets the CATE to balance CATE estimation error and decision performance.
arXiv Detail & Related papers (2025-05-19T13:24:57Z)
Online Decision-Focused Learning [63.83903681295497]
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. We investigate DFL in dynamic environments where the objective function does not evolve over time. We establish bounds on the expected dynamic regret, both when decision space is a simplex and when it is a general bounded convex polytope.
arXiv Detail & Related papers (2025-05-19T10:40:30Z)
Dissecting the Impact of Model Misspecification in Data-driven Optimization [20.35205476800932]
Data-driven optimization aims to translate a machine learning model into decision-making by optimizing decisions on estimated costs. A more recent approach uses estimation-optimization integration that minimizes decision error instead of estimation error. We show how the integrated approach offers a "universal double benefit" on the top two dominating terms of regret when the underlying model is misspecified.
arXiv Detail & Related papers (2025-03-01T21:31:54Z)
Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression [7.369846475695131]
We present an autoregressive SPO method directly targeting the optimization problem at the decision stage. We conduct experiments to demonstrate the effectiveness of the SPO+ surrogate compared to the absolute loss and the least squares loss.
arXiv Detail & Related papers (2024-11-19T17:02:04Z)
Embedding generalization within the learning dynamics: An approach based-on sample path large deviation theory [0.0]
We consider an empirical risk perturbation based learning problem that exploits methods from a continuous-time perspective. We provide an estimate in the small noise limit based on the Freidlin-Wentzell theory of large deviations. We also present a computational algorithm that solves the corresponding variational problem, leading to optimal point estimates.
arXiv Detail & Related papers (2024-08-04T23:31:35Z)
Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks [4.202961704179733]
In many real-world settings, some of these parameters are unknown or uncertain. Recent research focuses on predicting the value of unknown parameters using available contextual features. We propose a novel framework that models uncertainty via Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver.
arXiv Detail & Related papers (2024-06-05T09:11:46Z)
Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving. This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step. This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z)
Robust Losses for Decision-Focused Learning [2.9652474178611405]
Decision-focused learning aims at training the predictive model to minimize the regret incurred by making a suboptimal decision. Empirical regret can be an ineffective surrogate because empirical optimal decisions can vary substantially from expected optimal decisions. We propose three novel loss functions that approximate expected regret more robustly.
arXiv Detail & Related papers (2023-10-06T15:45:10Z)
End-to-End Learning for Stochastic Optimization: A Bayesian Perspective [9.356870107137093]
We develop a principled approach to end-to-end learning in optimization. We show that the standard end-to-end learning algorithm admits a Bayesian interpretation and trains a posterior Bayes action map. We then propose new end-to-end learning algorithms for training decision maps.
arXiv Detail & Related papers (2023-06-07T05:55:45Z)
In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
Stochastic Methods for AUC Optimization subject to AUC-based Fairness Constraints [51.12047280149546]
A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints. We formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints. We demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
arXiv Detail & Related papers (2022-12-23T22:29:08Z)
Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems. Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success. We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning. Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
Fast Rates for Contextual Linear Optimization [52.39202699484225]
We show that a naive plug-in approach achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance. Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well.
arXiv Detail & Related papers (2020-11-05T18:43:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.