Decision-Focused Learning with Directional Gradients
- URL: http://arxiv.org/abs/2402.03256v4
- Date: Wed, 30 Oct 2024 22:01:13 GMT
- Title: Decision-Focused Learning with Directional Gradients
- Authors: Michael Huang, Vishal Gupta
- Abstract summary: We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework.
Unlike the original decision loss, which is typically piecewise constant and discontinuous, our new PG losses are Lipschitz continuous, difference-of-concave functions.
We provide numerical evidence confirming our PG losses substantively outperform existing proposals when the underlying model is misspecified.
- Score: 1.2363103948638432
- License:
- Abstract: We propose a novel family of decision-aware surrogate losses, called Perturbation Gradient (PG) losses, for the predict-then-optimize framework. The key idea is to connect the expected downstream decision loss with the directional derivative of a particular plug-in objective, and then approximate this derivative using zeroth-order gradient techniques. Unlike the original decision loss, which is typically piecewise constant and discontinuous, our new PG losses are Lipschitz continuous, difference-of-concave functions that can be optimized using off-the-shelf gradient-based methods. Most importantly, unlike existing surrogate losses, the approximation error of our PG losses vanishes as the number of samples grows. Hence, optimizing our surrogate loss yields a best-in-class policy asymptotically, even in misspecified settings. This is the first such result in misspecified settings, and we provide numerical evidence confirming our PG losses substantively outperform existing proposals when the underlying model is misspecified.
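As a concrete illustration, the sketch below shows one natural finite-difference instantiation consistent with the abstract's description, for a linear predict-then-optimize problem over a finite feasible set: the plug-in objective P(v) = min_x v^T x is concave and piecewise linear, so (P(c_hat + h*c) - P(c_hat)) / h is a Lipschitz difference of concave functions that approaches the downstream decision loss as h shrinks. This is a hedged reconstruction, not the paper's implementation; the feasible set, linear model, step size h, and optimizer are illustrative assumptions.
```python
import torch

# Sketch (assumptions: finite feasible set X with decisions as columns, a linear
# model mapping features z to predicted costs, forward-difference step h).
# P(v) = min_{x in X} v^T x is concave and piecewise linear in v, so the surrogate
# (P(c_hat + h*c) - P(c_hat)) / h is a Lipschitz difference of concave functions.

def plugin_objective(costs, X):
    # P(v) = minimum of v^T x over the feasible decisions (columns of X)
    return (costs @ X).min(dim=-1).values

def pg_surrogate_loss(pred_costs, true_costs, X, h=0.1):
    # Forward-difference approximation of the directional derivative of P at the
    # predicted costs in the direction of the true costs (the decision loss as h -> 0).
    return (plugin_objective(pred_costs + h * true_costs, X)
            - plugin_objective(pred_costs, X)) / h

# Toy usage: fit a linear cost model theta by minimizing the surrogate.
torch.manual_seed(0)
X = torch.rand(5, 8)                              # 8 feasible decisions in R^5 (columns)
Z, C = torch.randn(64, 3), torch.randn(64, 5)     # features and true costs
theta = torch.zeros(3, 5, requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.05)
for _ in range(200):
    loss = pg_surrogate_loss(Z @ theta, C, X).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```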
Related papers
- Refined Risk Bounds for Unbounded Losses via Transductive Priors [58.967816314671296]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression.
Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
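For reference, a minimal exponential-weights sketch for sequential prediction over a finite expert class is given below; the transductive priors and unbounded-loss analysis that form the paper's contribution are not reproduced, and the learning rate and uniform default prior are illustrative assumptions.
```python
import numpy as np

def exponential_weights(losses, eta=0.5, prior=None):
    """Run exponential weights on a (T, K) array of per-round expert losses."""
    T, K = losses.shape
    log_w = np.log(np.full(K, 1.0 / K) if prior is None else np.asarray(prior))
    cumulative = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                      # current distribution over experts
        cumulative += p @ losses[t]       # learner's expected loss this round
        log_w -= eta * losses[t]          # multiplicative-weights update
    return cumulative

# Example: 100 rounds, 5 experts with random losses in [0, 1].
rng = np.random.default_rng(0)
print(exponential_weights(rng.random((100, 5))))
```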
arXiv Detail & Related papers (2024-10-29T00:01:04Z)
- Uncertainty-Penalized Direct Preference Optimization [52.387088396044206]
We develop a pessimistic framework for DPO by introducing preference uncertainty penalization schemes.
The penalization serves as a correction to the loss which attenuates the loss gradient for uncertain samples.
We show improved overall performance compared to vanilla DPO, as well as better completions on prompts from high-uncertainty chosen/rejected responses.
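One plausible way to realize such a pessimistic correction is sketched below: a per-pair uncertainty estimate u (e.g. from an ensemble) is subtracted from the DPO reward margin, which attenuates the gradient on uncertain pairs. The penalty form, its weight, and the source of u are assumptions; the paper's exact scheme may differ.
```python
import torch
import torch.nn.functional as F

def uncertainty_penalized_dpo_loss(logp_chosen, logp_rejected,
                                   ref_logp_chosen, ref_logp_rejected,
                                   u, beta=0.1, lam=1.0):
    # Standard DPO implicit-reward margin between chosen and rejected completions.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Pessimistic correction: subtracting lam * u shrinks the loss gradient
    # for high-uncertainty pairs (illustrative form, not the paper's exact one).
    return -F.logsigmoid(beta * margin - lam * u).mean()

# Toy usage with fabricated per-pair log-probabilities and uncertainties.
n = 4
loss = uncertainty_penalized_dpo_loss(torch.randn(n), torch.randn(n),
                                      torch.randn(n), torch.randn(n),
                                      u=torch.rand(n))
```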
arXiv Detail & Related papers (2024-10-26T14:24:37Z)
- LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting loss functions in an arbitrary number of rounds k, unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z)
- Inference with non-differentiable surrogate loss in a general high-dimensional classification framework [4.792322531593389]
We propose a kernel-smoothed decorrelated score to construct hypothesis testing and interval estimations.
Specifically, we adopt kernel approximations to smooth the discontinuous gradient near discontinuity points.
We establish the limiting distribution of the kernel-smoothed decorrelated score and its cross-fitted version in a high-dimensional setup.
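The smoothing step can be sketched as follows: a discontinuous indicator appearing in the (sub)gradient is replaced by a kernel CDF with bandwidth h, so the score becomes differentiable near the discontinuity. The hinge-loss example, Gaussian kernel, and bandwidth are illustrative; the decorrelation and cross-fitting machinery of the paper is not shown.
```python
import numpy as np
from scipy.stats import norm

def smoothed_indicator(t, h=0.1):
    # Gaussian-kernel CDF approximation to the indicator 1{t > 0}.
    return norm.cdf(t / h)

def smoothed_hinge_gradient(beta, X, y, h=0.1):
    # Hinge loss max(0, 1 - y * X beta): its gradient contains 1{1 - y*Xbeta > 0},
    # which is replaced here by the smooth kernel CDF above.
    margin = 1.0 - y * (X @ beta)
    w = smoothed_indicator(margin, h)
    return -(w * y)[:, None] * X          # per-sample smoothed gradient contributions

# Toy usage.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 4)), rng.choice([-1.0, 1.0], size=50)
grad = smoothed_hinge_gradient(np.zeros(4), X, y).mean(axis=0)
```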
arXiv Detail & Related papers (2024-05-20T01:50:35Z)
- Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval [65.33981533521207]
We introduce an approach to reduce predictive feature suppression for resource-constrained ICR methods: latent target decoding (LTD).
LTD reconstructs the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features.
Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores.
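A minimal sketch of such a latent reconstruction objective, under assumed shapes and a cosine distance (not necessarily the paper's choices), follows: a small decoder maps the caption encoder's latent to the embedding produced by a frozen general-purpose sentence encoder, and the mismatch is added to the usual contrastive ICR loss.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentTargetDecoder(nn.Module):
    """Decode the caption latent back into a frozen sentence-encoder's space."""
    def __init__(self, latent_dim, target_dim):
        super().__init__()
        self.decoder = nn.Sequential(nn.Linear(latent_dim, target_dim),
                                     nn.GELU(),
                                     nn.Linear(target_dim, target_dim))

    def forward(self, caption_latent, frozen_sentence_emb):
        recon = self.decoder(caption_latent)
        # Reconstruction is scored in the sentence encoder's latent space,
        # not the token space, so predictive features cannot simply be discarded.
        return 1.0 - F.cosine_similarity(recon, frozen_sentence_emb.detach(), dim=-1).mean()

# Toy usage: total loss = contrastive ICR loss + weight * LTD reconstruction loss.
ltd = LatentTargetDecoder(latent_dim=256, target_dim=384)
ltd_loss = ltd(torch.randn(8, 256), torch.randn(8, 384))
```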
arXiv Detail & Related papers (2022-04-28T09:55:28Z)
- Determinantal Point Process Likelihoods for Sequential Recommendation [12.206748373325972]
We propose two new loss functions based on the Determinantal Point Process (DPP) likelihood, that can be adaptively applied to estimate the subsequent item or items.
Experimental results using the proposed loss functions on three real-world datasets show marked improvements over state-of-the-art sequential recommendation methods in both quality and diversity metrics.
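The core likelihood term such losses build on can be sketched as below, with the kernel L constructed from item embeddings (an illustrative choice): the log-probability of a selected item set S under a DPP is log det L_S − log det(L + I). How the paper adapts this into next-item and set-level training losses is not reproduced here.
```python
import torch

def dpp_log_likelihood(item_embeddings, selected_idx, jitter=1e-4):
    # Kernel from item embeddings (illustrative); jitter keeps L_S well conditioned.
    L = item_embeddings @ item_embeddings.T
    L = L + jitter * torch.eye(L.shape[0])
    L_S = L[selected_idx][:, selected_idx]
    # log P(S) = log det(L_S) - log det(L + I)
    return torch.logdet(L_S) - torch.logdet(L + torch.eye(L.shape[0]))

# Toy usage: negative log-likelihood of a recommended set as a training loss.
emb = torch.randn(20, 16, requires_grad=True)
nll = -dpp_log_likelihood(emb, torch.tensor([1, 4, 7]))
nll.backward()
```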
arXiv Detail & Related papers (2022-04-25T11:20:10Z)
- The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration [21.63888208442176]
Despite the dominant performance of deep neural networks, recent works have shown that they are poorly calibrated.
We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses.
We propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances.
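A sketch of that margin constraint as a loss term, with an illustrative penalty weight and margin, is given below: the logit distances d_i = max_j z_j − z_i are pushed below a controllable margin via a hinge penalty added to cross-entropy.
```python
import torch
import torch.nn.functional as F

def margin_calibration_loss(logits, targets, margin=10.0, lam=0.1):
    ce = F.cross_entropy(logits, targets)
    # Logit distances to the maximum logit; the hinge keeps them below `margin`.
    d = logits.max(dim=1, keepdim=True).values - logits
    penalty = F.relu(d - margin).sum(dim=1).mean()
    return ce + lam * penalty

# Toy usage.
loss = margin_calibration_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```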
arXiv Detail & Related papers (2021-11-30T14:21:47Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
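A compact sketch of that idea, dropping the accept/reject step in favor of unadjusted Langevin transitions so every operation stays differentiable, is shown below; the annealing schedule, step size, and per-sample log-density interfaces are illustrative assumptions rather than the paper's algorithm.
```python
import torch

def differentiable_ais(log_prior, log_target, z0, n_steps=50, step_size=1e-2):
    """log_prior/log_target map a (n, d) batch to per-sample log densities (n,)."""
    z = z0.clone().requires_grad_(True)
    log_w = torch.zeros(z0.shape[0])
    betas = torch.linspace(0.0, 1.0, n_steps + 1)
    for k in range(1, n_steps + 1):
        # AIS weight increment at the current point (standard telescoping form).
        log_w = log_w + (betas[k] - betas[k - 1]) * (log_target(z) - log_prior(z))
        # Unadjusted Langevin transition targeting the k-th annealed distribution;
        # with no Metropolis-Hastings correction, the whole chain is differentiable.
        log_pk = ((1 - betas[k]) * log_prior(z) + betas[k] * log_target(z)).sum()
        grad = torch.autograd.grad(log_pk, z, create_graph=True)[0]
        z = z + step_size * grad + (2 * step_size) ** 0.5 * torch.randn_like(z)
    return log_w   # logsumexp(log_w) - log(n) estimates the log marginal likelihood

# Toy usage: anneal from a standard normal prior to a shifted normal target.
log_prior = lambda z: -0.5 * (z ** 2).sum(-1)
log_target = lambda z: -0.5 * ((z - 2.0) ** 2).sum(-1)
log_w = differentiable_ais(log_prior, log_target, torch.randn(128, 2))
```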
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known rates for smooth losses.
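A sketch of that style of update, with illustrative clip threshold, momentum, and step size (the paper's exact algorithm and analysis are not reproduced), is given below: each stochastic gradient is clipped, folded into a momentum buffer, and the parameter moves along the normalized momentum direction.
```python
import torch

def clipped_normalized_momentum_step(params, grads, momentum, lr=1e-2, beta=0.9, clip=1.0):
    for p, g, m in zip(params, grads, momentum):
        g = g * torch.clamp(clip / (g.norm() + 1e-12), max=1.0)   # gradient clipping
        m.mul_(beta).add_(g, alpha=1.0 - beta)                    # momentum accumulation
        p.data.add_(m / (m.norm() + 1e-12), alpha=-lr)            # normalized descent step

# Toy usage on a single tensor parameter with a heavy-tailed-looking gradient.
w = torch.zeros(10)
buf = [torch.zeros_like(w)]
clipped_normalized_momentum_step([w], [torch.randn(10) * 100], buf)
```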
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.