Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems
- URL: http://arxiv.org/abs/2407.17200v2
- Date: Sat, 11 Oct 2025 14:56:12 GMT
- Title: Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems
- Authors: Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi,
- Abstract summary: We analyze smoothed (perturbed) policies, adding controlled random perturbations to the direction used by the linear oracle.<n>Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error.<n>We illustrate the scope of the results on applications such as vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
- Score: 53.03951222945921
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A recent line of structured learning methods has advanced the practical state-of-the-art for combinatorial optimization problems with complex, application-specific objectives. These approaches learn policies that couple a statistical model with a tractable surrogate combinatorial optimization oracle, so as to exploit the distribution of problem instances instead of solving each instance independently. A core obstacle is that the empirical risk is then piecewise constant in the model parameters. This hinders gradient-based optimization and only few theoretical guarantees have been provided so far. We address this issue by analyzing smoothed (perturbed) policies: adding controlled random perturbations to the direction used by the linear oracle yields a differentiable surrogate risk and improves generalization. Our main contribution is a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error. The analysis hinges on a new Uniform Weak (UW) property capturing the geometric interaction between the statistical model and the normal fan of the feasible polytope; we show it holds under mild assumptions, and automatically when a minimal baseline perturbation is present. The framework covers, in particular, contextual stochastic optimization. We illustrate the scope of the results on applications such as stochastic vehicle scheduling, highlighting how smoothing enables both tractable training and controlled generalization.
Related papers
- Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming [55.848340925419286]
We study online statistical inference for the solutions of quadratic optimization problems with equality and inequality constraints.<n>We develop a sequential programming (SSQP) method to solve these problems, where the step direction is computed by sequentially performing an approximation of the objective and a linear approximation of the constraints.<n>We show that our method global almost moving-average convergence and exhibits local normality with an optimal primal-dual limiting matrix in the sense of Hjek and Le Cam.
arXiv Detail & Related papers (2025-11-27T06:16:17Z) - Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - Primal-dual algorithm for contextual stochastic combinatorial optimization [1.4999444543328293]
This paper introduces a novel approach to contextual optimization, integrating operations research and machine learning to address decision-making under uncertainty.<n>Our goal is to minimize the empirical risk, which is estimated from past data on uncertain parameters and contexts.
arXiv Detail & Related papers (2025-05-07T19:37:12Z) - A Graphical Global Optimization Framework for Parameter Estimation of Statistical Models with Nonconvex Regularization Functions [0.0]
Problems with linear norm-bound constraints arise in a variety of applications, including portfolio optimization, machine learning, feature selection.<n>We propose a novel graphbased method to globally solve these problems.
arXiv Detail & Related papers (2025-05-06T18:09:54Z) - Kullback-Leibler excess risk bounds for exponential weighted aggregation in Generalized linear models [0.0]
This paper investigates the problem of sparse aggregation in generalized linear models (GLMs)<n>We prove that an exponential weighted aggregation scheme yields a sharp inequality oracle for the Kullback-Leibler risk with leading constant equal to one, while also attaining the minimax-optimal rate of aggregation.
arXiv Detail & Related papers (2025-04-14T12:25:11Z) - Semiparametric Counterfactual Regression [2.356908851188234]
We propose a doubly robust-style estimator for counterfactual regression within a generalizable framework.
Our approach uses incremental interventions to enhance adaptability while maintaining with standard methods.
Our analysis shows that the proposed estimators can achieve $sqrn$-consistency and normality for a broad class of problems.
arXiv Detail & Related papers (2025-04-03T15:32:26Z) - Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift.
Current approaches typically address this issue through online sampling from the target policy.
We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z) - Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z) - Bayesian Nonparametrics Meets Data-Driven Distributionally Robust Optimization [29.24821214671497]
Training machine learning and statistical models often involve optimizing a data-driven risk criterion.
We propose a novel robust criterion by combining insights from Bayesian nonparametric (i.e., Dirichlet process) theory and a recent decision-theoretic model of smooth ambiguity-averse preferences.
For practical implementation, we propose and study tractable approximations of the criterion based on well-known Dirichlet process representations.
arXiv Detail & Related papers (2024-01-28T21:19:15Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - Domain Generalization without Excess Empirical Risk [83.26052467843725]
A common approach is designing a data-driven surrogate penalty to capture generalization and minimize the empirical risk jointly with the penalty.
We argue that a significant failure mode of this recipe is an excess risk due to an erroneous penalty or hardness in joint optimization.
We present an approach that eliminates this problem. Instead of jointly minimizing empirical risk with the penalty, we minimize the penalty under the constraint of optimality of the empirical risk.
arXiv Detail & Related papers (2023-08-30T08:46:46Z) - PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z) - Robust Data-driven Prescriptiveness Optimization [4.792851066169871]
This paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk objective minimization.
We evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
arXiv Detail & Related papers (2023-06-09T14:56:06Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the single-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets on speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a textitpolicy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z) - Integrated Conditional Estimation-Optimization [6.037383467521294]
Many real-world optimization problems uncertain parameters with probability can be estimated using contextual feature information.
In contrast to the standard approach of estimating the distribution of uncertain parameters, we propose an integrated conditional estimation approach.
We show that our ICEO approach is theally consistent under moderate conditions.
arXiv Detail & Related papers (2021-10-24T04:49:35Z) - Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-robust optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to station yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z) - Runtime Analysis of Single- and Multi-Objective Evolutionary Algorithms for Chance Constrained Optimization Problems with Normally Distributed Random Variables [11.310502327308575]
We study the scenario of components that are independent and normally distributed.
We introduce a multi-objective formulation of the problem which trades off the expected cost and its variance.
We prove that this approach can also be used to compute a set of optimal solutions for the chance constrained minimum spanning tree problem.
arXiv Detail & Related papers (2021-09-13T09:24:23Z) - Towards Optimal Problem Dependent Generalization Error Bounds in
Statistical Learning Theory [11.840747467007963]
We study problem-dependent rates that scale near-optimally with the variance, the effective loss errors, or the norms evaluated at the "best gradient hypothesis"
We introduce a principled framework dubbed "uniform localized convergence"
We show that our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches.
arXiv Detail & Related papers (2020-11-12T04:07:29Z) - Optimistic variants of single-objective bilevel optimization for
evolutionary algorithms [6.788217433800101]
A partial partial evolutionary approach has been proposed to solve the benchmark problems and have outstanding results.
A new variant has also been proposed to the commonly used convergence approaches, i.e. optimistic and pessimistic.
The experimental results demonstrate the algorithm converges differently to optimum solutions with the optimistic variants.
arXiv Detail & Related papers (2020-08-22T23:12:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.