Online Decision-Focused Learning
- URL: http://arxiv.org/abs/2505.13564v2
- Date: Fri, 03 Oct 2025 09:35:06 GMT
- Title: Online Decision-Focused Learning
- Authors: Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus
- Abstract summary: Decision-focused learning (DFL) is an increasingly popular paradigm for training models whose predictive outputs are used in decision-making tasks. In this paper, we regularize the objective function to make it differentiable and investigate how to overcome its non-convexity. We also showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
- Score: 74.3205104323777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients -- which prevents the use of standard first-order optimization methods -- and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining those techniques yields two original online algorithms tailored for DFL, for which we establish respectively static and dynamic regret bounds. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
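To illustrate the zero-gradient issue mentioned in the abstract, here is a minimal sketch that is not taken from the paper. It uses a toy decision problem: choose a point on the probability simplex that maximizes a predicted linear score. The unregularized decision is an argmax vertex, which is piecewise constant in the prediction, so the downstream value has zero gradient almost everywhere. Adding an entropy regularizer turns the decision into a softmax, which is differentiable. The entropy regularizer is an assumption chosen for illustration, since the abstract does not say which regularizer the authors use, and all function names and numbers below are hypothetical.

```python
import math


def hard_decision(scores):
    """Unregularized decision: all weight on the best-scoring item (a simplex vertex)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


def soft_decision(scores, temperature=1.0):
    """Entropy-regularized decision over the simplex, i.e. the softmax of the scores."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decision_value(decision, true_values):
    """Downstream value of a decision under the true (unobserved) item values."""
    return sum(d * v for d, v in zip(decision, true_values))


def finite_difference(f, x, i, eps=1e-4):
    """Forward-difference estimate of the partial derivative of f at x along coordinate i."""
    bumped = list(x)
    bumped[i] += eps
    return (f(bumped) - f(x)) / eps


# Hypothetical toy data: the model currently ranks item 0 first, but item 1 is truly best.
true_values = [1.0, 3.0, 2.0]
predicted = [2.5, 2.0, 1.0]


def hard_objective(p):
    return decision_value(hard_decision(p), true_values)


def soft_objective(p):
    return decision_value(soft_decision(p), true_values)


# Sensitivity of the downstream value to the predicted score of the truly best item.
hard_grad = finite_difference(hard_objective, predicted, 1)
soft_grad = finite_difference(soft_objective, predicted, 1)

print("hard objective gradient estimate:", hard_grad)  # 0.0: the argmax does not move
print("soft objective gradient estimate:", soft_grad)  # positive: raise the score of item 1
```

In this toy case the hard objective gives no learning signal, while the regularized objective points toward increasing the predicted score of the truly best item. The perturbation and near-optimal oracle components that the abstract uses to handle non-convexity are not covered by this sketch.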
Related papers
- A Dual Perspective on Decision-Focused Learning: Scalable Training via Dual-Guided Surrogates [1.7100385719232911]
Decision-focused learning trains models with awareness of how predictions are used, improving the performance of downstream decisions. Despite its promise, scaling is challenging: state-of-the-art methods either differentiate through a solver or rely on task-specific surrogates. In this paper, we leverage dual variables to shape learning and introduce Dual-Guided Loss (DGL). DGL matches or exceeds state-of-the-art DFL methods while using far fewer solver calls and substantially less training time.
arXiv Detail & Related papers (2025-11-07T01:15:15Z) - Prediction Loss Guided Decision-Focused Learning [33.28196791099554]
Decision-focused learning (DFL) trains a predictive model by directly optimizing the decision quality in an end-to-end manner. Prediction-focused learning (PFL) yields more stable optimization, but overlooks the downstream decision quality. We propose a simple yet effective approach: perturbing the decision loss gradient using the prediction loss gradient to construct an update direction.
arXiv Detail & Related papers (2025-09-10T07:49:04Z) - Test-time Offline Reinforcement Learning on Goal-related Experience [50.94457794664909]
Research in foundation models has shown that performance can be substantially improved through test-time training. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state. Our goal-conditioned test-time training (GC-TTT) algorithm applies this routine in a receding-horizon fashion during evaluation, adapting the policy to the current trajectory as it is being rolled out.
arXiv Detail & Related papers (2025-07-24T21:11:39Z) - Solver-Free Decision-Focused Learning for Linear Optimization Problems [6.305123652677644]
In many real-world scenarios, the parameters of the optimization problem are not known a priori and must be predicted from contextual features. This gives rise to predict-then-optimize problems, where a machine learning model predicts problem parameters that are then used to make decisions via optimization. We propose a solver-free training method that exploits the geometric structure of linear optimization to enable efficient training with minimal degradation in solution quality.
arXiv Detail & Related papers (2025-05-28T10:55:16Z) - Online Learning and Unlearning [56.770023668379615]
We present two online learner-unlearner (OLU) algorithms, both built upon online gradient descent (OGD). The first, passive OLU, leverages OGD's contractive property and injects noise when unlearning occurs, incurring no additional computation. The second, active OLU, uses an offline unlearning algorithm that shifts the model toward a solution excluding the deleted data.
arXiv Detail & Related papers (2025-05-13T13:33:36Z) - OPO: Making Decision-Focused Data Acquisition Decisions [0.0]
We propose a model for making data acquisition decisions for variables in contextual optimisation problems. We solve the data acquisition problem with well-defined constraints by learning a surrogate linear objective function. We ablate the problem with a number of training modalities and demonstrate that the differentiable optimisation approach outperforms random search strategies.
arXiv Detail & Related papers (2025-04-21T12:41:35Z) - Self-Supervised Penalty-Based Learning for Robust Constrained Optimization [4.297070083645049]
We propose a new methodology for parameterized constrained robust optimization, based on learning with a self-supervised penalty-based loss function. Our approach is able to effectively learn neural network approximations whose inference time is significantly smaller than the time of traditional solvers.
arXiv Detail & Related papers (2025-03-07T06:42:17Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task. We name our approach Adaptive Retention & Correction (ARC). ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and Imagenet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme in which the dual objective is used as a loss function, and solution estimates are iterated toward primal feasibility, emulating a Dual Ascent method.
arXiv Detail & Related papers (2024-03-06T04:43:22Z) - From Function to Distribution Modeling: A PAC-Generative Approach to Offline Optimization [30.689032197123755]
This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of "offline" data examples.
Instead of learning and then optimizing the unknown objective function, we take on a less intuitive but more direct view that optimization can be thought of as a process of sampling from a generative model.
arXiv Detail & Related papers (2024-01-04T01:32:50Z) - On the Robustness of Decision-Focused Learning [0.0]
Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict missing parameters of an incomplete optimization problem. DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives.
arXiv Detail & Related papers (2023-11-28T04:34:04Z) - Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z) - Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning [17.962860438133312]
The decision-focused learning (DFL) paradigm trains models to directly minimize a task loss, e.g. regret.
We propose an alternative method that makes no such assumptions: it combines smoothing with score function estimation, which works on any task loss.
Experiments show that it typically requires more epochs, but that it is on par with specialized methods and performs especially well for the difficult case of problems with uncertainty in the constraints, in terms of solution quality, scalability, or both.
arXiv Detail & Related papers (2023-07-11T12:32:13Z) - Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
arXiv Detail & Related papers (2022-11-21T11:01:37Z) - Introduction to Online Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Optimizing Wireless Systems Using Unsupervised and Reinforced-Unsupervised Deep Learning [96.01176486957226]
Resource allocation and transceivers in wireless networks are usually designed by solving optimization problems.
In this article, we introduce unsupervised and reinforced-unsupervised learning frameworks for solving both variable and functional optimization problems.
arXiv Detail & Related papers (2020-01-03T11:01:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.