Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning
- URL: http://arxiv.org/abs/2106.03279v1
- Date: Sun, 6 Jun 2021 23:53:31 GMT
- Title: Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning
- Authors: Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale
Doshi-Velez, Milind Tambe
- Abstract summary: We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
- Score: 52.74071439183113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the predict-then-optimize framework, the objective is to train a
predictive model, mapping from environment features to parameters of an
optimization problem, which maximizes decision quality when the optimization is
subsequently solved. Recent work on decision-focused learning shows that
embedding the optimization problem in the training pipeline can improve
decision quality and help generalize better to unseen tasks compared to relying
on an intermediate loss function for evaluating prediction quality. We study
the predict-then-optimize framework in the context of sequential decision
problems (formulated as MDPs) that are solved via reinforcement learning. In
particular, we are given environment features and a set of trajectories from
training MDPs, which we use to train a predictive model that generalizes to
unseen test MDPs without trajectories. Two significant computational challenges
arise in applying decision-focused learning to MDPs: (i) large state and action
spaces make it infeasible for existing techniques to differentiate through MDP
problems, and (ii) the high-dimensional policy space, as parameterized by a
neural network, makes differentiating through a policy expensive. We resolve
the first challenge by sampling provably unbiased derivatives to approximate
and differentiate through optimality conditions, and the second challenge by
using a low-rank approximation to the high-dimensional sample-based
derivatives. We implement both Bellman-based and policy gradient-based
decision-focused learning on three different MDP problems with missing
parameters, and show that decision-focused learning performs better in
generalization to unseen tasks.
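To make the pipeline concrete, the sketch below is a minimal, hypothetical PyTorch rendering of the decision-focused training loop at tabular scale: a model predicts rewards from environment features, unrolled soft value iteration stands in for differentiating through the MDP's optimality conditions, and the loss is the negative true return of the induced policy. The paper's sampled unbiased derivatives and low-rank approximation, which make this scale to large MDPs, are not reproduced here; all shapes and names are illustrative.

```python
# Minimal sketch of decision-focused learning for a tabular MDP (PyTorch).
# Assumptions: known transitions P, given features phi, and ground-truth
# rewards true_r on the training MDP. Soft value iteration is unrolled so
# autograd can differentiate through approximate optimality conditions; the
# paper's sampling and low-rank tricks for large MDPs are not shown.
import torch

S, A, F, GAMMA = 10, 4, 5, 0.95                    # states, actions, feature dim
P = torch.softmax(torch.randn(S, A, S), dim=-1)    # transition kernel P[s, a, s']
phi = torch.randn(S, A, F)                         # environment features (given)
true_r = torch.randn(S, A)                         # true rewards of the training MDP
model = torch.nn.Linear(F, 1)                      # predictive model: features -> reward

def soft_value_iteration(r, iters=50, tau=0.1):
    """Entropy-regularized value iteration; returns a differentiable policy."""
    V = torch.zeros(S)
    for _ in range(iters):
        Q = r + GAMMA * torch.einsum("sat,t->sa", P, V)
        V = tau * torch.logsumexp(Q / tau, dim=1)  # soft Bellman backup
    return torch.softmax(Q / tau, dim=1)           # soft-optimal policy pi[s, a]

def policy_return(pi, r):
    """Exact discounted return of pi under rewards r (the decision quality)."""
    P_pi = torch.einsum("sa,sat->st", pi, P)       # induced state-to-state kernel
    r_pi = (pi * r).sum(dim=1)
    V = torch.linalg.solve(torch.eye(S) - GAMMA * P_pi, r_pi)
    return V.mean()                                # uniform initial distribution

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    pred_r = model(phi).squeeze(-1)                # predicted rewards from features
    pi = soft_value_iteration(pred_r)              # differentiable planning step
    loss = -policy_return(pi, true_r)              # decision loss: true return of pi
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time the trained model maps the features of an unseen MDP to rewards, which are then planned over without any trajectories, matching the generalization setup described in the abstract.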
Related papers
- Learning Joint Models of Prediction and Optimization [56.04498536842065]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by joint predictive models.
arXiv Detail & Related papers (2024-09-07T19:52:14Z)
- Differentiation of Multi-objective Data-driven Decision Pipeline [34.577809430781144]
Real-world scenarios frequently involve multi-objective data-driven optimization problems.
Traditional two-stage methods fit a machine learning model to estimate problem coefficients, then invoke a solver on the predicted optimization problem.
Recent efforts instead train the predictive model end to end with a decision loss derived from the downstream optimization problem (a toy contrast of the two losses is sketched after this list).
arXiv Detail & Related papers (2024-06-02T15:42:03Z)
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces Two-Stage General Decision Rules (TS-GDR), a novel approach that generalizes the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using deep recurrent neural networks, named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives (the OWA aggregation is written out after this list).
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z)
- Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach [0.0]
Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering.
We use the Deep Q-Network (DQN) method, a model-free reinforcement learning (RL) method that learns directly from the decision environment.
For two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP.
arXiv Detail & Related papers (2023-10-07T10:49:30Z)
- Automatically Learning Compact Quality-aware Surrogates for Optimization Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in the training pipeline yields predictions of the unobserved parameters that lead to higher decision quality.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z)
- Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework [2.741266294612776]
We present a framework to address a class of sequential decision making problems.
Our framework features learning the optimal control policy with robustness to noisy data.
arXiv Detail & Related papers (2020-06-17T04:08:35Z)
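Several entries above contrast two-stage training against decision-focused (end-to-end) training; the toy sketch below, referenced from the multi-objective pipeline entry, makes the difference concrete. Everything here is hypothetical: a softmax layer with temperature tau stands in for a real regularized solver, and the only change between the two paradigms is the loss used to train the same predictive model.

```python
# Hypothetical toy contrast: two-stage loss vs. decision-focused loss.
# Decision problem: pick a distribution z over n items to maximize c^T z.
# With entropy regularization the optimal decision is z*(c) = softmax(c / tau),
# which is differentiable, so the decision loss can be trained end to end.
import torch

n, d, tau = 8, 6, 0.1
X = torch.randn(256, d)                                 # observed features
C = X @ torch.randn(d, n) + 0.1 * torch.randn(256, n)   # true (unseen) coefficients

def decision(c_hat):
    return torch.softmax(c_hat / tau, dim=-1)   # regularized argmax over the simplex

def train(loss_kind):
    model = torch.nn.Linear(d, n)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(300):
        c_hat = model(X)
        if loss_kind == "two_stage":            # fit coefficients, ignore the decision
            loss = ((c_hat - C) ** 2).mean()
        else:                                   # decision-focused: true decision quality
            loss = -(C * decision(c_hat)).sum(dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

for kind in ("two_stage", "decision_focused"):
    quality = (C * decision(train(kind)(X))).sum(dim=-1).mean()
    print(kind, float(quality))
```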
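For the fair-multiobjective entry, the nondifferentiable OWA objective can be written out explicitly. The convention below (sort ascending, weight the worst-off component most) is one common fairness-oriented choice and is an assumption, not taken from that paper's text:

```latex
% Ordered Weighted Averaging, fairness-oriented convention (an assumption):
% outcomes sorted in nondecreasing order, with nonincreasing weights so the
% worst-off component receives the largest weight.
\[
\mathrm{OWA}_{w}(y) = \sum_{i=1}^{n} w_i \, y_{(i)},
\qquad y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)},
\qquad w_1 \ge w_2 \ge \cdots \ge w_n \ge 0,\quad \sum_{i=1}^{n} w_i = 1.
\]
% The sorting step makes OWA piecewise linear and nondifferentiable, which is
% why embedding it in a PtO training pipeline requires special handling.
```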