Contextual Inverse Optimization: Offline and Online Learning
- URL: http://arxiv.org/abs/2106.14015v3
- Date: Sat, 1 Jul 2023 20:23:26 GMT
- Title: Contextual Inverse Optimization: Offline and Online Learning
- Authors: Omar Besbes, Yuri Fonseca, Ilan Lobel
- Abstract summary: We study the problems of offline and online contextual optimization with feedback information.
We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle.
- Score: 3.6739949215165164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problems of offline and online contextual optimization with
feedback information, where instead of observing the loss, we observe,
after-the-fact, the optimal action an oracle with full knowledge of the
objective function would have taken. We aim to minimize regret, which is
defined as the difference between our losses and the ones incurred by an
all-knowing oracle. In the offline setting, the decision-maker has information
available from past periods and needs to make one decision, while in the online
setting, the decision-maker optimizes decisions dynamically over time based on a
new set of feasible actions and contextual functions in each period. For the
offline setting, we characterize the optimal minimax policy, establishing the
performance that can be achieved as a function of the underlying geometry of
the information induced by the data. In the online setting, we leverage this
geometric characterization to optimize the cumulative regret. We develop an
algorithm that yields the first regret bound for this problem that is
logarithmic in the time horizon. Finally, we show via simulation that our
proposed algorithms outperform previous methods from the literature.
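To make the feedback model above concrete, here is a minimal sketch (not the paper's algorithm) of the online protocol with a linear objective: in each period a new feasible set arrives, the learner commits to an action, then observes after-the-fact the action an oracle with full knowledge of the objective would have taken, and regret accumulates as the difference between the two losses. The cost vector `c_true`, the perceptron-style update, and all names below are illustrative assumptions, not anything specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200
c_true = rng.normal(size=d)   # hidden linear objective, never shown to the learner
c_hat = np.zeros(d)           # learner's running estimate of the objective

def best_action(c, actions):
    # Feasible action minimizing the linear loss <c, x> over a finite set.
    return actions[np.argmin(actions @ c)]

cumulative_regret = 0.0
for t in range(T):
    # A fresh finite feasible set each period (stand-in for the new feasible
    # actions and contextual functions mentioned in the abstract).
    actions = rng.normal(size=(10, d))

    x_learner = best_action(c_hat, actions)   # decision under the current estimate
    x_oracle = best_action(c_true, actions)   # action of the all-knowing oracle

    # Regret for this period: extra loss relative to the oracle, under the true objective.
    cumulative_regret += c_true @ x_learner - c_true @ x_oracle

    # Feedback: only the oracle's optimal action is revealed, not the loss itself.
    # Illustrative perceptron-style update (NOT the paper's method): shift the
    # estimate so the revealed action scores better than the learner's choice.
    if not np.array_equal(x_oracle, x_learner):
        c_hat += 0.1 * (x_learner - x_oracle)

print(f"cumulative regret after {T} periods: {cumulative_regret:.3f}")
```

In this toy loop the revealed oracle actions only partially identify the hidden objective; that partial-information regime is where the paper's geometric characterization of the information induced by the data becomes relevant.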
Related papers
- Learning to Cover: Online Learning and Optimization with Irreversible Decisions [50.5775508521174]
We find that regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
arXiv Detail & Related papers (2024-06-20T23:00:25Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Learning Goal-Conditioned Policies from Sub-Optimal Offline Data via Metric Learning [22.174803826742963]
We address the problem of learning optimal behavior from sub-optimal datasets for goal-conditioned offline reinforcement learning.
We propose the use of metric learning to approximate the optimal value function for goal-conditioned offline RL problems.
We show that our method estimates optimal behaviors from severely sub-optimal offline datasets without suffering from out-of-distribution estimation errors.
arXiv Detail & Related papers (2024-02-16T16:46:53Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Non-Convex Bilevel Optimization with Time-Varying Objective Functions [57.299128109226025]
We propose an online bilevel optimization framework in which the functions can be time-varying and the agent continuously updates its decisions with online data.
Compared to existing algorithms, the proposed algorithm, SOBOW, is computationally efficient and does not need to know previous functions.
We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions.
arXiv Detail & Related papers (2023-08-07T06:27:57Z)
- Online Joint Assortment-Inventory Optimization under MNL Choices [14.530542487845732]
We study an online joint assortment-inventory optimization problem, in which we assume that the choice behavior of each customer follows the Multinomial Logit (MNL) choice model.
We propose a novel algorithm that can effectively balance the exploration and exploitation in the online decision-making of assortment and inventory.
arXiv Detail & Related papers (2023-04-04T09:25:34Z)
- Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
arXiv Detail & Related papers (2023-01-02T01:12:29Z)
- Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
arXiv Detail & Related papers (2022-11-21T11:01:37Z)
- Learning to Control under Time-Varying Environment [18.48729114775298]
This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
arXiv Detail & Related papers (2022-06-06T11:40:46Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Online Convex Optimization Perspective for Learning from Dynamically Revealed Preferences [0.0]
We study the problem of online learning from revealed preferences.
A learner wishes to learn a non-strategic agent's private utility function through observing the agent's utility-maximizing actions in a changing environment.
We adopt an online inverse optimization setup, where the learner observes a stream of the agent's actions in an online fashion and the learning performance is measured by regret associated with a loss function.
arXiv Detail & Related papers (2020-08-24T14:05:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.