Online Learning via Offline Greedy Algorithms: Applications in Market
Design and Optimization
- URL: http://arxiv.org/abs/2102.11050v1
- Date: Thu, 18 Feb 2021 19:05:26 GMT
- Title: Online Learning via Offline Greedy Algorithms: Applications in Market
Design and Optimization
- Authors: Rad Niazadeh (1), Negin Golrezaei (2), Joshua Wang (3), Fransisca
Susan (2), Ashwinkumar Badanidiyuru (3) ((1) Chicago Booth School of
Business, Operations Management, (2) MIT Sloan School of Management,
Operations Management, (3) Google Research Mountain View)
- Abstract summary: We focus on offline problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors.
For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones.
We show that the resulting online algorithms have $O(\sqrt{T})$ (approximate) regret under the full information setting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by online decision-making in time-varying combinatorial
environments, we study the problem of transforming offline algorithms to their
online counterparts. We focus on offline combinatorial problems that are
amenable to a constant factor approximation using a greedy algorithm that is
robust to local errors. For such problems, we provide a general framework that
efficiently transforms offline robust greedy algorithms to online ones using
Blackwell approachability. We show that the resulting online algorithms have
$O(\sqrt{T})$ (approximate) regret under the full information setting. We
further introduce a bandit extension of Blackwell approachability that we call
Bandit Blackwell approachability. We leverage this notion to transform robust
greedy offline algorithms into online algorithms with $O(T^{2/3})$ (approximate)
regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our
offline-to-online transformation to several problems at the intersection of
revenue management, market design, and online optimization, including product
ranking optimization in online platforms, reserve price optimization in
auctions, and submodular maximization. We show that our transformation, when
applied to these applications, leads to new regret bounds or improves the
current known bounds.
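At a high level, the framework replaces each local decision of the offline greedy algorithm with its own no-regret subroutine. The sketch below is a minimal illustration of that template, not the paper's actual construction: it wires one Hedge (multiplicative-weights) learner per greedy stage as a simple stand-in for the Blackwell approachability subroutine, and the callback `stage_rewards` is an assumed interface for the full-information feedback.
```python
import numpy as np

class Hedge:
    """Full-information no-regret learner over k actions (multiplicative weights)."""

    def __init__(self, k, eta):
        self.weights = np.ones(k)
        self.eta = eta

    def select(self, rng):
        p = self.weights / self.weights.sum()
        return rng.choice(len(self.weights), p=p)

    def update(self, rewards):
        # rewards: length-k vector of this round's rewards in [0, 1]
        self.weights *= np.exp(self.eta * np.asarray(rewards))
        self.weights /= self.weights.max()  # rescale for numerical stability


def online_greedy(ground_set, n_stages, T, stage_rewards, seed=0):
    """Online counterpart of an n_stages-step greedy algorithm, run for T rounds.

    stage_rewards(t, stage, partial) is an assumed callback returning, for each
    ground-set element, its marginal reward on top of the current partial solution.
    """
    rng = np.random.default_rng(seed)
    eta = np.sqrt(np.log(len(ground_set)) / T)
    learners = [Hedge(len(ground_set), eta) for _ in range(n_stages)]
    for t in range(T):
        partial = []
        for s, learner in enumerate(learners):
            i = learner.select(rng)
            rewards = stage_rewards(t, s, partial)  # full-information feedback,
            learner.update(rewards)                 # revealed after the choice
            partial.append(ground_set[i])
        yield partial  # the round-t combinatorial decision
```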
Related papers
- Online Conversion with Switching Costs: Robust and Learning-Augmented
Algorithms [11.582885296330195]
We study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability.
We introduce competitive (robust) threshold-based algorithms for both the minimization and maximization variants of this problem.
We then propose learning-augmented algorithms that take advantage of black-box advice to achieve significantly better average-case performance.
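For readers unfamiliar with threshold-based online conversion, the following minimal sketch shows the classic reservation-price rule for online search with prices in a known range [m, M]; it ignores switching costs and is only a textbook baseline, not the algorithm proposed in this paper.
```python
import math

def reservation_price_trader(prices, m, M):
    """Convert at the first price reaching sqrt(m*M).

    Guarantees a competitive ratio of sqrt(M/m) against the best
    price in hindsight.
    """
    threshold = math.sqrt(m * M)
    for t, price in enumerate(prices):
        if price >= threshold:
            return t, price                 # convert everything at this price
    return len(prices) - 1, prices[-1]      # forced conversion at the deadline

# Example: prices in [1, 100] give threshold 10, so conversion happens at t = 2.
print(reservation_price_trader([4, 7, 12, 90], m=1, M=100))
```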
arXiv Detail & Related papers (2023-10-31T16:34:49Z)
- Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
- A Batch-to-Online Transformation under Random-Order Model [23.817140289575377]
We introduce a transformation framework that can be utilized to develop online algorithms with low $\epsilon$-approximate regret.
We apply it to various problems, including online $(k,z)$-clustering, online matrix approximation, and online regression.
Our algorithm also enjoys low inconsistency, which may be desired in some online applications.
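A hypothetical sketch of the general batch-to-online pattern, assuming a black-box offline solver: re-solve on the observed prefix at geometrically spaced rounds so the maintained solution changes only O(log T) times. The paper's actual framework is more refined and its guarantees rely on the random-order model.
```python
def batch_to_online(stream, offline_solver):
    """Yield a solution after each arrival, re-solving at rounds 1, 2, 4, 8, ..."""
    prefix, solution, next_rebuild = [], None, 1
    for t, item in enumerate(stream, start=1):
        prefix.append(item)
        if t == next_rebuild:
            solution = offline_solver(prefix)  # black-box batch algorithm
            next_rebuild *= 2                  # O(log T) rebuilds -> low inconsistency
        yield solution

# Example with a toy "solver" that just averages the points seen so far.
sols = list(batch_to_online([3.0, 1.0, 4.0, 1.0, 5.0],
                            lambda xs: sum(xs) / len(xs)))
```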
arXiv Detail & Related papers (2023-06-12T14:50:21Z)
- Online and Streaming Algorithms for Constrained $k$-Submodular Maximization [25.15934533974176]
Constrained $k$-submodular maximization is a general framework that captures many discrete optimization problems such as ad allocation, influence maximization, personalized recommendation, and many others.
In this work, we develop single-pass streaming and online algorithms for constrained $k$-submodular maximization with both monotone and general (possibly non-monotone) objectives.
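As a small illustration of the single-pass streaming style this line of work builds on, the sketch below shows the classic thresholding rule for ordinary monotone submodular maximization under a cardinality budget, assuming the optimal value `opt` is known (in practice it is guessed on a geometric grid); the paper's algorithms handle the far more general constrained $k$-submodular setting.
```python
def sieve_stream(stream, f, k, opt):
    """Single-pass thresholding for monotone submodular f under a budget of k.

    Keep an element iff its marginal gain clears the remaining-budget
    threshold (opt/2 - f(S)) / (k - |S|); yields f(S) >= opt/2 when opt
    lower-bounds the optimum.
    """
    S = []
    for e in stream:
        if len(S) == k:
            break
        if f(S + [e]) - f(S) >= (opt / 2 - f(S)) / (k - len(S)):
            S.append(e)
    return S

# Example: coverage function f(S) = |union of the sets in S| with budget k = 2.
sets = [{1, 2}, {2, 3}, {4}, {1, 4, 5}]
f = lambda S: len(set().union(*S)) if S else 0
print(sieve_stream(sets, f, k=2, opt=5))  # best pair covers {1, 2, 3, 4, 5}
```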
arXiv Detail & Related papers (2023-05-25T12:53:17Z)
- Learning-Assisted Algorithm Unrolling for Online Optimization with Budget Constraints [27.84415856657607]
We propose a new machine learning (ML) assisted unrolling approach, called LAAU (Learning-Assisted Algorithm Unrolling).
For efficient training via backpropagation, we derive gradients of the decision pipeline over time.
We also provide the average cost bounds for two cases when training data is available offline and collected online, respectively.
arXiv Detail & Related papers (2022-12-03T20:56:29Z)
- Optimal Parameter-free Online Learning with Switching Cost [47.415099037249085]
Parameter-freeness in online learning refers to the adaptivity of an algorithm with respect to the optimal decision in hindsight.
In this paper, we design such algorithms in the presence of switching cost - the latter penalizes the optimistic updates required by parameter-freeness.
We propose a simple yet powerful algorithm for Online Linear Optimization (OLO) with switching cost, which improves the existing suboptimal regret bound [ZCP22a] to the optimal rate.
arXiv Detail & Related papers (2022-05-13T18:44:27Z)
- Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely GCB$_{\text{safe}}(\psi,\phi)$, guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z)
- Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization [70.4342220499858]
We introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities.
Our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and safeguard the same rate in the worst case.
arXiv Detail & Related papers (2021-12-29T02:42:59Z)
- Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z)
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Online Optimization with Memory and Competitive Control [43.974996526016305]
This paper presents competitive algorithms for a class of online optimization problems with memory.
We show a connection between online optimization with memory and online control with adversarial disturbances.
arXiv Detail & Related papers (2020-02-13T02:58:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.