Online Optimization with Feedback Delay and Nonlinear Switching Cost
- URL: http://arxiv.org/abs/2111.00095v1
- Date: Fri, 29 Oct 2021 21:55:01 GMT
- Title: Online Optimization with Feedback Delay and Nonlinear Switching Cost
- Authors: Weici Pan, Guanya Shi, Yiheng Lin, Adam Wierman
- Abstract summary: We study a variant of online optimization in which the learner receives $k$-round $textitdelayed feedback$ about hitting cost.
We show that a novel Iterative Regularized Online Balanced Descent (iROBD) algorithm has a constant, dimension-free competitive ratio that is $O(L2k)$.
- Score: 16.975704972827305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a variant of online optimization in which the learner receives
$k$-round $\textit{delayed feedback}$ about hitting cost and there is a
multi-step nonlinear switching cost, i.e., costs depend on multiple previous
actions in a nonlinear manner. Our main result shows that a novel Iterative
Regularized Online Balanced Descent (iROBD) algorithm has a constant,
dimension-free competitive ratio that is $O(L^{2k})$, where $L$ is the
Lipschitz constant of the switching cost. Additionally, we provide lower bounds
that illustrate the Lipschitz condition is required and the dependencies on $k$
and $L$ are tight. Finally, via reductions, we show that this setting is
closely related to online control problems with delay, nonlinear dynamics, and
adversarial disturbances, where iROBD directly offers constant-competitive
online policies.
Related papers
- Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback [56.6950165117658]
We consider offline reinforcement learning with preference feedback in which the implicit reward is a linear function of an unknown parameter.
We propose an algorithm, underlineRL with underlineLocally underlineOptimal underlineWeights or sc RL-LOW, which yields a simple regret of $exp.
We observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of sc RL-LOW.
arXiv Detail & Related papers (2024-06-18T02:03:12Z) - Regret Analysis of Policy Optimization over Submanifolds for Linearly
Constrained Online LQG [12.201535821920624]
We study online linear quadratic Gaussian problems with a given linear constraint imposed on the controller.
We propose online optimistic Newton on manifold (OONM) which provides an online controller based on the prediction on the first and second order information of the function sequence.
arXiv Detail & Related papers (2024-03-13T14:06:18Z) - Online Convex Optimization with Switching Cost and Delayed Gradients [7.903539618132858]
An online algorithm can choose its action using only gradient information about the previous objective function.
We show that the competitive ratio for the OCO problem with quadratic switching cost is at most $4(L + 5) + frac16(L + 5)mu$.
We conclude that the optimal competitive ratio for the quadratic and linear switching costs are fundamentally different in the limited information setting.
arXiv Detail & Related papers (2023-10-18T11:06:06Z) - Rate-Optimal Online Convex Optimization in Adaptive Linear Control [0.0]
We consider the problem of controlling an unknown convex linear system under adversarially changing costs.
We present the first computationally-gret that attains an optimal linear hindsight function.
arXiv Detail & Related papers (2022-06-03T07:32:11Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named as Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO)
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z) - Robust Online Control with Model Misspecification [96.23493624553998]
We study online control of an unknown nonlinear dynamical system with model misspecification.
Our study focuses on robustness, which measures how much deviation from the assumed linear approximation can be tolerated.
arXiv Detail & Related papers (2021-07-16T07:04:35Z) - Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm which simply minimizes the weighted sum of the hitting cost and the switching cost.
arXiv Detail & Related papers (2021-02-13T14:15:55Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(sqrtK)$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Online Optimization with Memory and Competitive Control [43.974996526016305]
This paper presents competitive algorithms for a class of online optimization problems with memory.
We show a connection between online optimization with memory and online control with adversarial disturbances.
arXiv Detail & Related papers (2020-02-13T02:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.