Online Sequential Decision-Making with Unknown Delays
- URL: http://arxiv.org/abs/2402.07703v3
- Date: Fri, 23 Feb 2024 06:05:19 GMT
- Title: Online Sequential Decision-Making with Unknown Delays
- Authors: Ping Wu and Heyan Huang and Zhengyang Liu
- Abstract summary: We propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback.
For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity.
Our theoretical results are consistent with the current best bounds when specialized to standard settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of online sequential decision-making, we address the problem
with delays utilizing the framework of online convex optimization (OCO), where
the feedback of a decision can arrive with an unknown delay. Unlike previous
research, which is limited to the Euclidean norm and gradient information, we propose
three families of delayed algorithms based on approximate solutions to handle
different types of received feedback. Our proposed algorithms are versatile and
applicable to universal norms. Specifically, we introduce a family of Follow
the Delayed Regularized Leader algorithms for feedback with full information on
the loss function, a family of Delayed Mirror Descent algorithms for feedback
with gradient information on the loss function and a family of Simplified
Delayed Mirror Descent algorithms for feedback with the value information of
the loss function's gradients at corresponding decision points. For each type
of algorithm, we provide corresponding regret bounds under cases of general
convexity and relative strong convexity, respectively. We also demonstrate the
efficiency of each algorithm under different norms through concrete examples.
Furthermore, our theoretical results are consistent with the current best
bounds when specialized to standard, delay-free settings.
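As a rough illustration of the delayed-feedback setting (this is a toy sketch, not the paper's actual algorithms), the following Python snippet runs online gradient descent over a Euclidean ball when the gradient of each round's loss arrives with an unknown delay. The loss functions, step size, delay pattern, and ball radius here are all hypothetical choices for illustration.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def delayed_ogd(grad_fns, delays, dim, eta=0.05, radius=1.0):
    """Online gradient descent where the gradient of round t's loss,
    evaluated at the decision x_t, only becomes available at round
    t + delays[t]; the delays are unknown to the learner in advance."""
    T = len(grad_fns)
    x = np.zeros(dim)
    pending = {}            # arrival round -> list of delayed gradients
    decisions = []
    for t in range(T):
        decisions.append(x.copy())
        # Feedback for x_t is computed now but only revealed later.
        pending.setdefault(t + delays[t], []).append(grad_fns[t](x))
        # Apply whatever feedback happens to arrive at round t.
        for g in pending.pop(t, []):
            x = project_ball(x - eta * g, radius)
    return decisions
```

With quadratic losses f_t(x) = ||x - u||^2 and a constant delay, the iterates still drift toward u, only more slowly than in the undelayed case.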
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret [10.700891331004799]
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
At each round, every player selects a decision, after which an adversary assigns an arbitrary nonconvex loss function to that player.
The expected regret of our algorithms is comparable to existing algorithms that use two-point deviation.
arXiv Detail & Related papers (2024-09-24T02:37:33Z) - A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - A Generalized Approach to Online Convex Optimization [33.38582292895673]
We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization.
We show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound.
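The linearization idea behind such reductions can be sketched as follows (an illustrative toy, not the paper's construction): a learner for linear losses &lt;g, x&gt; handles convex losses if it is fed the gradient of the convex loss at its own decision, since convexity gives f_t(x_t) - f_t(u) &lt;= &lt;grad f_t(x_t), x_t - u&gt;, so linear regret upper-bounds convex regret. The learner class, step size, and feasible set below are hypothetical.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

class LinearOGD:
    """An online learner for *linear* losses <g, x> on the unit ball."""
    def __init__(self, dim, eta=0.05):
        self.x = np.zeros(dim)
        self.eta = eta
    def decide(self):
        return self.x.copy()
    def update(self, g):
        # g is the linear loss vector revealed for this round
        self.x = project_ball(self.x - self.eta * g)

def oco_via_linear_learner(grad_fns, dim):
    # Reduction: play the linear learner's decision, then feed it the
    # convex loss's gradient at that decision. The linear learner's
    # regret then upper-bounds the convex (OCO) regret.
    learner = LinearOGD(dim)
    decisions = []
    for grad_fn in grad_fns:
        x = learner.decide()
        decisions.append(x)
        learner.update(grad_fn(x))
    return decisions
```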
arXiv Detail & Related papers (2024-02-13T17:42:27Z) - Handling Delayed Feedback in Distributed Online Optimization: A
Projection-Free Approach [1.9797215742507548]
Learning at the edges has become increasingly important as large quantities of data are continually generated locally.
We propose two projection-free algorithms, for centralised and distributed settings respectively, carefully designed to achieve a regret bound of O(√B), where B is the sum of the delays.
We provide an extensive theoretical study and experimentally validate the performance of our algorithms by comparing them with existing ones on real-world problems.
arXiv Detail & Related papers (2024-02-03T10:43:22Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For recommender systems, we propose a novel algorithm called SAROS that takes both kinds of feedback into account for learning over the sequence of interactions.
For engineering systems, the proposed idea of taking neighbouring lines into account shows statistically significant improvements over the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Solving Inverse Problems by Joint Posterior Maximization with
Autoencoding Prior [0.0]
We address the problem of solving ill-posed inverse problems in imaging where the prior is a variational autoencoder (VAE).
We show that our technique satisfies the proposed objective function.
Results also show that our approach provides robust estimates.
arXiv Detail & Related papers (2021-03-02T11:18:34Z) - A closer look at temporal variability in dynamic online learning [19.468067110814808]
This work focuses on the setting of dynamic regret in the context of online learning with full information.
By assuming that the sequence of loss functions does not vary much with time, we show that it is possible to incur improved regret bounds compared to existing results.
arXiv Detail & Related papers (2021-02-15T16:50:16Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.