Online Sequential Decision-Making with Unknown Delays
- URL: http://arxiv.org/abs/2402.07703v3
- Date: Fri, 23 Feb 2024 06:05:19 GMT
- Title: Online Sequential Decision-Making with Unknown Delays
- Authors: Ping Wu and Heyan Huang and Zhengyang Liu
- Abstract summary: We propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback.
For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity.
Our theoretical results are consistent with the current best bounds when specialized to standard settings.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of online sequential decision-making, we address the problem
with delays utilizing the framework of online convex optimization (OCO), where
the feedback of a decision can arrive with an unknown delay. Unlike previous
research, which is limited to the Euclidean norm and gradient information, we propose
three families of delayed algorithms based on approximate solutions to handle
different types of received feedback. Our proposed algorithms are versatile and
applicable to universal norms. Specifically, we introduce a family of Follow
the Delayed Regularized Leader algorithms for feedback with full information on
the loss function, a family of Delayed Mirror Descent algorithms for feedback
with gradient information on the loss function and a family of Simplified
Delayed Mirror Descent algorithms for feedback with the value information of
the loss function's gradients at corresponding decision points. For each type
of algorithm, we provide corresponding regret bounds under cases of general
convexity and relative strong convexity, respectively. We also demonstrate the
efficiency of each algorithm under different norms through concrete examples.
Furthermore, our theoretical results are consistent with the current best
bounds when specialized to standard, delay-free settings.
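As a rough illustration of the delayed-feedback setting (this is a toy sketch, not the paper's actual algorithms), the following Python snippet runs online gradient descent over a Euclidean ball when the gradient of each round's loss arrives with an unknown delay. The loss functions, step size, delay pattern, and ball radius here are all hypothetical choices for illustration.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def delayed_ogd(grad_fns, delays, dim, eta=0.05, radius=1.0):
    """Online gradient descent where the gradient of round t's loss,
    evaluated at the decision x_t, only becomes available at round
    t + delays[t]; the delays are unknown to the learner in advance."""
    T = len(grad_fns)
    x = np.zeros(dim)
    pending = {}            # arrival round -> list of delayed gradients
    decisions = []
    for t in range(T):
        decisions.append(x.copy())
        # Feedback for x_t is computed now but only revealed later.
        pending.setdefault(t + delays[t], []).append(grad_fns[t](x))
        # Apply whatever feedback happens to arrive at round t.
        for g in pending.pop(t, []):
            x = project_ball(x - eta * g, radius)
    return decisions
```

With quadratic losses f_t(x) = ||x - u||^2 and a constant delay, the iterates still drift toward u, only more slowly than in the undelayed case.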
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret [10.700891331004799]
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
At each round, every player selects a decision, after which an adversary assigns an arbitrary nonconvex loss function to that player.
The expected regret of our algorithms is comparable to existing algorithms that use two-point deviation.
arXiv Detail & Related papers (2024-09-24T02:37:33Z) - A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z) - A Generalized Approach to Online Convex Optimization [33.38582292895673]
We show that any algorithm for online linear optimization with fully adaptive adversaries is an algorithm for online convex optimization.
We show that any such algorithm that requires full-information feedback may be transformed to an algorithm with semi-bandit feedback with comparable regret bound.
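The linearization idea behind such reductions can be sketched as follows (an illustrative toy, not the paper's construction): a learner for linear losses &lt;g, x&gt; handles convex losses if it is fed the gradient of the convex loss at its own decision, since convexity gives f_t(x_t) - f_t(u) &lt;= &lt;grad f_t(x_t), x_t - u&gt;, so linear regret upper-bounds convex regret. The learner class, step size, and feasible set below are hypothetical.

```python
import numpy as np

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

class LinearOGD:
    """An online learner for *linear* losses <g, x> on the unit ball."""
    def __init__(self, dim, eta=0.05):
        self.x = np.zeros(dim)
        self.eta = eta
    def decide(self):
        return self.x.copy()
    def update(self, g):
        # g is the linear loss vector revealed for this round
        self.x = project_ball(self.x - self.eta * g)

def oco_via_linear_learner(grad_fns, dim):
    # Reduction: play the linear learner's decision, then feed it the
    # convex loss's gradient at that decision. The linear learner's
    # regret then upper-bounds the convex (OCO) regret.
    learner = LinearOGD(dim)
    decisions = []
    for grad_fn in grad_fns:
        x = learner.decide()
        decisions.append(x)
        learner.update(grad_fn(x))
    return decisions
```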
arXiv Detail & Related papers (2024-02-13T17:42:27Z) - Handling Delayed Feedback in Distributed Online Optimization: A
Projection-Free Approach [1.9797215742507548]
Learning at the edges has become increasingly important as large quantities of data are continually generated locally.
We propose two projection-free algorithms, for centralised and distributed settings respectively, carefully designed to achieve a regret bound of O(√B), where B is the sum of the delays.
We provide an extensive theoretical study and experimentally validate the performance of our algorithms by comparing them with existing ones on real-world problems.
arXiv Detail & Related papers (2024-02-03T10:43:22Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For recommender systems, we propose a novel algorithm called SAROS that takes both kinds of feedback into account for learning over the sequence of interactions.
For engineering systems, the proposed idea of taking neighbouring lines into account shows statistically significant improvements over the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Solving Inverse Problems by Joint Posterior Maximization with
Autoencoding Prior [0.0]
We address the problem of solving ill-posed inverse problems in imaging where the prior is a variational autoencoder (VAE).
We show that our technique satisfies the proposed objective function.
Results also show that our approach provides robust estimates.
arXiv Detail & Related papers (2021-03-02T11:18:34Z) - A closer look at temporal variability in dynamic online learning [19.468067110814808]
This work focuses on the setting of dynamic regret in the context of online learning with full information.
By assuming that the sequence of loss functions does not vary much with time, we show that it is possible to incur improved regret bounds compared to existing results.
arXiv Detail & Related papers (2021-02-15T16:50:16Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.