Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy,
and Multi-fidelity Feedback
- URL: http://arxiv.org/abs/2110.07232v1
- Date: Thu, 14 Oct 2021 08:55:41 GMT
- Title: Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy,
and Multi-fidelity Feedback
- Authors: Junxiong Wang, Debabrota Basu, Immanuel Trummer
- Abstract summary: In black-box optimization problems, we aim to maximize an unknown objective function that is only accessible through the feedback of an evaluation or simulation oracle.
We propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS).
We provide a generic proof technique to quantify the regret of PCTS under delayed, noisy, and multi-fidelity feedback.
- Score: 11.064341598231195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In black-box optimization problems, we aim to maximize an unknown objective
function that is only accessible through the feedback of an evaluation or
simulation oracle. In real life, the feedback from such oracles is often noisy
and arrives after some unknown delay that may depend on the computation time of
the oracle. Additionally, if exact evaluations are expensive but coarse
approximations are available at a lower cost, the feedback can be
multi-fidelity. To address this problem, we propose a generic extension of
hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search
(PCTS), that flexibly accommodates a delay- and noise-tolerant bandit
algorithm. We provide a generic proof technique to quantify the regret of PCTS
under delayed, noisy, and multi-fidelity feedback. Specifically, we derive
regret bounds of PCTS equipped with the delayed-UCB1 (DUCB1) and delayed-UCB-V
(DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of
non-delayed HOO for an expected delay of $O(\log T)$ and worsens by
$O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ with
$\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions
and hyperparameter tuning problems that PCTS outperforms state-of-the-art
black-box optimization methods for feedback with different noise levels,
delays, and fidelities.
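The abstract describes PCTS as HOO equipped with a delay-tolerant bandit index such as delayed-UCB1 (DUCB1). The sketch below illustrates the core idea only in a flat multi-armed-bandit setting, not on a HOO tree: a UCB1-style index is computed from the feedback that has actually arrived, while in-flight (delayed) rewards contribute nothing until delivery. The fixed delay, noise level, and constants are illustrative assumptions, not the paper's exact construction.

```python
import math
import random

def ducb1_index(mean, n_received, t):
    """UCB1-style optimism computed only over feedback received so far.
    Arms with no delivered feedback get an infinite index (forced exploration)."""
    if n_received == 0:
        return float("inf")
    return mean + math.sqrt(2.0 * math.log(t) / n_received)

def run_delayed_bandit(arm_means, horizon, delay=5, seed=0):
    """Play a K-armed bandit whose noisy rewards arrive `delay` steps late."""
    rng = random.Random(seed)
    k = len(arm_means)
    sums = [0.0] * k      # sum of delivered rewards per arm
    counts = [0] * k      # number of delivered rewards per arm
    pulls = [0] * k       # number of times each arm was played
    in_flight = []        # pending feedback: (arrival_time, arm, reward)

    for t in range(1, horizon + 1):
        # deliver feedback whose delay has elapsed
        arrived = [(a, r) for (at, a, r) in in_flight if at <= t]
        in_flight = [x for x in in_flight if x[0] > t]
        for a, r in arrived:
            sums[a] += r
            counts[a] += 1
        # play the arm with the highest DUCB1 index
        best = max(range(k), key=lambda a: ducb1_index(
            sums[a] / counts[a] if counts[a] else 0.0, counts[a], t))
        pulls[best] += 1
        reward = arm_means[best] + rng.gauss(0.0, 0.1)
        in_flight.append((t + delay, best, reward))
    return pulls

pulls = run_delayed_bandit([0.2, 0.5, 0.9], horizon=2000)
```

With short delays the best arm still dominates the play counts, which is the intuition behind the paper's claim that PCTS retains the non-delayed HOO regret rate when the expected delay is $O(\log T)$; PCTS applies such an index at every node of the HOO cover tree rather than over a fixed arm set.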
Related papers
- Tree Search-Based Policy Optimization under Stochastic Execution Delay [46.849634120584646]
Delayed execution MDPs are a new formalism addressing random delays without resorting to state augmentation.
We show that given observed delay values, it is sufficient to perform a policy search in the class of Markov policies.
We devise DEZ, a model-based algorithm that optimizes over the class of Markov policies.
arXiv Detail & Related papers (2024-04-08T12:19:04Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z)
- Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling.
We show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2\mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays.
We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z)
- Min-Max Optimization under Delays [26.830212508878162]
Delays and asynchrony are inevitable in large-scale machine-learning problems.
No analogous theory is available for min-max optimization.
We show that even small delays can cause prominent algorithms like Extra-gradient to diverge.
arXiv Detail & Related papers (2023-07-13T16:39:01Z)
- Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates the delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order.
We develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T}(P_T+1))$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z)
- Bayesian Optimization under Stochastic Delayed Feedback [36.16843889404038]
Existing BO methods assume that the function evaluation (feedback) is available to the learner immediately or after a fixed delay.
We propose algorithms with sub-linear regret guarantees that efficiently address the dilemma of selecting new function queries while waiting for delayed feedback.
arXiv Detail & Related papers (2022-06-19T07:34:08Z)
- Asynchronous Stochastic Optimization Robust to Arbitrary Delays [54.61797739710608]
We consider optimization with delayed gradients where, at each time step $t$, the algorithm makes an update using a stale gradient from step $t - d_t$ for an arbitrary delay $d_t$.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
arXiv Detail & Related papers (2021-06-22T15:50:45Z)
- Online Strongly Convex Optimization with Unknown Delays [30.931538196386672]
We investigate the problem of online convex optimization with unknown delays.
We first extend the delayed variant of OGD to strongly convex functions.
We establish a better regret bound of $O(d\log T)$, where $d$ is the maximum delay.
arXiv Detail & Related papers (2021-03-21T10:16:15Z)
- Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step-size using only information available at the time of the decisions.
We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses.
arXiv Detail & Related papers (2020-10-12T20:53:52Z)