Related papers: Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

URL: http://arxiv.org/abs/2003.04691v2
Date: Wed, 11 Mar 2020 00:38:08 GMT
Title: Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time
Authors: Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama
Abstract summary: We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time. Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
Score: 93.6788993843846
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations. If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework. However, a drawback with current methods is in the assumption that the evaluation time for every observation is constant, which can be unrealistic for many practical applications, e.g., recommender systems and environmental monitoring. As a result, the performance of current methods can be degraded when this assumption is violated. To cope with this problem, we propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time. Furthermore, we theoretically establish a regret bound of our algorithm. Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem. We also provide experimental results to validate the practical effectiveness of the proposed method.

Related papers

Pathwise optimization for bridge-type estimators and its applications [49.1574468325115]
Pathwise methods allow to efficiently compute the full path for penalized estimators. We apply these algorithms to the penalized estimation of processes observed at discrete times.
arXiv Detail & Related papers (2024-12-05T10:38:29Z)
Time-Series-Informed Closed-loop Learning for Sequential Decision Making and Control [0.0]
Traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. We show that our approach achieves baseline performance with approximately half the resources and outperforms the baseline in terms of final closed-loop performance.
arXiv Detail & Related papers (2024-12-03T12:38:53Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Time-Varying Gaussian Process Bandits with Unknown Prior [18.93478528448966]
PE-GP-UCB is capable of solving time-varying Bayesian optimisation problems. It relies on the fact that either the observed function values are consistent with some of the priors.
arXiv Detail & Related papers (2024-02-02T18:52:16Z)
Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints [21.38692458445459]
This paper studies the problem of online performance optimization of constrained closed-loop control systems. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution.
arXiv Detail & Related papers (2023-04-12T18:37:52Z)
Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy. Many algorithms for IRL have an inherently nested structure. We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
Bayesian Optimization under Stochastic Delayed Feedback [36.16843889404038]
Existing BO methods assume that the function evaluation (feedback) is available to the learner immediately or after a fixed delay. We propose algorithms with sub-linear regret guarantees that efficiently address the dilemma of selecting new function queries while waiting for delayed feedback.
arXiv Detail & Related papers (2022-06-19T07:34:08Z)
Predictor-corrector algorithms for stochastic optimization under gradual distribution shift [26.897316325189212]
Time-varying optimization problems frequently arise in machine learning practice. We exploit this underlying continuity by developing predictor-corrector algorithms for time-varying optimizations.
arXiv Detail & Related papers (2022-05-26T18:33:00Z)
A Framework for Sample Efficient Interval Estimation with Control Variates [94.32811054797148]
We consider the problem of estimating confidence intervals for the mean of a random variable. Under certain conditions, we show improved efficiency compared to existing estimation algorithms.
arXiv Detail & Related papers (2020-06-18T05:42:30Z)
Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function. We consider a randomized approximation of the projected gradient descent algorithm. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
Optimizing for the Future in Non-Stationary MDPs [52.373873622008944]
We present a policy gradient algorithm that maximizes a forecast of future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques.
arXiv Detail & Related papers (2020-05-17T03:41:19Z)
Active Learning for Identification of Linear Dynamical Systems [12.056495277232118]
We show a finite time bound estimation rate our algorithm attains. We analyze several examples where our algorithm provably improves over rates obtained by playing noise.
arXiv Detail & Related papers (2020-02-02T21:30:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.