Scaling Gaussian Process Optimization by Evaluating a Few Unique
Candidates Multiple Times
- URL: http://arxiv.org/abs/2201.12909v1
- Date: Sun, 30 Jan 2022 20:42:14 GMT
- Title: Scaling Gaussian Process Optimization by Evaluating a Few Unique
Candidates Multiple Times
- Authors: Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal
Valko, Lorenzo Rosasco
- Abstract summary: We show that sequential black-box optimization based on GPs can be made efficient by sticking to a candidate solution for multiple evaluation steps.
We modify two well-established GP-Opt algorithms, GP-UCB and GP-EI, to adapt rules from batched GP-Opt.
- Score: 119.41129787351092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computing a Gaussian process (GP) posterior has a computational cost cubic
in the number of historical points. A reformulation of the same GP posterior
highlights that this complexity mainly depends on how many \emph{unique}
historical points are considered. This can have important implications in active
learning settings, where the set of historical points is constructed
sequentially by the learner. We show that sequential black-box optimization
based on GPs (GP-Opt) can be made efficient by sticking to a candidate solution
for multiple evaluation steps and switching only when necessary. Limiting the
number of switches also limits the number of unique points in the history of
the GP. Thus, the efficient GP reformulation can be used to exactly and cheaply
compute the posteriors required to run the GP-Opt algorithms. This approach is
especially useful in real-world applications of GP-Opt with high switch costs
(e.g. switching chemicals in wet labs, data/model loading in hyperparameter
optimization). As examples of this meta-approach, we modify two
well-established GP-Opt algorithms, GP-UCB and GP-EI, to switch candidates as
infrequently as possible by adapting rules from batched GP-Opt. These versions
preserve all the theoretical no-regret guarantees while improving practical
aspects of the algorithms such as runtime, memory complexity, and the ability
to batch candidates and evaluate them in parallel.
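The unique-points observation can be illustrated with a minimal numpy sketch (not the paper's implementation; function and variable names are illustrative). It relies on a standard Gaussian identity: k noisy evaluations of the same candidate are equivalent to their average, observed with noise variance sigma^2 / k, so the exact posterior only requires linear algebra over the unique inputs.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, noise_var, Xs):
    # Exact GP posterior mean and variance at test inputs Xs,
    # with a per-observation noise variance vector noise_var.
    K = rbf(X, X) + np.diag(noise_var)
    Ks = rbf(Xs, X)
    mean = Ks @ np.linalg.solve(K, y)
    var = np.diag(rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T))
    return mean, var

def aggregate_repeats(X, y, sigma2):
    # k noisy observations at the same input are equivalent to their
    # mean, observed with noise variance sigma2 / k.  The posterior
    # then only involves a kernel matrix over the *unique* inputs.
    Xu, inv, counts = np.unique(X, return_inverse=True, return_counts=True)
    yu = np.zeros(len(Xu))
    np.add.at(yu, inv, y)   # sum the y's that share an input
    yu /= counts            # then average them
    return Xu, yu, sigma2 / counts

# Six evaluations, but only three unique candidates.
X = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 2.5])
y = np.array([0.1, 0.3, 0.2, 1.1, 0.9, -0.4])
sigma2, Xs = 0.1, np.linspace(-1.0, 3.0, 5)

m_full, v_full = gp_posterior(X, y, np.full(len(X), sigma2), Xs)
Xu, yu, nu = aggregate_repeats(X, y, sigma2)
m_agg, v_agg = gp_posterior(Xu, yu, nu, Xs)  # 3x3 solve instead of 6x6
print(np.allclose(m_full, m_agg), np.allclose(v_full, v_agg))
```

Since the two posteriors agree exactly, an optimizer that re-evaluates a candidate instead of switching keeps the kernel matrix small: cost scales cubically with unique candidates, not with total evaluations.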
Related papers
- Domain Invariant Learning for Gaussian Processes and Bayesian
Exploration [39.83530605880014]
We propose a domain invariant learning algorithm for Gaussian processes (DIL-GP) with a min-max optimization on the likelihood.
Numerical experiments demonstrate the superiority of DIL-GP for predictions on several synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-18T16:13:34Z)
- Revisiting Active Sets for Gaussian Process Decoders [0.0]
We develop a new estimate of the log-marginal likelihood based on recently discovered links to cross-validation.
We demonstrate that the resulting active sets (SAS) approximation significantly improves the robustness of GP decoder training.
arXiv Detail & Related papers (2022-09-10T10:49:31Z)
- Sparse Kernel Gaussian Processes through Iterative Charted Refinement (ICR) [0.0]
We present a new, generative method named Iterative Charted Refinement (ICR) to model Gaussian Processes.
ICR represents long- and short-range correlations by combining views of the modeled locations at varying resolutions with a user-provided coordinate chart.
ICR outperforms existing methods in terms of computational speed by one order of magnitude on the CPU and GPU.
arXiv Detail & Related papers (2022-06-21T18:00:01Z)
- Shallow and Deep Nonparametric Convolutions for Gaussian Processes [0.0]
We introduce a nonparametric process convolution formulation for GPs that alleviates weaknesses by using a functional sampling approach.
We propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models.
arXiv Detail & Related papers (2022-06-17T19:03:04Z)
- Surrogate modeling for Bayesian optimization beyond a single Gaussian process [62.294228304646516]
We propose a novel Bayesian surrogate model to balance exploration with exploitation of the search space.
To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model.
To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret.
arXiv Detail & Related papers (2022-05-27T16:43:10Z)
- Fast Gaussian Process Posterior Mean Prediction via Local Cross Validation and Precomputation [0.0]
We present a fast posterior mean prediction algorithm called FastMuyGPs.
It is based upon the MuyGPs hyperparameter estimation algorithm and utilizes a combination of leave-one-out cross-validation, nearest neighbors sparsification, and precomputation.
It attains superior accuracy and competitive or superior runtime to both deep neural networks and state-of-the-art GP algorithms.
arXiv Detail & Related papers (2022-05-22T17:38:36Z)
- Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z)
- Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z)
- Incremental Ensemble Gaussian Processes [53.3291389385672]
We propose an incremental ensemble (IE-) GP framework, where an EGP meta-learner employs an ensemble of GP learners, each having a unique kernel belonging to a prescribed kernel dictionary.
With each GP expert leveraging the random feature-based approximation to perform online prediction and model update with scalability, the EGP meta-learner capitalizes on data-adaptive weights to synthesize the per-expert predictions.
The novel IE-GP is generalized to accommodate time-varying functions by modeling structured dynamics at the EGP meta-learner and within each GP learner.
arXiv Detail & Related papers (2021-10-13T15:11:25Z)
- Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification [119.41129787351092]
We introduce BBKB, the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches.
We show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation, achieving a near-constant per-step amortized cost.
arXiv Detail & Related papers (2020-02-23T17:43:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.