Targeted Variance Reduction: Robust Bayesian Optimization of Black-Box
Simulators with Noise Parameters
- URL: http://arxiv.org/abs/2403.03816v1
- Date: Wed, 6 Mar 2024 16:03:37 GMT
- Title: Targeted Variance Reduction: Robust Bayesian Optimization of Black-Box
Simulators with Noise Parameters
- Authors: John Joshua Miller, Simon Mak
- Abstract summary: We propose a new Bayesian optimization method called Targeted Variance Reduction (TVR)
TVR leverages a novel joint acquisition function over $(\mathbf{x},\boldsymbol{\theta})$, which targets variance reduction on the objective within the desired region of improvement.
We demonstrate the improved performance of TVR over the state-of-the-art in a suite of numerical experiments and an application to the robust design of automobile brake discs.
- Score: 1.7404865362620803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The optimization of a black-box simulator over control parameters
$\mathbf{x}$ arises in a myriad of scientific applications. In such
applications, the simulator often takes the form
$f(\mathbf{x},\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ are parameters
that are uncertain in practice. Robust optimization aims to optimize the
objective $\mathbb{E}[f(\mathbf{x},\boldsymbol{\Theta})]$, where
$\boldsymbol{\Theta} \sim \mathcal{P}$ is a random variable that models
uncertainty on $\boldsymbol{\theta}$. For this, existing black-box methods
typically employ a two-stage approach for selecting the next point
$(\mathbf{x},\boldsymbol{\theta})$, where $\mathbf{x}$ and
$\boldsymbol{\theta}$ are optimized separately via different acquisition
functions. As such, these approaches do not employ a joint acquisition over
$(\mathbf{x},\boldsymbol{\theta})$, and thus may fail to fully exploit
control-to-noise interactions for effective robust optimization. To address
this, we propose a new Bayesian optimization method called Targeted Variance
Reduction (TVR). The TVR leverages a novel joint acquisition function over
$(\mathbf{x},\boldsymbol{\theta})$, which targets variance reduction on the
objective within the desired region of improvement. Under a Gaussian process
surrogate on $f$, the TVR acquisition can be evaluated in closed form, and
reveals an insightful exploration-exploitation-precision trade-off for robust
black-box optimization. The TVR can further accommodate a broad class of
non-Gaussian distributions on $\mathcal{P}$ via a careful integration of
normalizing flows. We demonstrate the improved performance of TVR over the
state-of-the-art in a suite of numerical experiments and an application to the
robust design of automobile brake discs under operational uncertainty.
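For intuition only, here is a minimal sketch of robust Bayesian optimization with a single Gaussian process surrogate over the joint input $(\mathbf{x},\boldsymbol{\theta})$. The acquisition is a crude variance-times-improvement-region stand-in rather than the paper's TVR criterion, and the simulator, the Gaussian choice of $\mathcal{P}$, and all constants are assumptions.

```python
# Rough sketch of robust BO with a joint GP surrogate over (x, theta).
# The acquisition is an illustrative stand-in (posterior std weighted by an
# estimated "region of improvement"), NOT the paper's TVR criterion.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def f(x, theta):                      # hypothetical black-box simulator
    return (x - 0.3) ** 2 + 0.5 * np.sin(3 * x) * theta

def sample_theta(n):                  # P: assumed Gaussian noise parameters
    return rng.normal(0.0, 0.5, size=n)

X = rng.uniform(0, 1, size=(8, 1))    # initial design over (x, theta)
T = sample_theta(8).reshape(-1, 1)
Z, y = np.hstack([X, T]), np.array([f(a[0], b[0]) for a, b in zip(X, T)])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6)

for _ in range(20):
    gp.fit(Z, y)
    xc = rng.uniform(0, 1, size=(64, 1))            # candidate controls
    tc = sample_theta(64).reshape(-1, 1)            # candidate noise params
    _, sd = gp.predict(np.hstack([xc, tc]), return_std=True)
    thetas = sample_theta(32)                       # MC estimate of E_theta[f(x, theta)]
    obj = np.array([gp.predict(np.column_stack([np.full(32, v), thetas])).mean()
                    for v in xc.ravel()])
    region = (obj <= obj.min() + 0.1).astype(float) # crude improvement region
    pick = int(np.argmax(sd * (region + 1e-3)))     # variance-reduction proxy
    Z = np.vstack([Z, np.hstack([xc[pick], tc[pick]])])
    y = np.append(y, f(xc[pick, 0], tc[pick, 0]))

print("best observed control x:", float(Z[np.argmin(y), 0]))
```

The key structural point the sketch tries to convey is that a single acquisition scores joint candidates $(\mathbf{x},\boldsymbol{\theta})$, rather than optimizing the two blocks separately in two stages.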
Related papers
- Near-Optimal Streaming Heavy-Tailed Statistical Estimation with Clipped SGD [16.019880089338383]
We show that Clipped-SGD, for smooth and strongly convex objectives, achieves an error of $\sqrt{\frac{\mathsf{Tr}(\Sigma)+\sqrt{\mathsf{Tr}(\Sigma)\,\cdots}}{\cdots}}$
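For intuition, a minimal clipped-SGD loop on a streaming strongly convex (quadratic) objective with heavy-tailed gradient noise might look like this; the clipping level, the $1/t$ step size, and the Student-t noise are illustrative assumptions, not the paper's prescribed schedule.

```python
# Minimal sketch of Clipped-SGD on a streaming quadratic objective with
# heavy-tailed gradient noise; clipping level and step sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
d, T, clip = 5, 5000, 10.0
w_star = np.ones(d)
w = np.zeros(d)

for t in range(1, T + 1):
    noise = rng.standard_t(df=2.5, size=d)   # heavy-tailed gradient noise
    grad = (w - w_star) + noise              # stochastic gradient of 0.5*||w - w*||^2
    norm = np.linalg.norm(grad)
    if norm > clip:                          # clip the gradient norm
        grad *= clip / norm
    w -= (1.0 / t) * grad                    # 1/t step size (strong convexity)

print("estimation error:", np.linalg.norm(w - w_star))
```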
arXiv Detail & Related papers (2024-10-26T10:14:17Z) - Robust Distribution Learning with Local and Global Adversarial Corruptions [17.22168727622332]
We develop an efficient finite-sample algorithm with error bounded by $\sqrt{\varepsilon k} + \rho + \tilde{O}(d\sqrt{k}\,n^{-1/(k \lor 2)})$ when $P$ has bounded covariance.
Our efficient procedure relies on a novel trace norm approximation of an ideal yet intractable 2-Wasserstein projection estimator.
arXiv Detail & Related papers (2024-06-10T17:48:36Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random-feature technique called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
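A toy version of the ternary idea (illustrative, not the paper's exact construction): draw the projection matrix with entries in $\{-1, 0, +1\}$, so the feature map needs only additions and sign flips; the sparsity level and the cosine nonlinearity below are assumptions.

```python
# Toy ternary random features: projection entries in {-1, 0, +1}, so X @ W
# can be computed with additions/subtractions only.  Sparsity probability
# and the cosine nonlinearity are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
n, d, D = 200, 20, 512                     # samples, input dim, feature dim
X = rng.normal(size=(n, d))

p = 0.5                                    # P(+1) = P(-1) = p/2, P(0) = 1 - p
W = rng.choice([-1, 0, 1], size=(d, D), p=[p / 2, 1 - p, p / 2])

features = np.cos(X @ W / np.sqrt(d))      # random-feature map
K_approx = features @ features.T / D       # approximate kernel matrix
print(K_approx.shape)
```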
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
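As a minimal illustration of the perturbation idea (a single regression step, not the full reinforcement learning algorithm): add i.i.d. Gaussian noise to the observed targets before a ridge least-squares fit, so the resulting estimate, and hence the greedy action, is randomized; the noise scale, ridge parameter, and linear model are assumptions.

```python
# Sketch of randomized exploration via target perturbation: fit least squares
# on rewards perturbed with i.i.d. Gaussian noise, then act greedily w.r.t.
# the perturbed fit.  All constants are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def perturbed_fit(Phi, rewards, sigma=0.5, ridge=1e-3):
    noisy = rewards + rng.normal(0.0, sigma, size=rewards.shape)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ noisy)

# Toy data: features of visited state-actions and observed rewards.
Phi = rng.normal(size=(50, 8))
rewards = Phi @ rng.normal(size=8) + 0.1 * rng.normal(size=50)

theta = perturbed_fit(Phi, rewards)
candidate_actions = rng.normal(size=(10, 8))
greedy = int(np.argmax(candidate_actions @ theta))
print("greedy action under perturbed estimate:", greedy)
```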
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Bayesian Optimistic Optimisation with Exponentially Decaying Regret [58.02542541410322]
The current practical BO algorithms have regret bounds ranging from $\mathcal{O}(\frac{\log N}{\sqrt{N}})$ to $\mathcal{O}(e^{-\sqrt{N}})$, where $N$ is the number of evaluations.
This paper explores the possibility of improving the regret bound in the noiseless setting by intertwining concepts from BO and tree-based optimistic optimisation.
We propose the BOO algorithm, a first practical approach which can achieve an exponential regret bound of order $\mathcal{O}(N^{-\sqrt{N}})$.
arXiv Detail & Related papers (2021-05-10T13:07:44Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y = f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
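To make the model class from the title concrete, here is a small evaluator for a truncated continued-fraction model of a scalar input; the depth and coefficients are arbitrary placeholders, not a fitted model from the paper.

```python
# Evaluate a truncated continued-fraction model
#   g(x) = a0 + b0*x / (a1 + b1*x / (a2 + ...)),
# a low-complexity functional form sometimes used for extrapolation.
# The coefficients below are arbitrary placeholders, not fitted values.
import numpy as np

def continued_fraction(x, a, b):
    """a: coefficients a0..ak, b: coefficients b0..b(k-1)."""
    value = a[-1]
    for ai, bi in zip(a[-2::-1], b[::-1]):
        value = ai + bi * x / value
    return value

x = np.linspace(0.5, 5.0, 5)
print(continued_fraction(x, a=[1.0, 2.0, 3.0], b=[0.5, -0.2]))
```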
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Bayesian Optimization of Risk Measures [7.799648230758491]
We consider Bayesian optimization of objective functions of the form $\rho[F(x, W)]$, where $F$ is a black-box expensive-to-evaluate function.
We propose a family of novel Bayesian optimization algorithms that exploit the structure of the objective function to substantially improve sampling efficiency.
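For concreteness, one common choice of $\rho$ is CVaR; a plain Monte Carlo estimate of $\rho[F(x, W)]$ over the environmental variable $W$ could look like the sketch below, where $F$, the distribution of $W$, and the risk level are illustrative assumptions (this is the brute-force baseline that structure-exploiting algorithms aim to improve on).

```python
# Monte Carlo estimate of rho[F(x, W)] with rho = CVaR at level alpha:
# average of the worst (1 - alpha) fraction of sampled outcomes.  F and the
# distribution of W stand in for an expensive simulator.
import numpy as np

rng = np.random.default_rng(4)

def F(x, w):                               # hypothetical expensive simulator
    return (x - 1.0) ** 2 + 0.3 * np.sin(4 * x) * w

def cvar_of_F(x, alpha=0.9, n=2000):
    w = rng.normal(0.0, 1.0, size=n)       # W ~ assumed standard normal
    vals = F(x, w)
    tail = np.sort(vals)[int(alpha * n):]  # worst (1 - alpha) outcomes
    return tail.mean()

for x in (0.0, 0.5, 1.0, 1.5):
    print(x, round(cvar_of_F(x), 3))
```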
arXiv Detail & Related papers (2020-07-10T18:20:46Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
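A toy instance of this setup, assuming a ReLU activation, square loss, and synthetic Gaussian data (all illustrative): plain gradient descent on the empirical risk of a single neuron.

```python
# Gradient descent on the empirical square loss of a single ReLU neuron
#   min_w  E[(relu(<w, x>) - y)^2],
# with synthetic Gaussian data and noisy labels (illustrative choices).
import numpy as np

rng = np.random.default_rng(5)
n, d, lr, steps = 1000, 10, 0.05, 500
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.normal(size=n)   # agnostic label noise

w = 0.01 * rng.normal(size=d)
for _ in range(steps):
    pre = X @ w
    out = np.maximum(pre, 0.0)
    grad = X.T @ ((out - y) * (pre > 0)) / n    # chain rule through the ReLU
    w -= lr * grad

risk = np.mean((np.maximum(X @ w, 0.0) - y) ** 2)
print("empirical square loss:", round(float(risk), 4))
```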
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Stochastic Recursive Gradient Descent Ascent for Stochastic
Nonconvex-Strongly-Concave Minimax Problems [36.645753881826955]
In this paper, we propose a novel method called Stochastic Recursive Gradient Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction.
This method achieves the best known complexity for this problem.
arXiv Detail & Related papers (2020-01-11T09:05:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.