A Distributional View of High Dimensional Optimization
- URL: http://arxiv.org/abs/2507.16315v1
- Date: Tue, 22 Jul 2025 07:56:01 GMT
- Title: A Distributional View of High Dimensional Optimization
- Authors: Felix Benning
- Abstract summary: We motivate this view with an investigation of the failure point of classical optimization. We consider the optimization of a randomly drawn objective function. It further turns out that this distributional view provides insights into optimal step size control of gradient descent.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This PhD thesis presents a distributional view of optimization in place of a worst-case perspective. We motivate this view with an investigation of the failure point of classical optimization. Subsequently, we consider the optimization of a randomly drawn objective function. This is the setting of Bayesian optimization. After a review of Bayesian optimization, we outline how such a distributional view may explain predictable progress of optimization in high dimension. It further turns out that this distributional view provides insights into optimal step size control of gradient descent. To enable these results, we develop mathematical tools to deal with random input to random functions and a characterization of non-stationary isotropic covariance kernels. Finally, we outline how assumptions about the data, specifically exchangeability, can lead to random objective functions in machine learning, and we analyze their landscape.
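As a toy illustration of this setting (an assumed construction for this summary, not taken from the thesis), one can draw an approximate sample from a stationary Gaussian process via random Fourier features and run gradient descent with a fixed step size on the resulting randomly drawn objective:

```python
import numpy as np

# A randomly drawn objective: an approximate sample from a stationary Gaussian
# process built with random Fourier features. Dimension, lengthscale, and the
# step size below are illustrative assumptions.
rng = np.random.default_rng(0)
d, m = 20, 500                                   # input dimension, number of random features
W = rng.normal(size=(m, d))                      # frequencies for an RBF-like kernel (unit lengthscale)
b = rng.uniform(0, 2 * np.pi, size=m)
a = rng.normal(size=m) / np.sqrt(m)

def f(x):
    """Randomly drawn objective f(x) = sqrt(2) * a . cos(W x + b)."""
    return np.sqrt(2) * a @ np.cos(W @ x + b)

def grad_f(x):
    return -np.sqrt(2) * W.T @ (a * np.sin(W @ x + b))

x0 = rng.normal(size=d)
x, step = x0.copy(), 0.05                        # fixed, hand-tuned step size
for _ in range(300):
    x -= step * grad_f(x)
print("initial value:", f(x0), "final value:", f(x))
```

The abstract's remark on step size control is exactly about replacing the hand-tuned `step` above with a choice informed by the distributional model of f.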
Related papers
- Information-theoretic Bayesian Optimization: Survey and Tutorial [2.3931689873603603]
This paper is about information-theoretic acquisition functions, whose performance typically outperforms that of other acquisition functions. We also cover how information-theoretic acquisition functions can be adapted to complex optimization scenarios such as multi-objective, constrained, non-myopic, multi-fidelity, parallel, and asynchronous settings.
arXiv Detail & Related papers (2025-01-22T10:54:15Z) - Local Bayesian optimization via maximizing probability of descent [26.82385325186729]
Local optimization is a promising approach to expensive, high-dimensional black-box optimization.
We show that, surprisingly, the expected value of the gradient is not always the direction maximizing the probability of descent.
This observation inspires an elegant optimization scheme seeking to maximize the probability of descent while moving in the direction of most-probable descent.
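A small numerical illustration of this point, assuming a Gaussian posterior g ~ N(mu, Sigma) over the gradient (an assumed setup, not the paper's code): the probability of descent along a unit direction v is Phi(-v.mu / sqrt(v' Sigma v)), which is maximized by v proportional to -Sigma^{-1} mu rather than by the negative expected gradient -mu.

```python
import numpy as np
from scipy.stats import norm

# Probability of descent along a unit direction v when g ~ N(mu, Sigma):
# P(v.g < 0) = Phi(-v.mu / sqrt(v' Sigma v)).
mu = np.array([1.0, 0.1])
Sigma = np.diag([1.0, 0.01])

def prob_descent(v):
    v = v / np.linalg.norm(v)
    return norm.cdf(-v @ mu / np.sqrt(v @ Sigma @ v))

v_expected = -mu                                  # negative expected gradient
v_most_probable = -np.linalg.solve(Sigma, mu)     # direction of most-probable descent

print(prob_descent(v_expected))       # ~0.84 in this example
print(prob_descent(v_most_probable))  # ~0.92, strictly larger
```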
arXiv Detail & Related papers (2022-10-21T01:13:14Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
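For concreteness, one of the popular acquisition functions recovered as a special case is expected improvement; under a Gaussian posterior N(mu, sigma^2) at a candidate point it has the standard closed form below (a textbook formula shown for illustration, not the paper's general decision-theoretic entropy).

```python
from scipy.stats import norm

# Expected improvement for minimization: EI = E[max(best - f, 0)]
# with f ~ N(mu, sigma^2) under the surrogate posterior.
def expected_improvement(mu, sigma, best_so_far):
    if sigma == 0.0:
        return max(0.0, best_so_far - mu)
    z = (best_so_far - mu) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

print(expected_improvement(mu=0.2, sigma=0.5, best_so_far=0.3))
```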
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Bayesian Optimization with Informative Covariance [13.113313427848828]
We propose novel informative covariance functions for optimization, leveraging nonstationarity to encode preferences for certain regions of the search space.
We demonstrate that the proposed functions can increase the sample efficiency of Bayesian optimization in high dimensions, even under weak prior information.
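One simple way to encode such preferences (an assumed form shown only for illustration, not necessarily the paper's construction) is to rescale a stationary kernel with a signal-variance function that is larger near a region the prior favours, making the covariance nonstationary:

```python
import numpy as np

def rbf(x, y, ell=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))

def preference_scale(x, x_star, tau=2.0):
    # higher prior amplitude near the believed promising point x_star
    return 1.0 + np.exp(-np.sum((x - x_star) ** 2) / (2 * tau ** 2))

def informative_kernel(x, y, x_star):
    return preference_scale(x, x_star) * preference_scale(y, x_star) * rbf(x, y)

print(informative_kernel(np.zeros(2), np.ones(2), x_star=np.zeros(2)))
```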
arXiv Detail & Related papers (2022-08-04T15:05:11Z) - Online Calibrated and Conformal Prediction Improves Bayesian Optimization [10.470326550507117]
This paper studies which uncertainties are needed in model-based decision-making and in Bayesian optimization.
Maintaining calibration, however, can be challenging when the data is non-stationary and depends on our actions.
We propose using simple algorithms based on online learning to provably maintain calibration on non-i.i.d. data.
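A minimal sketch of online quantile recalibration in the spirit of this idea (an assumed scheme, not the paper's algorithm): store the probability integral transform values F_t(y_t) of past observations and remap a nominal quantile level to its empirical coverage.

```python
import numpy as np

# Online recalibration sketch: the calibrated level of a nominal quantile q is
# the empirical fraction of past PIT values that fall below q.
class OnlineRecalibrator:
    def __init__(self):
        self.pits = []                 # past values of F_t(y_t)

    def update(self, model_cdf_at_y):
        self.pits.append(model_cdf_at_y)

    def calibrated_level(self, q):
        if not self.pits:
            return q
        return float(np.mean(np.array(self.pits) <= q))

r = OnlineRecalibrator()
for p in [0.9, 0.95, 0.85, 0.99]:      # an overconfident model: observations land in the upper tail
    r.update(p)
print(r.calibrated_level(0.9))         # 0.5: only half of past observations fell below the 0.9 quantile
```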
arXiv Detail & Related papers (2021-12-08T23:26:23Z) - Stochastic Learning Approach to Binary Optimization for Optimal Design
of Experiments [0.0]
We present a novel approach to binary optimization for optimal experimental design (OED) for Bayesian inverse problems governed by mathematical models such as partial differential equations.
The OED utility function, namely the regularized optimality criterion, is cast into an objective function in the form of an expectation over a Bernoulli distribution.
This objective is then optimized using a probabilistic optimization routine to find an optimal observational policy.
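The resulting problem has the form max_theta E_{z ~ Bernoulli(theta)}[U(z)]. A generic way to optimize such an expectation is a score-function (REINFORCE-style) gradient estimate, sketched below with a toy utility standing in for the paper's OED criterion (the utility, step size, and sample counts are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def U(z):
    # toy utility: prefer switching on the first 5 of 20 candidate observations
    return -np.sum(z) + 3.0 * np.sum(z[:5])

def grad_estimate(theta, n_samples=256):
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        z = (rng.random(theta.size) < theta).astype(float)       # sample a binary design
        score = z / theta - (1 - z) / (1 - theta)                 # grad of log Bernoulli pmf
        g += U(z) * score
    return g / n_samples

theta = np.full(20, 0.5)
for _ in range(200):
    theta = np.clip(theta + 0.01 * grad_estimate(theta), 0.05, 0.95)
print(theta.round(2))   # probabilities concentrate on the first 5 coordinates
```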
arXiv Detail & Related papers (2021-01-15T03:54:12Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
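A generic sketch of what a zeroth-order gradient estimate with coordinate importance sampling can look like (the weights and reweighting below are assumptions; the paper's sampling design is more refined): coordinates with larger weight are queried more often, and each finite-difference estimate is divided by its sampling probability so the estimator stays unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, x, weights, n_samples=5, h=1e-4):
    """Zeroth-order gradient estimate with importance-sampled coordinates."""
    p = weights / weights.sum()
    g = np.zeros_like(x)
    for _ in range(n_samples):
        i = rng.choice(x.size, p=p)                      # important coordinates queried more often
        e = np.zeros_like(x)
        e[i] = h
        partial = (f(x + e) - f(x - e)) / (2 * h)        # two function queries per coordinate
        g[i] += partial / (n_samples * p[i])             # reweighting removes the sampling bias
    return g

f = lambda x: np.sum(x ** 2) + 10 * x[0] ** 2            # toy objective, first coordinate matters most
print(zo_grad(f, np.ones(10), weights=np.array([10.0] + [1.0] * 9)))
```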
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Are we Forgetting about Compositional Optimisers in Bayesian
Optimisation? [66.39551991177542]
This paper presents Bayesian optimisation as a sample-efficient methodology for global optimisation.
Within this framework, a crucial performance-determining subroutine is maximising the acquisition function.
We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments.
arXiv Detail & Related papers (2020-12-15T12:18:38Z) - Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found is used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
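A runnable toy version of this loop, using blind function matching on a grid as in the experiments above; random sinusoids stand in for draws from the experimenter's Gaussian Process, and plain random search stands in for the inner Bayesian optimisation (both simplifications are assumptions made only for this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
target = np.sin(6 * t)                               # unknown function to match

def objective(f_vals):                               # smaller is better
    return np.mean((f_vals - target) ** 2)

def draw_basis_function():                           # stand-in for a draw from the experimenter's GP
    w, phase = rng.uniform(1, 10), rng.uniform(0, 2 * np.pi)
    return np.sin(w * t + phase)

origin = np.zeros_like(t)
best_val = objective(origin)
for _ in range(10):                                  # sequence of random finite-dimensional subspaces
    basis = np.stack([draw_basis_function() for _ in range(3)])
    for _ in range(200):                             # inner search over subspace coefficients
        candidate = origin + rng.normal(scale=0.5, size=3) @ basis
        val = objective(candidate)
        if val < best_val:
            origin, best_val = candidate, val        # best solution becomes the next origin
print("final mismatch:", best_val)
```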
arXiv Detail & Related papers (2020-09-08T06:54:11Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In large search spaces, the algorithm passes through several low-function-value regions before reaching the optimum of the function.
One approach to shorten this cold-start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that it expands around high-probability regions of the function optimum and shrinks around low-probability regions.
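A one-dimensional sketch of such a warp, assuming a Gaussian prior over the location of the optimum (the prior family and numbers are assumptions): uniform points in [0, 1] are pushed through the prior's quantile function, so the search space stretches where the prior puts mass and compresses elsewhere.

```python
import numpy as np
from scipy.stats import norm

def warp(u, prior_mean, prior_std, low, high):
    # inverse CDF of the prior over the optimum's location, clipped to the search bounds
    x = norm.ppf(u, loc=prior_mean, scale=prior_std)
    return np.clip(x, low, high)

# BO would propose u in [0, 1] and evaluate the objective at warp(u, ...).
print(warp(np.array([0.1, 0.5, 0.9]), prior_mean=2.0, prior_std=0.5, low=0.0, high=5.0))
```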
arXiv Detail & Related papers (2020-03-27T06:18:49Z) - Composition of kernel and acquisition functions for High Dimensional
Bayesian Optimization [0.1749935196721634]
We exploit the additive structure of the objective function in the design of both the kernel and the acquisition function of Bayesian optimization.
This approach makes the learning/updating of the probabilistic surrogate model more efficient.
Results are presented for a real-life application, namely the control of pumps in urban water distribution systems.
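A minimal sketch of the additive structure this exploits (the grouping of input dimensions below is illustrative): the objective decomposes over groups of variables, so the kernel is a sum of low-dimensional kernels and the acquisition function can likewise be optimised group by group.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * ell ** 2))

groups = [[0, 1], [2, 3, 4]]                 # assumed disjoint groups of input dimensions

def additive_kernel(x, y):
    # sum of low-dimensional kernels, one per group of variables
    return sum(rbf(x[g], y[g]) for g in groups)

print(additive_kernel(np.zeros(5), np.ones(5)))
```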
arXiv Detail & Related papers (2020-03-09T15:45:57Z) - Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
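A toy sketch for a least-squares problem (an illustration only, not the paper's estimator): each worker forms a sketched Hessian from a row subsample and proposes a Newton-type step, and the steps are averaged. The plain average shown here is biased, which is exactly what the paper's unbiased weighting is designed to correct.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, workers, m = 2000, 20, 8, 200               # rows, dims, workers, subsample size
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

x = np.zeros(d)
grad = A.T @ (A @ x - b)                          # full gradient (cheap relative to the Hessian)

steps = []
for _ in range(workers):
    rows = rng.choice(n, size=m, replace=False)
    H_sketch = (n / m) * A[rows].T @ A[rows]      # each worker's subsampled (sketched) Hessian
    steps.append(np.linalg.solve(H_sketch, grad)) # local Newton-type step
x_new = x - np.mean(steps, axis=0)                # distributed averaging of second-order steps
print(np.linalg.norm(A @ x_new - b) < np.linalg.norm(A @ x - b))   # residual shrinks: True
```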
arXiv Detail & Related papers (2020-02-16T09:01:18Z)