Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces
- URL: http://arxiv.org/abs/2207.00879v1
- Date: Sat, 2 Jul 2022 16:59:37 GMT
- Title: Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces
- Authors: Alexander Thebelt, Calvin Tsay, Robert M. Lee, Nathan Sudermann-Merx,
David Walz, Behrang Shafei, Ruth Misener
- Abstract summary: Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
- Score: 54.58348769621782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree ensembles can be well-suited for black-box optimization tasks such as
algorithm tuning and neural architecture search, as they achieve good
predictive performance with little to no manual tuning, naturally handle
discrete feature spaces, and are relatively insensitive to outliers in the
training data. Two well-known challenges in using tree ensembles for black-box
optimization are (i) effectively quantifying model uncertainty for exploration
and (ii) optimizing over the piece-wise constant acquisition function. To
address both points simultaneously, we propose using the kernel interpretation
of tree ensembles as a Gaussian Process prior to obtain model variance
estimates, and we develop a compatible optimization formulation for the
acquisition function. The latter further allows us to seamlessly integrate
known constraints to improve sampling efficiency by considering
domain-knowledge in engineering settings and modeling search space symmetries,
e.g., hierarchical relationships in neural architecture search. Our framework
performs as well as state-of-the-art methods for unconstrained black-box
optimization over continuous/discrete features and outperforms competing
methods for problems combining mixed-variable feature spaces and known input
constraints.
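To make the kernel interpretation above concrete, the sketch below illustrates the general idea: a tree ensemble induces a leaf-agreement kernel k(x, x') equal to the fraction of trees in which x and x' fall in the same leaf, and that kernel can be used as a Gaussian-process covariance to obtain posterior variance estimates for an acquisition function. This is an illustrative sketch only, not the authors' implementation: it uses scikit-learn's RandomForestRegressor, a simple candidate-set search in place of the paper's dedicated acquisition optimization formulation (and it omits the known-constraint handling), and the noise term and UCB exploration weight are assumed values.

```python
# Minimal sketch: a tree-ensemble (leaf-agreement) kernel used as a GP prior
# to obtain posterior variance estimates for a UCB-style acquisition.
# Illustrative only; hyperparameters (noise, kappa) are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_kernel(forest, A, B):
    """k(x, x') = fraction of trees in which x and x' share a leaf."""
    leaves_A = forest.apply(A)          # shape (n_A, n_trees)
    leaves_B = forest.apply(B)          # shape (n_B, n_trees)
    agree = (leaves_A[:, None, :] == leaves_B[None, :, :])
    return agree.mean(axis=2)           # Gram matrix, shape (n_A, n_B)

def gp_posterior(forest, X_train, y_train, X_cand, noise=1e-3):
    """Standard GP posterior mean/std with the tree kernel as covariance."""
    K = tree_kernel(forest, X_train, X_train) + noise * np.eye(len(X_train))
    K_s = tree_kernel(forest, X_cand, X_train)
    K_ss = tree_kernel(forest, X_cand, X_cand)
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_train
    cov = K_ss - K_s @ K_inv @ K_s.T
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy usage: pick the next evaluation from a finite candidate set (maximization).
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(20, 3))
y_train = -np.sum((X_train - 0.5) ** 2, axis=1)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

X_cand = rng.uniform(size=(500, 3))
mean, std = gp_posterior(forest, X_train, y_train, X_cand)
kappa = 2.0                              # exploration weight (assumed)
x_next = X_cand[np.argmax(mean + kappa * std)]
```

Because the kernel only depends on leaf assignments, the same construction applies unchanged to discrete or mixed-feature inputs, which is what makes it a natural fit for the black-box settings described in the abstract.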
Related papers
- A Continuous Relaxation for Discrete Bayesian Optimization [17.312618575552]
We show that inference and optimization can be computationally tractable.
We consider in particular the setting where very few observations are available and evaluation budgets are strict.
We show that the resulting acquisition function can be optimized with both continuous and discrete optimization algorithms.
arXiv Detail & Related papers (2024-04-26T14:47:40Z) - SequentialAttention++ for Block Sparsification: Differentiable Pruning
Meets Combinatorial Optimization [24.55623897747344]
Neural network pruning is a key technique towards engineering large yet scalable, interpretable, generalizable models.
We show how many existing differentiable pruning techniques can be understood as nonconvex regularization for group sparse optimization.
We propose SequentialAttention++, which advances the state of the art in large-scale neural network block-wise pruning tasks on the ImageNet and Criteo datasets.
arXiv Detail & Related papers (2024-02-27T21:42:18Z) - Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
This paper proposes novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
The new algorithms retain the ease of implementation of the classical GP-UCB, but an additional exploration step facilitates their convergence.
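For illustration, the following sketch shows a GP-UCB loop augmented with an occasional random exploration step, in the spirit of the summary above. The exploration schedule (a uniform draw with probability eps), the Matern kernel, and all hyperparameters are assumptions, not the cited paper's actual algorithm.

```python
# Illustrative sketch: GP-UCB with an additional random exploration step.
# The eps-greedy exploration schedule and hyperparameters are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ucb_with_random_exploration(f, bounds, n_iter=30, beta=2.0, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(3, dim))            # small initial design
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        if rng.random() < eps:                        # extra random exploration step
            x_next = rng.uniform(lo, hi, size=dim)
        else:                                         # classical GP-UCB step
            cand = rng.uniform(lo, hi, size=(2000, dim))
            mu, sigma = gp.predict(cand, return_std=True)
            x_next = cand[np.argmax(mu + beta * sigma)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmax(y)], y.max()

# Example: maximize a simple 2-D test function on [0, 1]^2.
best_x, best_y = ucb_with_random_exploration(
    lambda x: -np.sum((x - 0.3) ** 2), bounds=[(0.0, 1.0), (0.0, 1.0)])
```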
arXiv Detail & Related papers (2024-01-30T14:16:06Z) - Ensemble-based Hybrid Optimization of Bayesian Neural Networks and
Traditional Machine Learning Algorithms [0.0]
This research introduces a novel methodology for optimizing Bayesian Neural Networks (BNNs) by synergistically integrating them with traditional machine learning algorithms such as Random Forests (RF), Gradient Boosting (GB), and Support Vector Machines (SVM).
Feature integration solidifies these results by emphasizing the second-order conditions for optimality, including stationarity and positive definiteness of the Hessian matrix.
Overall, the ensemble method stands out as a robust, algorithmically optimized approach.
arXiv Detail & Related papers (2023-10-09T06:59:17Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization of the extragradient method which provably converges to a stationary point.
The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z) - Additive Tree-Structured Covariance Function for Conditional Parameter
Spaces in Bayesian Optimization [34.89735938765757]
We generalize the additive assumption to tree-structured functions.
By incorporating the structure information of parameter spaces and the additive assumption in the BO loop, we develop a parallel algorithm to optimize the acquisition function.
arXiv Detail & Related papers (2020-06-21T11:21:55Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation for global optimization with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time until convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.