Are Random Decompositions all we need in High Dimensional Bayesian
Optimisation?
- URL: http://arxiv.org/abs/2301.12844v2
- Date: Mon, 29 May 2023 13:59:15 GMT
- Title: Are Random Decompositions all we need in High Dimensional Bayesian
Optimisation?
- Authors: Juliusz Ziomek, Haitham Bou-Ammar
- Abstract summary: We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally.
A random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate.
Those results motivate the development of the random decomposition upper-confidence bound algorithm (RDUCB) that is straightforward to implement - (almost) plug-and-play.
- Score: 4.31133019086586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning decompositions of expensive-to-evaluate black-box functions promises
to scale Bayesian optimisation (BO) to high-dimensional problems. However, the
success of these techniques depends on finding proper decompositions that
accurately represent the black-box. While previous works learn those
decompositions based on data, we investigate data-independent decomposition
sampling rules in this paper. We find that data-driven learners of
decompositions can be easily misled towards local decompositions that do not
hold globally across the search space. Then, we formally show that a random
tree-based decomposition sampler exhibits favourable theoretical guarantees
that effectively trade off maximal information gain and functional mismatch
between the actual black-box and its surrogate as provided by the
decomposition. Those results motivate the development of the random
decomposition upper-confidence bound algorithm (RDUCB) that is straightforward
to implement - (almost) plug-and-play - and, surprisingly, yields significant
empirical gains compared to the previous state-of-the-art on a comprehensive
set of benchmarks. We also confirm the plug-and-play nature of our modelling
component by integrating our method with HEBO, showing improved practical gains
in the highest dimensional tasks from Bayesmark.
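To make the recipe concrete, here is a minimal, self-contained sketch of an RDUCB-style iteration: draw a fresh random tree over the input dimensions, fit an additive GP whose kernel has one low-dimensional RBF term per tree edge, and pick the UCB maximiser. This is an illustration rather than the authors' released implementation; the tree sampler, the fixed lengthscale and exploration weight `beta`, and the random-candidate acquisition maximisation are all simplifying assumptions.
```python
import numpy as np

def sample_random_tree_decomposition(dim, rng):
    """Sample a random tree over the input dimensions; each edge (i, j)
    becomes one 2-dimensional additive kernel component."""
    if dim == 1:
        return [(0,)]
    perm = rng.permutation(dim)
    # Attach each new node to a uniformly chosen earlier node.
    return [(int(perm[rng.integers(0, k)]), int(perm[k])) for k in range(1, dim)]

def additive_rbf_kernel(X1, X2, components, ls=0.3):
    """Additive kernel: one RBF term per component (a subset of dimensions)."""
    K = np.zeros((len(X1), len(X2)))
    for comp in components:
        c = list(comp)
        sq = np.sum((X1[:, None, c] - X2[None, :, c]) ** 2, axis=-1)
        K += np.exp(-0.5 * sq / ls ** 2)
    return K

def gp_posterior(X_tr, y_tr, X_te, components, noise=1e-3):
    """Exact GP posterior mean/variance under the additive kernel."""
    K = additive_rbf_kernel(X_tr, X_tr, components) + noise * np.eye(len(X_tr))
    K_s = additive_rbf_kernel(X_tr, X_te, components)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    # Prior variance k(x, x) equals the number of components for unit RBFs.
    var = len(components) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 1e-12)

def rducb_step(X_tr, y_tr, dim, beta=2.0, n_cand=2048, rng=None):
    """One iteration: resample the decomposition, fit the additive GP, and
    return the UCB maximiser over random candidates in [0, 1]^dim. (The paper
    optimises the acquisition per component; random candidates keep this
    sketch short.)"""
    rng = rng or np.random.default_rng()
    comps = sample_random_tree_decomposition(dim, rng)
    X_cand = rng.uniform(size=(n_cand, dim))
    mu, var = gp_posterior(X_tr, y_tr, X_cand, comps)
    return X_cand[np.argmax(mu + np.sqrt(beta * var))]
```
Because the decomposition is redrawn independently of the data, no single locally-fitted but globally-wrong decomposition can persist across iterations, which is precisely the failure mode identified above for data-driven learners.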
Related papers
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance.
This paper addresses the question of how to optimally combine the model's predictions and the provided labels.
Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization [14.732408788010313]
ML applications increasingly rely on complex deep learning models and large datasets.
To scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model.
With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems.
We show that our approach significantly enhances the robustness of state-of-the-art Byzantine-resilient aggregators.
arXiv Detail & Related papers (2023-02-12T06:38:30Z)
- Search for Concepts: Discovering Visual Concepts Using Direct Optimization [48.51514897866221]
We show that using direct optimization is more generalizable, misses fewer correct decompositions, and typically requires less data than methods based on amortized inference.
This highlights a weakness of the current prevalent practice of using amortized inference that can potentially be improved by integrating more direct optimization elements.
arXiv Detail & Related papers (2022-10-25T15:55:24Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
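As a rough sketch of the mechanism (not the paper's exact algorithm), the snippet below draws an approximate posterior sample for a linear contextual bandit by running unadjusted Langevin dynamics on a ridge-regression log posterior, then acts greedily under that sample; the Gaussian likelihood and prior, step size, and step count are illustrative assumptions.
```python
import numpy as np

def lmc_posterior_sample(theta, X, y, lam=1.0, step=1e-3, n_steps=50, rng=None):
    """Approximate posterior sample via unadjusted Langevin dynamics on the
    negative log posterior 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2:
    theta <- theta - step * grad + sqrt(2 * step) * gaussian_noise."""
    rng = rng or np.random.default_rng()
    for _ in range(n_steps):
        grad = X.T @ (X @ theta - y) + lam * theta
        theta = theta - step * grad + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape)
    return theta

def lmcts_choose_arm(theta, arm_features, X_hist, y_hist):
    """One Thompson-sampling round: sample parameters with LMC, then act
    greedily with respect to the sampled parameters."""
    theta = lmc_posterior_sample(theta, X_hist, y_hist)
    return theta, int(np.argmax(arm_features @ theta))
```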
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Kronecker Decomposition for Knowledge Graph Embeddings [5.49810117202384]
We propose a technique based on Kronecker decomposition to reduce the number of parameters in a knowledge graph embedding model.
The decomposition extends the elementwise interactions between the three embedding vectors with interactions within each embedding vector.
Our experiments suggest that applying Kronecker decomposition on embedding matrices leads to an improved parameter efficiency on all benchmark datasets.
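The parameter saving is easy to see in a toy example; the shapes below are hypothetical and not the paper's configuration.
```python
import numpy as np

# Represent a 5000 x 128 embedding matrix as the Kronecker product of
# a 100 x 8 factor and a 50 x 16 factor.
A = np.random.randn(100, 8)
B = np.random.randn(50, 16)
E = np.kron(A, B)              # shape (100 * 50, 8 * 16) = (5000, 128)

print(E.shape)                 # (5000, 128)
print(E.size)                  # 640000 parameters if stored densely
print(A.size + B.size)         # 1600 parameters in the factored form
```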
arXiv Detail & Related papers (2022-05-13T11:11:03Z)
- Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity-regularized models and apply it to shared-memory and distributed-memory architectures, respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
arXiv Detail & Related papers (2022-04-23T02:45:55Z)
- C$^{4}$Net: Contextual Compression and Complementary Combination Network for Salient Object Detection [0.0]
We show that feature concatenation works better than other combination methods like multiplication or addition.
Also, joint feature learning gives better results, because of the information sharing during their processing.
arXiv Detail & Related papers (2021-10-22T16:14:10Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
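For reference, a common form of such an objective (the paper's exact formulation may differ) couples a PCA-style reconstruction term with a row-sparsity penalty; rows of $W$ driven to zero mark discarded features:
```latex
\min_{W^\top W = I_k}\;
  \left\| X - X W W^\top \right\|_F^2 + \lambda \, \| W \|_{2,p}^p,
\qquad
\| W \|_{2,p}^p = \sum_{i=1}^{d} \Bigl( \sum_{j=1}^{k} W_{ij}^2 \Bigr)^{p/2}
```
Here $W \in \mathbb{R}^{d \times k}$ is the projection matrix, $\lambda > 0$ the regularization weight, and $0 < p \le 1$ encourages row sparsity.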
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method (FGSM).
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z)
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered, and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and accepts no responsibility for any consequences of its use.