Sampling in CMA-ES: Low Numbers of Low Discrepancy Points
- URL: http://arxiv.org/abs/2409.15941v1
- Date: Tue, 24 Sep 2024 10:04:55 GMT
- Title: Sampling in CMA-ES: Low Numbers of Low Discrepancy Points
- Authors: Jacob de Nobel, Diederick Vermetten, Thomas H. W. Bäck, Anna V. Kononova
- Abstract summary: We show that iterating through small, fixed sets of low-discrepancy points can still perform better than the default uniform distribution.
For lower dimensionalities, we find that using as few as 32 unique low-discrepancy points performs similarly to or better than uniform sampling.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is one of the most successful examples of a derandomized evolution strategy. However, it still relies on randomly sampling offspring, which can be done by drawing from a uniform distribution and subsequently transforming the samples into the required Gaussian. Previous work has shown that replacing this uniform sampling with a low-discrepancy sampler, such as Halton or Sobol sequences, can improve performance over a wide set of problems. We show that iterating through small, fixed sets of low-discrepancy points can still perform better than the default uniform distribution. Moreover, using only 128 points throughout the search is sufficient to closely approximate the empirical performance of using the complete pseudorandom sequence up to dimensionality 40 on the BBOB benchmark. For lower dimensionalities (below 10), we find that using as few as 32 unique low-discrepancy points performs similarly to or better than uniform sampling. In 2D, for which highly optimized low-discrepancy point sets are available, we demonstrate that using these points yields the highest empirical performance and that only 16 samples are required to improve over uniform sampling. Overall, we establish a clear relation between the $L_2$ discrepancy of the point set used and the empirical performance of the CMA-ES.
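The sampling idea described in the abstract can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the authors' implementation: it draws a small, fixed set of scrambled Sobol points (one possible low-discrepancy generator), maps them to standard-normal variates via the inverse Gaussian CDF, and cycles through them when drawing CMA-ES-style offspring; it also computes Warnock's closed-form $L_2$ star discrepancy, one common variant of the $L_2$ discrepancy referred to above. The names `FixedPointSampler`, `sample_offspring`, and `l2_star_discrepancy`, the point counts, and the use of a Cholesky factor as the covariance square root are illustrative assumptions.
```python
# Minimal sketch (assumed names; not the authors' code): cycle through a
# small, fixed set of low-discrepancy points, map them to Gaussian variates,
# and use them to draw CMA-ES-style offspring.
import numpy as np
from scipy.stats import norm, qmc


class FixedPointSampler:
    """Cycles through a fixed set of scrambled Sobol points in [0, 1)^dim,
    returned as standard-normal vectors via the inverse Gaussian CDF."""

    def __init__(self, dim: int, n_points: int = 128, seed: int = 42):
        u = qmc.Sobol(d=dim, scramble=True, seed=seed).random(n_points)
        u = np.clip(u, 1e-12, 1.0 - 1e-12)        # guard against ppf(0) / ppf(1)
        self._z = norm.ppf(u)                     # N(0, I) variates, shape (n_points, dim)
        self._idx = 0

    def next(self) -> np.ndarray:
        z = self._z[self._idx]
        self._idx = (self._idx + 1) % len(self._z)  # wrap around the fixed set
        return z


def sample_offspring(sampler: FixedPointSampler, mean: np.ndarray,
                     C: np.ndarray, sigma: float, lam: int) -> np.ndarray:
    """Draw lambda offspring x = m + sigma * A z with C = A A^T, where the
    z vectors come from the fixed low-discrepancy set instead of a PRNG."""
    A = np.linalg.cholesky(C)                     # any square root of C would do
    return np.array([mean + sigma * A @ sampler.next() for _ in range(lam)])


def l2_star_discrepancy(points: np.ndarray) -> float:
    """Warnock's closed-form L2 star discrepancy of a point set in [0, 1]^d."""
    n, d = points.shape
    term_const = 3.0 ** (-d)
    term_single = (2.0 ** (1 - d) / n) * np.sum(np.prod(1.0 - points ** 2, axis=1))
    pair_max = np.maximum(points[:, None, :], points[None, :, :])
    term_pair = np.sum(np.prod(1.0 - pair_max, axis=2)) / n ** 2
    return float(np.sqrt(term_const - term_single + term_pair))


if __name__ == "__main__":
    dim, lam = 5, 8
    sampler = FixedPointSampler(dim, n_points=32)            # 32-point budget
    offspring = sample_offspring(sampler, np.zeros(dim), np.eye(dim), 0.5, lam)
    print(offspring.shape)                                   # (8, 5)
    print(l2_star_discrepancy(qmc.Sobol(d=dim, scramble=True).random(32)))
```
In this sketch, lowering `n_points` from 128 to 32 mirrors the point-budget comparison described in the abstract; the paper's actual experiments embed the sampler in a full CMA-ES with step-size and covariance adaptation, which is omitted here.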
Related papers
- Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation [51.127054971591924]
We introduce a new generative self-evaluation scheme designed to adaptively reduce the number of generated samples.
We demonstrate that 74% of the improvement from using 16 samples can be achieved with only 1.2 samples on average.
arXiv Detail & Related papers (2024-10-03T17:47:29Z) - Faster Diffusion Sampling with Randomized Midpoints: Sequential and Parallel [10.840582511203024]
We show that our algorithm can be parallelized to run in only $\widetilde{O}(\log^2 d)$ parallel rounds.
arXiv Detail & Related papers (2024-06-03T01:34:34Z) - Mini-batch Submodular Maximization [5.439020425819001]
We present the first mini-batch algorithm for maximizing a monotone decomposable submodular function, $F = \sum_{i=1}^{N} f_i$, under a set of constraints.
We consider two sampling approaches: uniform and weighted.
Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling.
arXiv Detail & Related papers (2024-01-23T04:16:58Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Stochastic Approximation Approaches to Group Distributionally Robust Optimization and Beyond [89.72693227960274]
This paper investigates group distributionally robust optimization (GDRO) with the goal of learning a model that performs well over $m$ different distributions.
To reduce the number of samples in each round from $m$ to 1, we cast GDRO as a two-player game, where one player performs stochastic gradient descent and the other executes an online algorithm for non-oblivious multi-armed bandits.
In the second scenario, we propose to optimize the average top-$k$ risk instead of the maximum risk, thereby mitigating the impact of distributions.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z) - Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification [28.128464387420216]
We show that learning is fundamentally constrained by a lack of minority group samples.
In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal.
arXiv Detail & Related papers (2022-05-26T00:35:11Z) - Conditional Poisson Stochastic Beam Search [35.60062127942947]
Conditional Poisson stochastic beam search (CPSBS) is a more natural alternative to the stochastic beam search (SBS) of Kool et al. (2019).
CPSBS produces lower variance and more efficient estimators than SBS, even showing improvements in high entropy settings.
arXiv Detail & Related papers (2021-09-22T20:49:16Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - On the Comparison between Cyclic Sampling and Random Reshuffling [27.27774034428748]
Cyclic sampling draws the samples in a fixed, cyclic order, which is less robust than reshuffling the samples periodically.
Existing works have established worst case convergence rates for cyclic sampling, which are generally worse than that of random reshuffling.
In this paper, we find that a certain cyclic order can be much faster than reshuffling and that it can be discovered at low cost.
arXiv Detail & Related papers (2021-04-25T09:29:43Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)