The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase Selection
- URL: http://arxiv.org/abs/2205.15931v1
- Date: Tue, 31 May 2022 16:21:14 GMT
- Title: The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase Selection
- Authors: Ryan Boldi, Thomas Helmuth, Lee Spector
- Abstract summary: Down-sampling has proved effective in genetic programming (GP) runs that utilize the lexicase parent selection technique.
We hypothesize that the random sampling that is performed every generation causes discontinuities that result in the population being unable to adapt to the shifting environment.
We find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Down-sampling training data has long been shown to improve the generalization
performance of a wide range of machine learning systems. Recently,
down-sampling has proved effective in genetic programming (GP) runs that
utilize the lexicase parent selection technique. Although this down-sampling
procedure has been shown to significantly improve performance across a variety
of problems, it does not seem to do so due to encouraging adaptability through
environmental change. We hypothesize that the random sampling that is performed
every generation causes discontinuities that result in the population being
unable to adapt to the shifting environment. We investigate modifications to
down-sampled lexicase selection in hopes of promoting incremental environmental
change to scaffold evolution by reducing the amount of jarring discontinuities
between the environments of successive generations. In our empirical studies,
we find that forcing incremental environmental change is not significantly
better for evolving solutions to program synthesis problems than simple random
down-sampling. In response to this, we attempt to exacerbate the hypothesized
prevalence of discontinuities by using only disjoint down-samples to see if it
hinders performance. We find that this also does not significantly differ from
the performance of regular random down-sampling. These negative results raise
new questions about the ways in which the composition of sub-samples, which may
include synonymous cases, may be expected to influence the performance of
machine learning systems that use down-sampling.
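To make the procedure under discussion concrete, the following is a minimal sketch of lexicase parent selection restricted to a per-generation random down-sample, together with a helper that builds disjoint down-samples in the spirit of the paper's exacerbation experiment. The function names and the `errors` lookup are illustrative assumptions, not the authors' implementation.

```python
import random

def lexicase_select(population, errors, case_ids):
    """Pick one parent with lexicase selection restricted to `case_ids`.
    `errors[ind][c]` is individual `ind`'s error on training case `c`."""
    candidates = list(population)
    cases = list(case_ids)
    random.shuffle(cases)
    for c in cases:
        best = min(errors[ind][c] for ind in candidates)
        candidates = [ind for ind in candidates if errors[ind][c] == best]
        if len(candidates) == 1:
            break
    return random.choice(candidates)

def random_down_sample(all_cases, rate):
    """Draw a fresh random subset of training cases (re-done every generation)."""
    k = max(1, int(len(all_cases) * rate))
    return random.sample(list(all_cases), k)

def disjoint_down_samples(all_cases, rate):
    """Partition the cases into disjoint down-samples, so that successive
    generations share no cases (used to exacerbate the hypothesized
    discontinuities between environments)."""
    cases = list(all_cases)
    random.shuffle(cases)
    k = max(1, int(len(cases) * rate))
    return [cases[i:i + k] for i in range(0, len(cases), k)]
```

In a full GP run, only the sampled cases need to be evaluated each generation, which is where the execution savings that motivate down-sampling come from.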
Related papers
- Untangling the Effects of Down-Sampling and Selection in Genetic Programming (2023-04-14)
Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection.
Recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success.
- Reweighted Mixup for Subpopulation Shift (2023-04-09)
Subpopulation shift, in which the training and test distributions contain the same subpopulation groups but in different proportions, arises in many real-world applications.
Importance reweighting is a classical and effective way to handle the subpopulation shift.
We propose a simple yet practical framework, called reweighted mixup, to mitigate the overfitting issue.
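For context on the importance-reweighting baseline this entry refers to, here is a hedged sketch that reweights per-example losses by the ratio of test-time to training-time subpopulation frequencies; the group labels and frequency tables are assumed inputs, and this is not the reweighted-mixup method itself.

```python
import numpy as np

def group_reweighted_loss(losses, groups, train_freq, test_freq):
    """Average per-example losses, weighting each example by the ratio of its
    subpopulation's test-time to training-time frequency (assumed known)."""
    losses = np.asarray(losses, dtype=float)
    weights = np.array([test_freq[g] / train_freq[g] for g in groups])
    return float(np.mean(weights * losses))
```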
- A Static Analysis of Informed Down-Samples (2023-04-04)
We study recorded populations from the first generation of genetic programming runs, as well as entirely synthetic populations.
We show that both forms of down-sampling cause greater test coverage loss than standard lexicase selection with no down-sampling.
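One simple way to read "test coverage loss" is the fraction of training cases that some individual in the population solves but that a given down-sample omits; the sketch below computes that quantity under this assumption, which may differ from the paper's exact metric.

```python
def coverage_loss(solve_matrix, down_sample):
    """Fraction of training cases solved by at least one individual that are
    missing from `down_sample`.  `solve_matrix[ind][c]` is True if individual
    `ind` solves case `c` (one reading of coverage loss, not the paper's exact metric)."""
    n_cases = len(next(iter(solve_matrix.values())))
    solved = {c for c in range(n_cases)
              if any(row[c] for row in solve_matrix.values())}
    lost = solved - set(down_sample)
    return len(lost) / max(1, len(solved))
```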
- Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving (2023-01-04)
Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection.
Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions.
In Informed Down-Sampled Lexicase Selection, we use population statistics to build down-samples that contain more distinct and therefore informative training cases.
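A sketch in the spirit of that description: greedily choose cases whose population solve-vectors are far apart, so the down-sample contains distinct cases. The data layout and the farthest-first heuristic are assumptions; the published algorithm differs in its details.

```python
import random

def informed_down_sample(solve_vectors, sample_size):
    """Greedy farthest-first choice of training cases based on how differently
    the population treats them.  `solve_vectors[c]` is a tuple of 0/1 flags,
    one per individual, for case `c` (a sketch, not the published algorithm)."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    cases = list(solve_vectors)
    sample_size = min(sample_size, len(cases))
    chosen = [random.choice(cases)]
    while len(chosen) < sample_size:
        # Add the case whose nearest already-chosen case is farthest away.
        best = max((c for c in cases if c not in chosen),
                   key=lambda c: min(hamming(solve_vectors[c], solve_vectors[s])
                                     for s in chosen))
        chosen.append(best)
    return chosen
```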
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning (2022-11-29)
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
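As a rough illustration of class-adaptive synthetic sampling (not the IAA method itself), one could estimate each class's embedding spread and draw synthetic embeddings from it:

```python
import numpy as np

def synthesize_embeddings(embeddings, labels, n_per_class=4, scale=1.0):
    """Estimate each class's embedding spread and draw synthetic embeddings
    from a Gaussian with that spread (a rough sketch, not IAA itself)."""
    synthetic, synthetic_labels = [], []
    for cls in np.unique(labels):
        class_emb = embeddings[labels == cls]
        mean, std = class_emb.mean(axis=0), class_emb.std(axis=0)
        samples = mean + scale * std * np.random.randn(n_per_class, embeddings.shape[1])
        synthetic.append(samples)
        synthetic_labels.extend([cls] * n_per_class)
    return np.vstack(synthetic), np.array(synthetic_labels)
```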
- UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup (2022-09-19)
Subpopulation shift widely exists in many real-world machine learning applications.
Importance reweighting is a standard way to handle the subpopulation shift issue.
We propose uncertainty-aware mixup (UMIX) to mitigate the overfitting issue.
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing (2021-12-16)
The Mixup scheme creates an augmented training sample by mixing a pair of samples.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
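For reference, standard mixup (which the paper's saliency-guided variant builds on) can be sketched as:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convex-combine a pair of inputs and their one-hot labels with a
    Beta-distributed mixing coefficient (plain mixup, shown for context)."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```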
- Problem-solving benefits of down-sampled lexicase selection (2021-06-10)
We show that down-sampled lexicase selection's main benefit stems from the fact that it allows the evolutionary process to examine more individuals within the same computational budget.
The reasons that down-sampling helps, however, are not yet fully understood.
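The budget argument can be made concrete with a small calculation; the numbers in the comments are illustrative, not taken from the paper.

```python
def program_executions(pop_size, n_cases, generations, down_sample_rate=1.0):
    """Rough count of program executions for a run; with a down-sample rate r,
    the same budget funds roughly 1/r times as many individuals or generations."""
    return pop_size * max(1, int(n_cases * down_sample_rate)) * generations

# Illustrative numbers only: 1000 individuals, 200 cases, 300 generations.
full = program_executions(1000, 200, 300)                        # 60,000,000 executions
down = program_executions(1000, 200, 300, down_sample_rate=0.1)  #  6,000,000 executions
# The ~10x savings can instead fund a larger population or a longer run.
```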
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue (2020-09-24)
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
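A simplified sketch of the idea of rewarding high-entropy (diverse) generated batches: here a crude nearest-neighbor estimate over the batch stands in for the paper's variational lower bound.

```python
import torch

def generator_loss_with_entropy(adv_loss, generated, lam=0.1):
    """Adversarial generator loss minus an entropy-style diversity term.
    The entropy term is a crude batch nearest-neighbor estimate standing in
    for the paper's variational lower bound (an assumption, not their method)."""
    flat = generated.view(generated.size(0), -1)
    dists = torch.cdist(flat, flat)                   # pairwise distances in the batch
    dists.fill_diagonal_(float("inf"))                # ignore self-distances
    nn_dist = dists.min(dim=1).values                 # distance to nearest neighbor
    entropy_proxy = torch.log(nn_dist + 1e-8).mean()  # larger spread ~ higher entropy
    return adv_loss - lam * entropy_proxy
```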
- Robust Sampling in Deep Learning (2020-06-04)
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem with a new regularization method based on distributionally robust optimization.
During training, samples are weighted according to their accuracy so that the worst-performing samples contribute the most to the optimization.
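One common distributionally robust surrogate that realizes this "worst samples contribute most" behavior is exponential tilting of the per-example losses; the sketch below is that surrogate, not necessarily the paper's exact formulation.

```python
import numpy as np

def robust_weighted_loss(losses, temperature=1.0):
    """Exponentially tilt per-example losses so the worst-performing samples
    dominate the objective (a common DRO surrogate, possibly not the paper's)."""
    losses = np.asarray(losses, dtype=float)
    weights = np.exp((losses - losses.max()) / temperature)  # stable softmax weights
    weights /= weights.sum()
    return float(np.sum(weights * losses))
```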
- Minority Class Oversampling for Tabular Data with Deep Generative Models (2020-05-07)
We study the ability of deep generative models to provide realistic samples that improve performance on imbalanced classification tasks via oversampling.
Our experiments show that the choice of sampling method does not affect sample quality, but runtime varies widely.
We also observe that the improvements in performance metrics, while statistically significant, are often minor in absolute terms.
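A minimal sketch of the oversampling pipeline being studied, with a hypothetical `fit_generator` standing in for whichever deep generative model is used:

```python
import numpy as np

def oversample_minority(X, y, minority_class, fit_generator):
    """Balance a tabular dataset with synthetic minority rows.  `fit_generator`
    is a hypothetical factory returning an object with a `.sample(n)` method;
    any deep generative model for tabular data could fill that role."""
    minority_rows = X[y == minority_class]
    deficit = int(np.sum(y != minority_class)) - len(minority_rows)
    if deficit <= 0:
        return X, y
    synthetic = fit_generator(minority_rows).sample(deficit)  # assumed to return an array
    X_aug = np.vstack([X, synthetic])
    y_aug = np.concatenate([y, np.full(deficit, minority_class)])
    return X_aug, y_aug
```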