Informed Down-Sampled Lexicase Selection: Identifying productive
training cases for efficient problem solving
- URL: http://arxiv.org/abs/2301.01488v2
- Date: Thu, 22 Feb 2024 14:48:19 GMT
- Title: Informed Down-Sampled Lexicase Selection: Identifying productive
training cases for efficient problem solving
- Authors: Ryan Boldi, Martin Briesch, Dominik Sobania, Alexander Lalejini,
Thomas Helmuth, Franz Rothlauf, Charles Ofria, Lee Spector
- Abstract summary: Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection.
Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions.
In Informed Down-Sampled Lexicase Selection, we use population statistics to build down-samples that contain more distinct and therefore informative training cases.
- Score: 40.683810697551166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Genetic Programming (GP) often uses large training sets and requires all
individuals to be evaluated on all training cases during selection. Random
down-sampled lexicase selection evaluates individuals on only a random subset
of the training cases, allowing more individuals to be explored with the
same number of program executions. However, creating a down-sample randomly
might exclude important cases from the current down-sample for a number of
generations, while cases that measure the same behavior (synonymous cases) may
be overused despite their redundancy. In this work, we introduce Informed
Down-Sampled Lexicase Selection. This method leverages population statistics to
build down-samples that contain more distinct and therefore informative
training cases. Through an empirical investigation across two different GP
systems (PushGP and Grammar-Guided GP), we find that informed down-sampling
significantly outperforms random down-sampling on a set of contemporary program
synthesis benchmark problems. Through an analysis of the created down-samples,
we find that important training cases are included in the down-sample
consistently across independent evolutionary runs and systems. We hypothesize
that this improvement can be attributed to the ability of Informed Down-Sampled
Lexicase Selection to maintain more specialist individuals over the course of
evolution, while also benefiting from reduced per-evaluation costs.
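The abstract describes two algorithmic pieces: lexicase selection run over a down-sample of the training cases, and an informed way of building that down-sample from population statistics so that the included cases are distinct rather than synonymous. The following is a minimal Python sketch of both steps; it assumes numeric error vectors (0 = case solved), uses Hamming distance between per-case solve vectors as the distinctness measure, and builds the down-sample with a simple farthest-first heuristic. The function names, parameters, and scheduling are illustrative assumptions, not the authors' implementation.

```python
import random

def lexicase_select(population, errors, case_indices):
    """Select one parent via lexicase selection restricted to `case_indices`.

    `errors[i][c]` is the error of individual i on training case c
    (0 means the case is solved). Returns the index of the chosen parent.
    """
    candidates = list(range(len(population)))
    cases = list(case_indices)
    random.shuffle(cases)
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return random.choice(candidates)

def random_down_sample(num_cases, rate):
    """Random down-sampling: keep a fraction `rate` of the training cases."""
    k = max(1, int(rate * num_cases))
    return random.sample(range(num_cases), k)

def informed_down_sample(solve_matrix, rate):
    """Informed down-sampling sketch: pick distinct cases by farthest-first traversal.

    `solve_matrix[c]` is a binary vector over a sample of the population
    (1 if that individual solves case c). Cases whose solve vectors differ
    on more individuals are treated as more distinct (Hamming distance).
    """
    num_cases = len(solve_matrix)
    k = max(1, int(rate * num_cases))

    def distance(a, b):
        return sum(x != y for x, y in zip(solve_matrix[a], solve_matrix[b]))

    chosen = [random.randrange(num_cases)]  # seed with one random case
    while len(chosen) < k:
        # Add the case whose nearest already-chosen case is farthest away.
        best_case = max(
            (c for c in range(num_cases) if c not in chosen),
            key=lambda c: min(distance(c, s) for s in chosen),
        )
        chosen.append(best_case)
    return chosen
```

In a full run, the down-sample would be rebuilt from the current population's solve vectors on some schedule (every generation or every few generations), and `lexicase_select` would be called once per parent needed.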
Related papers
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods independently score and select data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Untangling the Effects of Down-Sampling and Selection in Genetic Programming [40.05141985769286]
Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection.
Recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success.
arXiv Detail & Related papers (2023-04-14T12:21:19Z)
- A Static Analysis of Informed Down-Samples [62.997667081978825]
We study recorded populations from the first generation of genetic programming runs, as well as entirely synthetic populations.
We show that both forms of down-sampling cause greater test coverage loss than standard lexicase selection with no down-sampling (see the coverage sketch after this list).
arXiv Detail & Related papers (2023-04-04T17:34:48Z)
- Learning from a Biased Sample [3.546358664345473]
We propose a method for learning a decision rule that minimizes the worst-case risk incurred under a family of test distributions.
We empirically validate our proposed method in a case study on prediction of mental health scores from health survey data.
arXiv Detail & Related papers (2022-09-05T04:19:16Z)
- The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase Selection [0.0]
Down-sampling has proved effective in genetic programming (GP) runs that utilize the lexicase parent selection technique.
We hypothesize that the random sampling that is performed every generation causes discontinuities that result in the population being unable to adapt to the shifting environment.
We find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling.
arXiv Detail & Related papers (2022-05-31T16:21:14Z)
- Transfer Learning In Differential Privacy's Hybrid-Model [10.584333748643774]
We study the problem of machine learning in the hybrid-model where the n individuals in the curator's dataset are drawn from a different distribution.
We give a general scheme -- Subsample-Test-Reweigh -- for this transfer learning problem.
arXiv Detail & Related papers (2022-01-28T09:54:54Z)
- Problem-solving benefits of down-sampled lexicase selection [0.20305676256390928]
The reasons that down-sampling helps are not yet fully understood.
We show that down-sampled lexicase selection's main benefit stems from the fact that it allows the evolutionary process to examine more individuals within the same computational budget.
arXiv Detail & Related papers (2021-06-10T23:42:09Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)
- Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation [51.091890311312085]
We propose a new training scheme for auto-regressive sequence generative models, which is effective and stable when operating in the large sample spaces encountered in text generation.
Our method stably outperforms Maximum Likelihood Estimation and other state-of-the-art sequence generative models in terms of both quality and diversity.
arXiv Detail & Related papers (2020-07-12T15:31:24Z)
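The static-analysis entry above reports that both random and informed down-sampling lose more test coverage than full lexicase selection. As a hedged illustration of one way such coverage could be measured, the sketch below treats a training case as covered when at least one selected parent has zero error on it; this reading, and the helper name, are assumptions for illustration rather than that paper's definition.

```python
def test_case_coverage(selected_parents, errors, num_cases):
    """Fraction of training cases solved by at least one selected parent.

    'Covered' here means some parent has zero error on the case; this
    convention is an assumption made for illustration, not a definition
    quoted from the cited static-analysis paper.
    """
    covered = sum(
        1 for c in range(num_cases)
        if any(errors[p][c] == 0 for p in selected_parents)
    )
    return covered / num_cases
```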
This list is automatically generated from the titles and abstracts of the papers on this site.