Analyzing the Interaction Between Down-Sampling and Selection
- URL: http://arxiv.org/abs/2304.07089v1
- Date: Fri, 14 Apr 2023 12:21:19 GMT
- Title: Analyzing the Interaction Between Down-Sampling and Selection
- Authors: Ryan Boldi, Ashley Bao, Martin Briesch, Thomas Helmuth, Dominik
Sobania, Lee Spector, Alexander Lalejini
- Abstract summary: Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection.
Down-sampling training sets has long been used to decrease the computational cost of evaluation in a wide range of application domains.
- Score: 52.77024349608834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Genetic programming systems often use large training sets to evaluate the
quality of candidate solutions for selection. However, evaluating populations
on large training sets can be computationally expensive. Down-sampling training
sets has long been used to decrease the computational cost of evaluation in a
wide range of application domains. Indeed, recent studies have shown that both
random and informed down-sampling can substantially improve problem-solving
success for GP systems that use the lexicase parent selection algorithm. We use
the PushGP framework to experimentally test whether these down-sampling
techniques can also improve problem-solving success in the context of two other
commonly used selection methods, fitness-proportionate and tournament
selection, across eight GP problems (four program synthesis and four symbolic
regression). We verified that down-sampling can benefit the problem-solving
success of both fitness-proportionate and tournament selection. However, the
number of problems wherein down-sampling improved problem-solving success
varied by selection scheme, suggesting that the impact of down-sampling depends
both on the problem and choice of selection scheme. Surprisingly, we found that
down-sampling was most consistently beneficial when combined with lexicase
selection as compared to tournament and fitness-proportionate selection.
Overall, our results suggest that down-sampling should be considered more often
when solving test-based GP problems.
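To make the interaction concrete, here is a minimal, illustrative Python sketch of random down-sampling combined with the three parent-selection schemes discussed above (lexicase, tournament, and fitness-proportionate). This is not the paper's PushGP implementation; the function names, error-matrix layout, and parameter values are hypothetical.

```python
import random

def random_down_sample(num_cases, rate):
    """Pick a random subset of training-case indices each generation."""
    k = max(1, int(num_cases * rate))
    return random.sample(range(num_cases), k)

def lexicase_select(errors, case_indices):
    """Lexicase selection restricted to the down-sampled cases.
    errors[i][t] is individual i's error on training case t (lower is better)."""
    candidates = list(range(len(errors)))
    cases = list(case_indices)
    random.shuffle(cases)
    for t in cases:
        best = min(errors[i][t] for i in candidates)
        candidates = [i for i in candidates if errors[i][t] == best]
        if len(candidates) == 1:
            break
    return random.choice(candidates)

def tournament_select(errors, case_indices, size=7):
    """Tournament selection on total error over the down-sampled cases."""
    contenders = random.sample(range(len(errors)), size)
    return min(contenders, key=lambda i: sum(errors[i][t] for t in case_indices))

def fitness_proportionate_select(errors, case_indices):
    """Roulette-wheel selection with fitness = 1 / (1 + total down-sampled error)."""
    weights = [1.0 / (1.0 + sum(errors[i][t] for t in case_indices))
               for i in range(len(errors))]
    return random.choices(range(len(errors)), weights=weights, k=1)[0]

# Toy example: 4 individuals, 6 training cases, 50% down-sample rate.
errors = [[random.randint(0, 5) for _ in range(6)] for _ in range(4)]
sample = random_down_sample(num_cases=6, rate=0.5)
print(lexicase_select(errors, sample),
      tournament_select(errors, sample, size=2),
      fitness_proportionate_select(errors, sample))
```

Each generation draws a fresh random subset of training cases, so a fixed program-execution budget covers more individuals or more generations than evaluation on the full training set.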
Related papers
- Was Tournament Selection All We Ever Needed? A Critical Reflection on Lexicase Selection [0.7874708385247353]
We run experiments comparing epsilon-lexicase and tournament selection with different down-sampling techniques.
We find that down-sampling improves generalization and performance even when compared over the same number of generations.
We observe that population diversity increases for tournament selection when combined with down-sampling.
arXiv Detail & Related papers (2025-02-25T11:01:11Z)
- Lexicase-based Selection Methods with Down-sampling for Symbolic Regression Problems: Overview and Benchmark [0.8602553195689513]
This paper evaluates random as well as informed down-sampling in combination with the relevant lexicase-based selection methods on a wide range of symbolic regression problems.
We find that for a given evaluation budget, epsilon-lexicase selection in combination with random or informed down-sampling outperforms all other methods.
arXiv Detail & Related papers (2024-07-31T14:26:22Z) - Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
However, vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Learning Fair Policies for Multi-stage Selection Problems from Observational Data [4.282745020665833]
We consider the problem of learning fair policies for multi-stage selection problems from observational data.
This problem arises in several high-stakes domains such as company hiring, loan approval, or bail decisions where outcomes are only observed for those selected.
We propose a multi-stage framework that can be augmented with various fairness constraints, such as demographic parity or equal opportunity.
arXiv Detail & Related papers (2023-12-20T16:33:15Z)
- Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection [72.97320260601347]
In crowded pedestrian detection, the performance of DETRs is still unsatisfactory due to the inappropriate sample selection method.
We propose Sample Selection for Crowded Pedestrians (SSCP), which includes a constraint-guided label assignment scheme (CGLA).
Experimental results show that the proposed SSCP effectively improves the baselines without introducing any overhead in inference.
arXiv Detail & Related papers (2023-05-18T08:28:01Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving [40.683810697551166]
Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection.
Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases allowing for more individuals to be explored with the same amount of program executions.
In Informed Down-Sampled Lexicase Selection, we use population statistics to build down-samples that contain more distinct and therefore informative training cases (see the sketch after this list).
arXiv Detail & Related papers (2023-01-04T08:47:18Z)
- The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase Selection [0.0]
Down-sampling has proved effective in genetic programming (GP) runs that utilize the lexicase parent selection technique.
We hypothesize that the random sampling that is performed every generation causes discontinuities that result in the population being unable to adapt to the shifting environment.
We find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling.
arXiv Detail & Related papers (2022-05-31T16:21:14Z)
- HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios.
Users can explore subsets of data from different perspectives to decide all those parameters.
The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z)
- Problem-solving benefits of down-sampled lexicase selection [0.20305676256390928]
The reasons that down-sampling helps, however, are not yet fully understood.
We show that down-sampled lexicase selection's main benefit stems from the fact that it allows the evolutionary process to examine more individuals within the same computational budget.
arXiv Detail & Related papers (2021-06-10T23:42:09Z)
- Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions.
Recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods.
We develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results.
arXiv Detail & Related papers (2020-07-21T19:31:41Z)
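As a rough companion to the Informed Down-Sampled Lexicase Selection entry above, the following Python sketch builds a down-sample from population statistics by greedily choosing training cases whose pass/fail patterns across the population are as distinct as possible. It is an illustration under simplifying assumptions (binary solve vectors, a farthest-first greedy choice), not the exact procedure of the cited paper.

```python
import random

def case_distance(a, b):
    """Hamming distance between two cases' solve vectors (who solves what)."""
    return sum(x != y for x, y in zip(a, b))

def informed_down_sample(solve_matrix, k):
    """solve_matrix[t][i] = 1 if individual i solves case t, else 0.
    Greedily (farthest-first) keep k cases with distinct solve patterns."""
    num_cases = len(solve_matrix)
    chosen = [random.randrange(num_cases)]  # seed with one random case
    while len(chosen) < k:
        remaining = [t for t in range(num_cases) if t not in chosen]
        # add the case whose minimum distance to the current sample is largest
        chosen.append(max(remaining,
                          key=lambda t: min(case_distance(solve_matrix[t],
                                                          solve_matrix[c])
                                            for c in chosen)))
    return chosen

# Toy example: 8 training cases, 5 individuals, keep 3 informative cases.
solves = [[random.randint(0, 1) for _ in range(5)] for _ in range(8)]
print(informed_down_sample(solves, k=3))
```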