A Static Analysis of Informed Down-Samples
- URL: http://arxiv.org/abs/2304.01978v2
- Date: Mon, 17 Apr 2023 00:00:36 GMT
- Title: A Static Analysis of Informed Down-Samples
- Authors: Ryan Boldi, Alexander Lalejini, Thomas Helmuth, Lee Spector
- Abstract summary: We study recorded populations from the first generation of genetic programming runs, as well as entirely synthetic populations.
We show that both forms of down-sampling cause greater test coverage loss than standard lexicase selection with no down-sampling.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an analysis of the loss of population-level test coverage induced
by different down-sampling strategies when combined with lexicase selection. We
study recorded populations from the first generation of genetic programming
runs, as well as entirely synthetic populations. Our findings verify the
hypothesis that informed down-sampling better maintains population-level test
coverage when compared to random down-sampling. Additionally, we show that both
forms of down-sampling cause greater test coverage loss than standard lexicase
selection with no down-sampling. However, given more information about the
population, we found that informed down-sampling can further reduce its test
coverage loss. We also recommend wider adoption of the static population
analyses we present in this work.
Related papers
- Detecting Adversarial Data by Probing Multiple Perturbations Using
Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
arXiv Detail & Related papers (2023-05-25T13:14:58Z) - Analyzing the Interaction Between Down-Sampling and Selection [52.77024349608834]
Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection.
Down-sampling training sets has long been used to decrease the computational cost of evaluation in a wide range of application domains.
arXiv Detail & Related papers (2023-04-14T12:21:19Z) - Informed Down-Sampled Lexicase Selection: Identifying productive
training cases for efficient problem solving [40.683810697551166]
Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection.
Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases allowing for more individuals to be explored with the same amount of program executions.
In Informed Down-Sampled Lexicase Selection, we use population statistics to build down-samples that contain more distinct and therefore informative training cases.
arXiv Detail & Related papers (2023-01-04T08:47:18Z) - Learning with Noisy Labels over Imbalanced Subpopulations [13.477553187049462]
Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
arXiv Detail & Related papers (2022-11-16T07:25:24Z) - Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z) - The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase
Selection [0.0]
Down-sampling has proved effective in genetic programming (GP) runs that utilize the lexicase parent selection technique.
We hypothesize that the random sampling that is performed every generation causes discontinuities that result in the population being unable to adapt to the shifting environment.
We find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling.
arXiv Detail & Related papers (2022-05-31T16:21:14Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated
Label Mixing [104.630875328668]
Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z) - Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z) - UGRWO-Sampling for COVID-19 dataset: A modified random walk
under-sampling approach based on graphs to imbalanced data classification [2.15242029196761]
This paper proposes a new RWO-Sampling (Random Walk Over-Sampling) based on graphs for imbalanced datasets.
Two schemes based on under-sampling and over-sampling methods are introduced to keep the proximity information robust to noises and outliers.
arXiv Detail & Related papers (2020-02-10T03:29:24Z) - Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of
Flaws and Benefits when Applying Over-sampling [13.463035357173045]
We focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets.
We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified.
arXiv Detail & Related papers (2020-01-15T12:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.