A Static Analysis of Informed Down-Samples
- URL: http://arxiv.org/abs/2304.01978v2
- Date: Mon, 17 Apr 2023 00:00:36 GMT
- Title: A Static Analysis of Informed Down-Samples
- Authors: Ryan Boldi, Alexander Lalejini, Thomas Helmuth, Lee Spector
- Abstract summary: We study recorded populations from the first generation of genetic programming runs, as well as entirely synthetic populations.
We show that both forms of down-sampling cause greater test coverage loss than standard lexicase selection with no down-sampling.
- Score: 62.997667081978825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an analysis of the loss of population-level test coverage induced
by different down-sampling strategies when combined with lexicase selection. We
study recorded populations from the first generation of genetic programming
runs, as well as entirely synthetic populations. Our findings verify the
hypothesis that informed down-sampling better maintains population-level test
coverage when compared to random down-sampling. Additionally, we show that both
forms of down-sampling cause greater test coverage loss than standard lexicase
selection with no down-sampling. However, given more information about the
population, we found that informed down-sampling can further reduce its test
coverage loss. We also recommend wider adoption of the static population
analyses we present in this work.
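The following is a minimal sketch of the kind of static analysis the abstract describes. It takes a fixed population, selects parents with lexicase selection either with or without random down-sampling, and counts the test cases that some individual in the population solves but that no selected parent solves. Several pieces are assumptions made for illustration and are not taken from the authors' code: the pass/fail matrix representation, the function names `lexicase_select`, `select_parents` and `coverage_loss`, the rule that one down-sample is shared by every selection in a generation, and this exact definition of coverage loss. The synthetic population used here is also hypothetical.

```python
import random
from typing import List, Sequence

# Assumed representation: solves[i][j] is True when individual i passes test case j.
SolveMatrix = Sequence[Sequence[bool]]


def lexicase_select(solves: SolveMatrix, cases: Sequence[int], rng: random.Random) -> int:
    """Select one parent index with lexicase selection over the given cases (pass/fail errors)."""
    candidates = list(range(len(solves)))
    order = list(cases)
    rng.shuffle(order)
    for case in order:
        passing = [i for i in candidates if solves[i][case]]
        # Keep only the elite on this case; if no candidate passes, they all tie and survive.
        if passing:
            candidates = passing
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def coverage_loss(solves: SolveMatrix, parents: Sequence[int]) -> int:
    """Count cases solved by someone in the population but by no selected parent (assumed metric)."""
    n_cases = len(solves[0])
    covered = {j for j in range(n_cases) if any(row[j] for row in solves)}
    kept = {j for j in range(n_cases) if any(solves[p][j] for p in parents)}
    return len(covered - kept)


def select_parents(solves: SolveMatrix, downsample_rate: float, rng: random.Random) -> List[int]:
    """Select len(solves) parents; a rate of 1.0 means standard lexicase with no down-sampling."""
    n_cases = len(solves[0])
    size = max(1, round(downsample_rate * n_cases))  # expects 0 < downsample_rate <= 1
    # One random down-sample is shared by every selection event in this generation.
    cases = rng.sample(range(n_cases), size)
    return [lexicase_select(solves, cases, rng) for _ in range(len(solves))]


rng = random.Random(0)
# Hypothetical synthetic population: 200 individuals, 100 cases, each case passed with probability 0.2.
population = [[rng.random() < 0.2 for _ in range(100)] for _ in range(200)]
for rate in (1.0, 0.1):
    parents = select_parents(population, rate, rng)
    print(f"down-sample rate {rate}: coverage loss {coverage_loss(population, parents)}")
```

A single run like this is noisy. A comparison in the spirit of the abstract would repeat the selection step many times per population and average the loss. It could also substitute an informed down-sample for `rng.sample` to contrast the two strategies.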
Related papers
- Optimal Downsampling for Imbalanced Classification with Generalized Linear Models [6.14486033794703]
We study optimal downsampling for imbalanced classification using generalized linear models (GLMs)
We propose a pseudo likelihood estimator and study its normality in the context of increasingly imbalanced populations.
(2024-10-11T17:08:13Z)
- Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score [62.54911162109439]
Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions.
We propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.
We develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples.
(2023-05-25T13:14:58Z)
- Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving [40.683810697551166]
Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection.
Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions.
In Informed Down-Sampled Lexicase Selection, we use population statistics to build down-samples that contain more distinct and therefore informative training cases.
(2023-01-04T08:47:18Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning tests to individuals.
(2022-06-15T16:38:50Z)
- The Environmental Discontinuity Hypothesis for Down-Sampled Lexicase Selection [0.0]
Down-sampling has proved effective in genetic programming (GP) runs that utilize the lexicase parent selection technique.
We hypothesize that the random sampling that is performed every generation causes discontinuities that result in the population being unable to adapt to the shifting environment.
We find that forcing incremental environmental change is not significantly better for evolving solutions to program synthesis problems than simple random down-sampling.
(2022-05-31T16:21:14Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme creates an augmented training sample by mixing a pair of samples.
We present a novel yet simple Mixup variant that captures the best of both worlds.
(2021-12-16T11:27:48Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
(2020-09-12T20:37:33Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider computational issues arising from large sample sizes within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
(2020-05-08T05:09:08Z)
- UGRWO-Sampling for COVID-19 dataset: A modified random walk under-sampling approach based on graphs to imbalanced data classification [2.15242029196761]
This paper proposes a new graph-based RWO-Sampling (Random Walk Over-Sampling) method for imbalanced datasets.
Two schemes based on under-sampling and over-sampling methods are introduced to keep the proximity information robust to noises and outliers.
(2020-02-10T03:29:24Z)
- Overly Optimistic Prediction Results on Imbalanced Data: a Case Study of Flaws and Benefits when Applying Over-sampling [13.463035357173045]
We focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets.
We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified.
(2020-01-15T12:53:23Z)
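The informed down-sampling entry above says that population statistics are used to build down-samples containing more distinct training cases. One way to sketch that idea is a farthest-first traversal over cases. This sketch assumes a pass/fail matrix and measures the distance between two cases as the number of individuals, drawn from a random sample of the population, that solve exactly one of the two cases. The function names, the `parent_rate` parameter and the random tie-breaking are hypothetical choices. This is an illustration of the idea, not that paper's implementation.

```python
import random
from typing import List, Sequence


def case_distance(solves: Sequence[Sequence[bool]], sampled: Sequence[int], a: int, b: int) -> int:
    """Assumed distance: how many sampled individuals solve exactly one of cases a and b."""
    return sum(solves[i][a] != solves[i][b] for i in sampled)


def informed_downsample(solves: Sequence[Sequence[bool]], size: int,
                        parent_rate: float, rng: random.Random) -> List[int]:
    """Pick up to `size` mutually distant cases by farthest-first traversal (hypothetical sketch)."""
    n_individuals, n_cases = len(solves), len(solves[0])
    # Estimate case similarity from a random subset of the population.
    sampled = rng.sample(range(n_individuals), max(1, round(parent_rate * n_individuals)))
    chosen = [rng.randrange(n_cases)]
    remaining = set(range(n_cases)) - set(chosen)
    # min_dist[c] is the distance from case c to its nearest already-chosen case.
    min_dist = {c: case_distance(solves, sampled, c, chosen[0]) for c in remaining}
    while len(chosen) < size and remaining:
        farthest = max(min_dist.values())
        # Break ties randomly among the cases farthest from the current down-sample.
        pick = rng.choice(sorted(c for c in remaining if min_dist[c] == farthest))
        chosen.append(pick)
        remaining.discard(pick)
        del min_dist[pick]
        for c in remaining:
            min_dist[c] = min(min_dist[c], case_distance(solves, sampled, c, pick))
    return chosen


rng = random.Random(0)
# Hypothetical synthetic population: 100 individuals, 50 cases.
population = [[rng.random() < 0.2 for _ in range(50)] for _ in range(100)]
print(informed_downsample(population, size=5, parent_rate=0.1, rng=rng))
```

Two cases that every sampled individual solves or fails together are at distance zero. Once one of them is chosen, the other is selected only after more distinct cases run out. The returned indices would take the place of a random down-sample in each selection event.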
This list is automatically generated from the titles and abstracts of the papers on this site.