Lexicase-based Selection Methods with Down-sampling for Symbolic Regression Problems: Overview and Benchmark
- URL: http://arxiv.org/abs/2407.21632v1
- Date: Wed, 31 Jul 2024 14:26:22 GMT
- Title: Lexicase-based Selection Methods with Down-sampling for Symbolic Regression Problems: Overview and Benchmark
- Authors: Alina Geiger, Dominik Sobania, Franz Rothlauf
- Abstract summary: This paper evaluates random as well as informed down-sampling in combination with the relevant lexicase-based selection methods on a wide range of symbolic regression problems.
We find that for a given evaluation budget, epsilon-lexicase selection in combination with random or informed down-sampling outperforms all other methods.
- Score: 0.8602553195689513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, several new lexicase-based selection variants have emerged due to the success of standard lexicase selection in various application domains. For symbolic regression problems, variants that use an epsilon-threshold or batches of training cases, among others, have led to performance improvements. Lately, variants that combine lexicase selection with down-sampling strategies have received particular attention. This paper evaluates random as well as informed down-sampling in combination with the relevant lexicase-based selection methods on a wide range of symbolic regression problems. In contrast to most prior work, we compare the methods not only over a given evaluation budget but also over a given runtime, as time is usually limited in practice. We find that for a given evaluation budget, epsilon-lexicase selection in combination with random or informed down-sampling outperforms all other methods. Only for a rather long runtime of 24h is the best-performing method tournament selection in combination with informed down-sampling. If the given runtime is very short, lexicase variants using batches of training cases perform best.
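For context, a minimal sketch of the core mechanism compared here, epsilon-lexicase selection with random down-sampling (illustrative Python, not the authors' code; the MAD-based epsilon and the 10% sample size follow common practice in the symbolic regression literature):

```python
import numpy as np

def mad(x):
    """Median absolute deviation, the usual epsilon for symbolic regression."""
    return np.median(np.abs(x - np.median(x)))

def epsilon_lexicase_select(errors, rng):
    """Pick one parent index from an (n_individuals, n_cases) error matrix."""
    candidates = np.arange(errors.shape[0])
    for case in rng.permutation(errors.shape[1]):  # cases in random order
        eps = mad(errors[:, case])                 # population-wide epsilon
        best = errors[candidates, case].min()
        # Keep every candidate within epsilon of the best error on this case.
        candidates = candidates[errors[candidates, case] <= best + eps]
        if len(candidates) == 1:
            break
    return int(rng.choice(candidates))

def downsampled_epsilon_lexicase(errors, sample_frac, rng):
    """Random down-sampling: select on a random subset of training cases,
    so each selection event touches only sample_frac of the cases."""
    n_cases = errors.shape[1]
    subset = rng.choice(n_cases, size=max(1, int(sample_frac * n_cases)),
                        replace=False)
    return epsilon_lexicase_select(errors[:, subset], rng)

rng = np.random.default_rng(42)
errors = rng.random((100, 200))                    # 100 individuals, 200 cases
parent = downsampled_epsilon_lexicase(errors, sample_frac=0.1, rng=rng)
```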
Related papers
- DALex: Lexicase-like Selection via Diverse Aggregation [6.394522608656896]
Results on program synthesis, deep learning, symbolic regression, and learning systems show that DALex (for Diversely Aggregated Lexicase) achieves significant speedups over lexicase selection and its relaxed variants.
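A rough sketch of the diverse-aggregation idea (based on the abstract, not the authors' code; the softmax-style weighting and the `particularity` parameter are assumptions):

```python
import numpy as np

def dalex_select(errors, particularity, rng):
    """Sketch of DALex-style selection: replace lexicase's case-by-case
    filtering with a single weighted aggregation of case errors. A larger
    `particularity` lets one random case dominate the weighted sum, which
    approximates lexicase's strict case ordering; smaller values relax it."""
    logits = rng.normal(0.0, particularity, size=errors.shape[1])
    weights = np.exp(logits - logits.max())        # softmax-style importances
    weights /= weights.sum()
    scores = errors @ weights                      # one aggregate score each
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
errors = rng.random((100, 200))
parent = dalex_select(errors, particularity=20.0, rng=rng)
```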
arXiv Detail & Related papers (2024-01-23T01:20:15Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel, the proposed method, bypasses the heavy batch selection process entirely, achieving a significant efficiency improvement: it is 530x faster than existing active learning methods.
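The gist, selecting diverse samples from frozen pretrained features in a single pass with no training loop, can be sketched with farthest-point sampling (a generic stand-in for FreeSel's actual distance-based procedure):

```python
import numpy as np

def diverse_select(features, budget, rng):
    """Greedy farthest-point selection over frozen pretrained features:
    each step picks the sample farthest from everything chosen so far."""
    selected = [int(rng.integers(len(features)))]
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))                # least-covered sample
        selected.append(nxt)
        dists = np.minimum(dists,
                           np.linalg.norm(features - features[nxt], axis=1))
    return selected

rng = np.random.default_rng(0)
features = rng.normal(size=(10_000, 384))          # e.g., frozen ViT embeddings
to_annotate = diverse_select(features, budget=500, rng=rng)
```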
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Untangling the Effects of Down-Sampling and Selection in Genetic Programming [40.05141985769286]
Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection.
Recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success.
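The two strategies differ in how the subset of cases is built: random down-sampling draws cases uniformly, while informed down-sampling prefers cases whose pass/fail profiles across the population differ most. A sketch of the informed variant (farthest-first traversal, following the general construction described in this line of work; the details here are illustrative):

```python
import numpy as np

def informed_downsample(solve_matrix, sample_size, rng):
    """Pick training cases by farthest-first traversal over their pass/fail
    profiles, so the down-sample keeps cases that distinguish individuals.
    `solve_matrix` is an (n_individuals, n_cases) boolean array."""
    profiles = solve_matrix.T.astype(float)        # one profile per case
    chosen = [int(rng.integers(profiles.shape[0]))]
    dists = np.abs(profiles - profiles[chosen[0]]).mean(axis=1)
    while len(chosen) < sample_size:
        nxt = int(np.argmax(dists))                # most distinct remaining case
        chosen.append(nxt)
        dists = np.minimum(dists,
                           np.abs(profiles - profiles[nxt]).mean(axis=1))
    return chosen
```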
arXiv Detail & Related papers (2023-04-14T12:21:19Z)
- Down-Sampled Epsilon-Lexicase Selection for Real-World Symbolic Regression Problems [1.8047694351309207]
Down-sampled epsilon-lexicase selection combines epsilon-lexicase selection with random subsampling to improve performance in the domain of symbolic regression.
We find that the diversity is reduced by using down-sampled epsilon-lexicase selection compared to standard epsilon-lexicase selection.
With down-sampled epsilon-lexicase selection we observe an improvement of the solution quality of up to 85% in comparison to standard epsilon-lexicase selection.
arXiv Detail & Related papers (2023-02-08T19:36:26Z)
- Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem [8.281391209717105]
We study the feature-based newsvendor problem, in which a decision-maker has access to historical data.
In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance.
We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers.
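The bilevel MILP itself is too involved for a short sketch, but the lower-level decision it optimizes is the classic feature-based newsvendor, which reduces to quantile regression at the critical fractile. A minimal baseline with illustrative costs and synthetic data (not the paper's formulation):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

# With underage cost cu and overage cost co, the optimal order quantity is
# the cu / (cu + co) quantile of demand (the classic critical fractile).
cu, co = 4.0, 1.0
tau = cu / (cu + co)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                       # historical features
demand = 50 + 10 * X[:, 0] + rng.normal(0, 5, 500)  # only feature 0 matters

# Feature-based decision: linear quantile regression at the critical fractile.
# Feature selection (the paper's focus) would constrain which coefficients
# may be nonzero; here all features are used.
model = QuantileRegressor(quantile=tau, alpha=0.0).fit(X, demand)
orders = model.predict(X[:5])                       # order quantities for 5 instances
```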
arXiv Detail & Related papers (2022-09-12T08:52:26Z)
- Lexicase Selection at Scale [5.4968949435821735]
Lexicase selection is a semantic-aware parent selection method, which assesses individual test cases in a randomly-shuffled data stream.
One potential drawback of lexicase selection and its variants is that the selection procedure requires evaluating training cases in a single data stream.
We propose a novel method, fast lexicase selection, which incorporates lexicase selection and weighted shuffle with partial evaluation.
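A sketch of both ingredients, weighted shuffle and partial (lazy) evaluation (the interface and the Efraimidis-Spirakis shuffle are illustrative choices, not necessarily the paper's exact implementation):

```python
import numpy as np

def weighted_shuffle(weights, rng):
    """Efraimidis-Spirakis keys: a weighted shuffle where higher-weight
    (e.g., harder) cases tend to appear earlier in the ordering."""
    keys = rng.random(len(weights)) ** (1.0 / np.asarray(weights, dtype=float))
    return np.argsort(-keys)

def fast_lexicase_select(n_individuals, case_weights, passes, rng):
    """Lexicase with weighted case ordering and partial evaluation:
    `passes(i, c) -> bool` is only called for cases actually reached,
    avoiding evaluation of the full population-by-cases matrix."""
    candidates = list(range(n_individuals))
    for case in weighted_shuffle(case_weights, rng):
        survivors = [i for i in candidates if passes(i, int(case))]
        if survivors:                              # if nobody passes, keep all
            candidates = survivors
        if len(candidates) == 1:
            break
    return candidates[int(rng.integers(len(candidates)))]
```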
arXiv Detail & Related papers (2022-08-23T03:58:47Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on CIFAR100 dataset with a 90% noise rate.
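A sketch of the uniform-selection idea as suggested by the title and abstract (the divergence-based clean/noisy split and the contrastive branch of UNICON are omitted): selecting an equal number of low-loss samples per class avoids the class imbalance that plain small-loss selection causes.

```python
import numpy as np

def uniform_clean_selection(losses, labels, n_classes, frac):
    """Take the same number of lowest-loss samples from every class and
    treat them as the 'clean' subset; the rest form the noisy subset."""
    per_class = int(frac * len(losses) / n_classes)
    clean = []
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        clean.extend(idx[np.argsort(losses[idx])][:per_class].tolist())
    return np.array(sorted(clean))
```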
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
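A generic sketch of active sampling for ranking (an uncertainty heuristic for illustration only; the paper's elimination strategy and its high-probability guarantee are not reproduced here):

```python
import numpy as np

def active_rank(n_items, compare, budget):
    """Spend the comparison budget on the pair whose outcome is least
    certain, then rank items by empirical Borda-style scores."""
    wins = np.full((n_items, n_items), 0.5)        # pseudo-counts, p starts at 0.5
    counts = np.ones((n_items, n_items))
    for _ in range(budget):
        certainty = np.abs(wins / counts - 0.5) * counts
        np.fill_diagonal(certainty, np.inf)        # never compare item to itself
        i, j = np.unravel_index(np.argmin(certainty), certainty.shape)
        winner = compare(i, j)                     # noisy oracle: returns i or j
        wins[winner, i + j - winner] += 1.0        # record winner over loser
        counts[i, j] += 1.0
        counts[j, i] += 1.0
    borda = (wins / counts).sum(axis=1)
    return list(np.argsort(-borda))                # estimated ranking, best first
```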
arXiv Detail & Related papers (2021-10-08T13:51:55Z)
- An Investigation of Replay-based Approaches for Continual Learning [79.0660895390689]
Continual learning (CL) is a major challenge of machine learning (ML); it describes the ability to learn several tasks sequentially without catastrophic forgetting (CF).
Several solution classes have been proposed, of which so-called replay-based approaches seem very promising due to their simplicity and robustness.
We empirically investigate replay-based approaches of continual learning and assess their potential for applications.
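As a concrete reference point for what replay means in practice, a minimal reservoir-sampling replay buffer (a generic sketch, not one of the specific approaches assessed in the paper):

```python
import random

class ReservoirReplayBuffer:
    """Bounded memory of past tasks' examples, kept uniform over everything
    seen so far via reservoir sampling; mixed into each new task's batches."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.seen = 0

    def add(self, example):
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
        else:
            # Each seen example stays in memory with probability capacity/seen.
            idx = random.randrange(self.seen)
            if idx < self.capacity:
                self.memory[idx] = example

    def sample(self, batch_size):
        return random.sample(self.memory, min(batch_size, len(self.memory)))

# During training on task t, a batch is typically the concatenation of
# fresh task-t data and buffer.sample(batch_size) to mitigate forgetting.
```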
arXiv Detail & Related papers (2021-08-15T15:05:02Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
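A minimal sketch of the idea of replacing random-perturbation gradients with a model-based estimate (the Gaussian process kernel, step sizes, and interface are assumptions, not the paper's algorithm):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_local_step(theta, objective, rng, n_samples=20, sigma=0.1, lr=0.5):
    """Fit a Gaussian process to returns of locally perturbed policy
    parameters, then ascend a finite-difference gradient of the GP
    posterior mean instead of a raw random-perturbation gradient."""
    X = theta + sigma * rng.normal(size=(n_samples, theta.size))
    y = np.array([objective(x) for x in X])         # noisy policy returns
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=sigma)).fit(X, y)
    eps = 1e-4
    grad = np.array([
        (gp.predict([theta + eps * e]) - gp.predict([theta - eps * e]))[0]
        / (2 * eps)
        for e in np.eye(theta.size)
    ])
    return theta + lr * grad                        # gradient ascent on the model

rng = np.random.default_rng(0)
theta = np.zeros(3)
objective = lambda th: -np.sum((th - 1.0) ** 2) + 0.01 * rng.normal()
for _ in range(10):
    theta = gp_local_step(theta, objective, rng)
```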
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions.
Recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods.
We develop a new method combining Bloom filters with belief propagation that scales to larger problem sizes n (more than 100) with good empirical results.
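For orientation, the classic COMP decoder that pooled-testing methods build on (a baseline sketch; the paper's Bloom-filter and belief-propagation machinery is not reproduced here):

```python
import numpy as np

def comp_decode(pools, results, n_items):
    """Classic COMP baseline for group testing: every item appearing in a
    pool that tested negative is cleared; everything else stays flagged."""
    candidate = np.ones(n_items, dtype=bool)
    for pool, positive in zip(pools, results):
        if not positive:
            candidate[list(pool)] = False  # members of a negative pool are clear
    return np.flatnonzero(candidate)       # items that may be infected

# Example: 12 samples, 4 pools, item 3 infected.
pools = [range(0, 6), range(6, 12), range(0, 12, 2), range(1, 12, 2)]
infected = {3}
results = [any(i in infected for i in p) for p in pools]
print(comp_decode(pools, results, 12))  # [1 3 5]: item 3 plus two items that
                                        # never appeared in a negative pool
```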
arXiv Detail & Related papers (2020-07-21T19:31:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.