Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression
- URL: http://arxiv.org/abs/2308.00672v1
- Date: Mon, 31 Jul 2023 14:37:20 GMT
- Title: Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression
- Authors: Nathan Haut, Wolfgang Banzhaf, and Bill Punch
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines various methods of computing uncertainty and diversity
for active learning in genetic programming. We found that the model population
in genetic programming can be exploited to select informative training data
points by using a model ensemble combined with an uncertainty metric. We
explored several uncertainty metrics and found that differential entropy
performed the best. We also compared two data diversity metrics and found that
correlation performs better as a diversity metric than minimum Euclidean
distance, although it has drawbacks that prevent it from being used on all
problems. Finally, we combined uncertainty and diversity using a Pareto
optimization approach, which weighs both in a balanced way when selecting
informative and unique data points for training.
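The abstract combines three ingredients: ensemble uncertainty, data diversity, and Pareto selection. The following is a minimal Python sketch of how they could fit together, not the authors' implementation. Several assumptions are made here. The "population" is a list of plain callables standing in for evolved GP models. Differential entropy is computed by assuming the ensemble's predictions at each candidate point are normally distributed, so h = ½ ln(2πeσ²). Diversity uses minimum Euclidean distance to the existing training inputs, because the abstract does not spell out the correlation formulation it found to work better. The Pareto front is found by a brute-force non-domination check. All function names are hypothetical.

```python
import numpy as np


def gaussian_differential_entropy(predictions, eps=1e-12):
    """Uncertainty per candidate, ASSUMING the ensemble's predictions are Gaussian.

    predictions: array of shape (n_models, n_candidates).
    Uses the population variance (ddof=0); eps avoids log(0) when all models agree.
    """
    var = predictions.var(axis=0) + eps
    return 0.5 * np.log(2.0 * np.pi * np.e * var)


def min_euclidean_distance(candidates, training_inputs):
    """Diversity per candidate: distance to the nearest existing training point."""
    diffs = candidates[:, None, :] - training_inputs[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def pareto_front(objectives):
    """Indices of rows not dominated by any other row (every objective is maximized)."""
    front = []
    for i in range(objectives.shape[0]):
        # Row j dominates row i if it is >= on all objectives and > on at least one.
        dominated = np.any(
            np.all(objectives >= objectives[i], axis=1)
            & np.any(objectives > objectives[i], axis=1)
        )
        if not dominated:
            front.append(i)
    return front


def select_candidates(models, candidates, training_inputs):
    """Hypothetical active-learning step: score candidates and return the Pareto set."""
    predictions = np.array([model(candidates) for model in models])
    uncertainty = gaussian_differential_entropy(predictions)
    diversity = min_euclidean_distance(candidates, training_inputs)
    return pareto_front(np.column_stack([uncertainty, diversity]))


# Toy "population": three hypothetical evolved expressions of one input variable.
models = [
    lambda X: X[:, 0] ** 2,
    lambda X: X[:, 0] ** 2 + 0.1 * X[:, 0],
    lambda X: np.abs(X[:, 0]) * X[:, 0],
]
training_inputs = np.array([[0.0], [1.0]])
candidates = np.array([[-0.5], [4.0], [0.5]])
print(select_candidates(models, candidates, training_inputs))  # → [0, 1]
```

In the toy run, candidate 0 (x = -0.5) is where the models disagree most, and candidate 1 (x = 4.0) is farthest from the existing data. Neither dominates the other, so both stay on the front. Candidate 2 ties candidate 0 on distance but has lower uncertainty, so it is dropped. How to pick a single point from the front, for example by preferring the highest-entropy member, is left open here and is another assumption.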
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis (2024-04-24)
  We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
  Our method shows significant improvements over meta-analysis-based methods when heterogeneity increases.
- Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation (2024-02-13)
  We study the problem of active sequential estimation, which involves adaptively selecting experiments for sequentially collected data.
  The goal is to design experiment selection rules for more accurate model estimation.
  We propose a class of greedy experiment selection methods and provide statistical analysis for the maximum likelihood estimator.
- Deep Negative Correlation Classification (2022-12-14)
  Existing deep ensemble methods naively train many different models and then aggregate their predictions.
  We propose deep negative correlation classification (DNCC).
  DNCC yields a deep classification ensemble in which the individual estimators are both accurate and negatively correlated.
- Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging (2022-05-27)
  We propose a framework for selective inference to address a common multi-task problem in neuroimaging.
  Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood.
  We demonstrate through simulations that multi-task learning with selective inference can recover true signals more accurately than single-task methods.
- Fair Feature Subset Selection using Multiobjective Genetic Algorithm (2022-04-30)
  We present a feature subset selection approach that improves both fairness and accuracy objectives.
  We use statistical disparity as a fairness metric and F1-Score as a metric for model performance.
  Our experiments on the most commonly used fairness benchmark datasets show that the evolutionary algorithm can effectively explore the trade-off between fairness and accuracy.
- Data-heterogeneity-aware Mixing for Decentralized Learning (2022-04-13)
  We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
  We propose a metric that quantifies the ability of a graph to mix the current gradients.
  Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
- Non-IID data and Continual Learning processes in Federated Learning: A long road ahead (2021-11-26)
  Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while keeping their data private.
  In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that can address it.
  At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning setting.
- Variable selection with missing data in both covariates and outcomes: Imputation and machine learning (2021-04-06)
  The missing data issue is ubiquitous in health studies.
  Machine learning methods rely on weaker parametric assumptions.
  XGBoost and BART have the best overall performance across various settings.
- Latent Network Estimation and Variable Selection for Compositional Data via Variational EM (2020-10-25)
  We develop a novel method to simultaneously estimate network interactions and associations.
  We show the practical utility of our model via an application to microbiome data.
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data (2020-07-07)
  The distributionally robust optimization framework is considered for training a parametric model.
  The objective is to endow the trained model with robustness against adversarially manipulated input data.
  The proposed algorithms offer robustness with little overhead.
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility (2020-06-15)
  Bipartite ranking aims to learn, from labeled data, a scoring function that ranks positive individuals higher than negative ones.
  There have been rising concerns about whether the learned scoring function can cause systematic disparity across different protected groups.
  We propose a model post-processing framework for balancing fairness and utility in the bipartite ranking scenario.
This list is automatically generated from the titles and abstracts of the papers in this site.