Interpretable Symbolic Regression for Data Science: Analysis of the 2022
Competition
- URL: http://arxiv.org/abs/2304.01117v3
- Date: Mon, 3 Jul 2023 11:31:36 GMT
- Title: Interpretable Symbolic Regression for Data Science: Analysis of the 2022
Competition
- Authors: F. O. de Franca, M. Virgolin, M. Kommenda, M. S. Majumder, M. Cranmer,
G. Espada, L. Ingelse, A. Fonseca, M. Landajuela, B. Petersen, R. Glatt, N.
Mundhenk, C. S. Lee, J. D. Hochhalter, D. L. Randall, P. Kamienny, H. Zhang,
G. Dick, A. Simon, B. Burlacu, Jaan Kasak, Meera Machado, Casper Wilstrup, W.
G. La Cava
- Abstract summary: Symbolic regression searches for analytic expressions that accurately describe studied phenomena.
There has been a recent surge of new proposals that utilize approaches such as enumeration algorithms, mixed linear integer programming, neural networks, and Bayesian optimization.
We present an in-depth analysis of the results obtained in this competition, discuss current challenges of symbolic regression algorithms and highlight possible improvements for future competitions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression searches for analytic expressions that accurately
describe studied phenomena. The main attraction of this approach is that it
returns an interpretable model that can be insightful to users. Historically,
the majority of algorithms for symbolic regression have been based on
evolutionary algorithms. However, there has been a recent surge of new
proposals that instead utilize approaches such as enumeration algorithms, mixed
linear integer programming, neural networks, and Bayesian optimization. In
order to assess how well these new approaches behave on a set of common
challenges often faced in real-world data, we hosted a competition at the 2022
Genetic and Evolutionary Computation Conference consisting of different
synthetic and real-world datasets which were blind to entrants. For the
real-world track, we assessed interpretability in a realistic way by using a
domain expert to judge the trustworthiness of candidate models. We present an
in-depth analysis of the results obtained in this competition, discuss current
challenges of symbolic regression algorithms and highlight possible
improvements for future competitions.
Related papers
- Histogram Approaches for Imbalanced Data Streams Regression [1.8385275253826225]
Imbalanced domains pose a significant challenge in real-world predictive analytics, particularly in the context of regression.
This study introduces histogram-based sampling strategies to overcome this constraint.
Comprehensive experiments on synthetic and real-world benchmarks demonstrate that HistUS and HistOS substantially improve rare-case prediction accuracy.
arXiv Detail & Related papers (2025-01-29T11:03:02Z)
- Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation [1.2535250082638645]
This paper addresses the formulation of surrogate problems as both regression models that approximate fitness (surface surrogate models) and a novel way to connect classification models (pairwise surrogate models).
The performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on the kind of bias towards positive or negative cases.
arXiv Detail & Related papers (2024-10-04T13:19:06Z)
- Discovering physical laws with parallel symbolic enumeration [67.36739393470869]
We introduce parallel symbolic enumeration (PSE) to efficiently distill generic mathematical expressions from limited data.
Experiments show that PSE achieves higher accuracy and faster computation compared to the state-of-the-art baseline algorithms.
PSE represents an advance in accurate and efficient data-driven discovery of symbolic, interpretable models.
arXiv Detail & Related papers (2024-07-05T10:41:15Z)
- A Comparison of Recent Algorithms for Symbolic Regression to Genetic Programming [0.0]
Symbolic regression aims to model and map data in a way that can be understood by scientists.
Recent advancements have attempted to bridge the gap between these two fields.
arXiv Detail & Related papers (2024-06-05T19:01:43Z)
- Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls
and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
- Contemporary Symbolic Regression Methods and their Relative Performance [5.285811942108162]
We assess 14 symbolic regression methods and 7 machine learning methods on a set of 252 diverse regression problems.
For the real-world datasets, we benchmark the ability of each method to learn models with low error and low complexity.
For the synthetic problems, we assess each method's ability to find exact solutions in the presence of varying levels of noise.
arXiv Detail & Related papers (2021-07-29T22:12:59Z)
- Implementing Fair Regression In The Real World [3.723553383515688]
We investigate the impact of such implementation of fair regression on the individual.
We propose a set of post-processing algorithms to improve the utility of the existing fair regression approaches.
arXiv Detail & Related papers (2021-04-09T13:31:16Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Two-step penalised logistic regression for multi-omic data with an
application to cardiometabolic syndrome [62.997667081978825]
We implement a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately.
Our approach should be preferred if the goal is to select as many relevant predictors as possible.
Our proposed approach allows us to identify features that characterise cardiometabolic syndrome at the molecular level.
arXiv Detail & Related papers (2020-08-01T10:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.