Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication
- URL: http://arxiv.org/abs/2109.13895v1
- Date: Tue, 28 Sep 2021 17:47:51 GMT
- Title: Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication
- Authors: Lukas Kammerer, Gabriel Kronberger, Bogdan Burlacu, Stephan M.
Winkler, Michael Kommenda, Michael Affenzeller
- Abstract summary: Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available.
In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues.
A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions.
- Score: 2.055204980188575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression is a powerful system identification technique in
industrial scenarios where no prior knowledge on model structure is available.
Such scenarios often require specific model properties such as
interpretability, robustness, trustworthiness and plausibility, that are not
easily achievable using standard approaches like genetic programming for
symbolic regression. In this chapter we introduce a deterministic symbolic
regression algorithm specifically designed to address these issues. The
algorithm uses a context-free grammar to produce models that are parameterized
by a non-linear least squares local optimization procedure. A finite
enumeration of all possible models is guaranteed by structural restrictions as
well as a caching mechanism for detecting semantically equivalent solutions.
Enumeration order is established via heuristics designed to improve search
efficiency. Empirical tests on a comprehensive benchmark suite show that our
approach is competitive with genetic programming in many noiseless problems
while maintaining desirable properties such as simple, reliable models and
reproducibility.
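
To make the pipeline described in the abstract more concrete, the following Python sketch mimics its three ingredients on a toy problem: a small fixed pool of model structures stands in for the grammar-derived enumeration, a hash of each structure's outputs on probe inputs stands in for the semantic caching mechanism, and nonlinear least squares fits the numeric parameters of each surviving structure. The tiny structure pool, the probe-based fingerprint, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of exhaustive structure enumeration with semantic
# deduplication and nonlinear least-squares parameter fitting.
# All structures, probe values, and names are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Candidate structures: each entry is (label, number of parameters, f(x, p)).
# In the paper the structures come from a context-free grammar; here we
# enumerate a few hand-written templates in a fixed, deterministic order.
STRUCTURES = [
    ("a*x + b",          2, lambda x, p: p[0] * x + p[1]),
    ("a*x**2 + b",       2, lambda x, p: p[0] * x**2 + p[1]),
    ("a*exp(b*x)",       2, lambda x, p: p[0] * np.exp(p[1] * x)),
    ("a*log(x) + b",     2, lambda x, p: p[0] * np.log(x) + p[1]),
    ("(a*x + b) + 0*x",  2, lambda x, p: (p[0] * x + p[1]) + 0 * x),  # duplicate of the first
]

# Semantic deduplication: evaluate each structure with fixed probe parameters
# on a few probe inputs and hash the result; structures with identical
# fingerprints are treated as semantically equivalent and fitted only once.
PROBE_X = np.linspace(0.5, 3.0, 7)
PROBE_PARAMS = np.array([1.3, 0.7])

def semantic_fingerprint(f):
    values = np.round(f(PROBE_X, PROBE_PARAMS), 10)
    return hash(values.tobytes())

def fit(f, n_params, x, y):
    """Fit the structure's parameters by nonlinear least squares."""
    res = least_squares(lambda p: f(x, p) - y, x0=np.ones(n_params))
    return res.x, float(np.sum(res.fun**2))

if __name__ == "__main__":
    # Toy data generated from a*exp(b*x); the enumeration should recover it.
    x = np.linspace(0.5, 3.0, 50)
    y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.05, size=x.size)

    cache, results = set(), []
    for name, n_params, f in STRUCTURES:   # deterministic enumeration order
        key = semantic_fingerprint(f)
        if key in cache:                    # skip semantically equivalent structure
            continue
        cache.add(key)
        params, sse = fit(f, n_params, x, y)
        results.append((sse, name, params))

    for sse, name, params in sorted(results):
        print(f"SSE={sse:10.4f}  {name:18s}  params={np.round(params, 3)}")
```

In this sketch the fifth structure is never fitted because its probe fingerprint matches the first one, which is the role the caching mechanism plays in keeping the enumeration finite and free of redundant work.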
Related papers
- Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation [1.2535250082638645]
This paper addresses the formulation of surrogate problems both as regression models that approximate fitness (surface surrogate models) and as a novel way to connect classification models (pairwise surrogate models).
The performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on the kind of bias towards positive or negative cases.
arXiv Detail & Related papers (2024-10-04T13:19:06Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version [0.0]
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions; a small illustrative sketch of this idea follows after this list.
arXiv Detail & Related papers (2024-04-26T09:49:32Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for greedy forward selection of submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z) - Counterfactual Explanations for Arbitrary Regression Models [8.633492031855655]
We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation.
Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse.
arXiv Detail & Related papers (2021-06-29T09:53:53Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Calibrating Over-Parametrized Simulation Models: A Framework via
Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
arXiv Detail & Related papers (2021-05-27T00:59:29Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.