Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication
- URL: http://arxiv.org/abs/2109.13895v1
- Date: Tue, 28 Sep 2021 17:47:51 GMT
- Title: Symbolic Regression by Exhaustive Search: Reducing the Search Space
Using Syntactical Constraints and Efficient Semantic Structure Deduplication
- Authors: Lukas Kammerer, Gabriel Kronberger, Bogdan Burlacu, Stephan M.
Winkler, Michael Kommenda, Michael Affenzeller
- Abstract summary: Symbolic regression is a powerful system identification technique in industrial scenarios where no prior knowledge on model structure is available.
In this chapter we introduce a deterministic symbolic regression algorithm specifically designed to address these issues.
A finite enumeration of all possible models is guaranteed by structural restrictions as well as a caching mechanism for detecting semantically equivalent solutions.
- Score: 2.055204980188575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Symbolic regression is a powerful system identification technique in
industrial scenarios where no prior knowledge on model structure is available.
Such scenarios often require specific model properties such as
interpretability, robustness, trustworthiness and plausibility, that are not
easily achievable using standard approaches like genetic programming for
symbolic regression. In this chapter we introduce a deterministic symbolic
regression algorithm specifically designed to address these issues. The
algorithm uses a context-free grammar to produce models that are parameterized
by a non-linear least squares local optimization procedure. A finite
enumeration of all possible models is guaranteed by structural restrictions as
well as a caching mechanism for detecting semantically equivalent solutions.
Enumeration order is established via heuristics designed to improve search
efficiency. Empirical tests on a comprehensive benchmark suite show that our
approach is competitive with genetic programming in many noiseless problems
while maintaining desirable properties such as simple, reliable models and
reproducibility.
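
To make the pipeline described in the abstract more concrete, the following Python sketch mimics its three ingredients on a toy problem: a small fixed pool of model structures stands in for the grammar-derived enumeration, a hash of each structure's outputs on probe inputs stands in for the semantic caching mechanism, and nonlinear least squares fits the numeric parameters of each surviving structure. The tiny structure pool, the probe-based fingerprint, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of exhaustive structure enumeration with semantic
# deduplication and nonlinear least-squares parameter fitting.
# All structures, probe values, and names are illustrative assumptions.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Candidate structures: each entry is (label, number of parameters, f(x, p)).
# In the paper the structures come from a context-free grammar; here we
# enumerate a few hand-written templates in a fixed, deterministic order.
STRUCTURES = [
    ("a*x + b",          2, lambda x, p: p[0] * x + p[1]),
    ("a*x**2 + b",       2, lambda x, p: p[0] * x**2 + p[1]),
    ("a*exp(b*x)",       2, lambda x, p: p[0] * np.exp(p[1] * x)),
    ("a*log(x) + b",     2, lambda x, p: p[0] * np.log(x) + p[1]),
    ("(a*x + b) + 0*x",  2, lambda x, p: (p[0] * x + p[1]) + 0 * x),  # duplicate of the first
]

# Semantic deduplication: evaluate each structure with fixed probe parameters
# on a few probe inputs and hash the result; structures with identical
# fingerprints are treated as semantically equivalent and fitted only once.
PROBE_X = np.linspace(0.5, 3.0, 7)
PROBE_PARAMS = np.array([1.3, 0.7])

def semantic_fingerprint(f):
    values = np.round(f(PROBE_X, PROBE_PARAMS), 10)
    return hash(values.tobytes())

def fit(f, n_params, x, y):
    """Fit the structure's parameters by nonlinear least squares."""
    res = least_squares(lambda p: f(x, p) - y, x0=np.ones(n_params))
    return res.x, float(np.sum(res.fun**2))

if __name__ == "__main__":
    # Toy data generated from a*exp(b*x); the enumeration should recover it.
    x = np.linspace(0.5, 3.0, 50)
    y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=0.05, size=x.size)

    cache, results = set(), []
    for name, n_params, f in STRUCTURES:   # deterministic enumeration order
        key = semantic_fingerprint(f)
        if key in cache:                    # skip semantically equivalent structure
            continue
        cache.add(key)
        params, sse = fit(f, n_params, x, y)
        results.append((sse, name, params))

    for sse, name, params in sorted(results):
        print(f"SSE={sse:10.4f}  {name:18s}  params={np.round(params, 3)}")
```

In this sketch the fifth structure is never fitted because its probe fingerprint matches the first one, which is the role the caching mechanism plays in keeping the enumeration finite and free of redundant work.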
Related papers
- Comparative study of regression vs pairwise models for surrogate-based heuristic optimisation [1.2535250082638645]
This paper addresses the formulation of surrogate problems both as regression models that approximate fitness (surface surrogate models) and as a novel way to connect classification models (pairwise surrogate models).
The performance of the overall search, when using online machine learning-based surrogate models, depends not only on the accuracy of the predictive model but also on the kind of bias towards positive or negative cases.
arXiv Detail & Related papers (2024-10-04T13:19:06Z) - Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations [80.86128012438834]
We show for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete.
We propose a novel probabilistic approach which is able to provide tight estimates of robustness with strong guarantees.
arXiv Detail & Related papers (2024-07-10T09:13:11Z) - The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version [0.0]
We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings.
This enables us to quantify the success probability of finding the best possible expressions.
We compare the search efficiency of genetic programming to random search in the space of semantically unique expressions; a small illustrative sketch of this idea follows after this list.
arXiv Detail & Related papers (2024-04-26T09:49:32Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Fast Feature Selection with Fairness Constraints [49.142308856826396]
We study the fundamental problem of selecting optimal features for model construction.
This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants.
We extend the adaptive query model, recently proposed for greedy forward selection of submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions.
The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work.
arXiv Detail & Related papers (2022-02-28T12:26:47Z) - Counterfactual Explanations for Arbitrary Regression Models [8.633492031855655]
We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation.
Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse.
arXiv Detail & Related papers (2021-06-29T09:53:53Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Calibrating Over-Parametrized Simulation Models: A Framework via
Eligibility Set [3.862247454265944]
We develop a framework for constructing calibration schemes that satisfy rigorous frequentist statistical guarantees.
We demonstrate our methodology on several numerical examples, including an application to calibration of a limit order book market simulator.
arXiv Detail & Related papers (2021-05-27T00:59:29Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.