Data-Informed Model Complexity Metric for Optimizing Symbolic Regression Models
- URL: http://arxiv.org/abs/2501.17372v1
- Date: Wed, 29 Jan 2025 01:53:22 GMT
- Title: Data-Informed Model Complexity Metric for Optimizing Symbolic Regression Models
- Authors: Nathan Haut, Zenas Huang, Adam Alessio
- Abstract summary: We introduce a pragmatic method to estimate model complexity using Hessian rank for post-processing selection.
This method aligns model selection with input data complexity, calculated using intrinsic dimensionality (ID) estimators.
- Abstract: Choosing models from a well-fitted evolved population that generalize beyond training data is difficult. We introduce a pragmatic method to estimate model complexity using Hessian rank for post-processing selection. Complexity is approximated by averaging the model output Hessian rank across a few points (N=3), offering efficient and accurate rank estimates. This method aligns model selection with input data complexity, calculated using intrinsic dimensionality (ID) estimators. Using the StackGP system, we develop symbolic regression models for the Penn Machine Learning Benchmark and employ twelve scikit-dimension library methods to estimate ID, aligning model expressiveness with dataset ID. Our data-informed complexity metric finds the ideal complexity window, balancing model expressiveness and accuracy, enhancing generalizability without the bias common to methods reliant on user-defined parameters, such as parsimony pressure in weight selection.
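To make the abstract's two ingredients concrete, here is a minimal Python sketch, assuming a generic model f: R^d -> R in place of a StackGP model; the finite-difference step, rank tolerance, and choice of skdim estimator are illustrative, not the paper's exact settings.
```python
import numpy as np
import skdim  # scikit-dimension, the ID library named in the abstract

def hessian_rank(f, x, h=1e-3, tol=1e-5):
    """Estimate the rank of the Hessian of f at x via central second differences."""
    d = len(x)
    I = np.eye(d) * h
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + I[i] + I[j]) - f(x + I[i] - I[j])
                       - f(x - I[i] + I[j]) + f(x - I[i] - I[j])) / (4 * h * h)
    return np.linalg.matrix_rank(H, tol=tol)

def model_complexity(f, X, n_points=3, seed=0):
    """Average the Hessian rank over a few input points (N=3 in the paper)."""
    rng = np.random.default_rng(seed)
    pts = X[rng.choice(len(X), size=n_points, replace=False)]
    return float(np.mean([hessian_rank(f, x) for x in pts]))

X = np.random.default_rng(1).normal(size=(500, 5))  # stand-in dataset
f = lambda x: x[0] ** 2 + np.sin(x[1]) * x[2]       # stand-in evolved model

data_id = skdim.id.TwoNN().fit(X).dimension_  # one of skdim's twelve estimators
print(model_complexity(f, X), data_id)
```
The paper compares the averaged Hessian rank against the dataset's ID to locate the complexity window; the comparison rule itself is not reproduced here.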
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
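A minimal sketch of the general idea in the entry above, pooling Monte Carlo samples drawn from several GP posteriors; the kernels, data, and equal model weighting are illustrative assumptions, not the paper's specific fusion rule.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
X_test = np.linspace(-3, 3, 50).reshape(-1, 1)

# Two competing GP models of the same data
models = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-2).fit(X, y),
          GaussianProcessRegressor(kernel=Matern(nu=1.5), alpha=1e-2).fit(X, y)]

# Draw posterior samples from each model and pool them (equal model weights)
samples = np.concatenate(
    [m.sample_y(X_test, n_samples=200, random_state=0) for m in models], axis=1)
fused_mean, fused_std = samples.mean(axis=1), samples.std(axis=1)
```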
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time across different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
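To make the Model I / Model II distinction in the entry above concrete, a toy numpy sketch; the feature map, dimensions, and context count are invented for illustration.
```python
import numpy as np

d, n_contexts = 4, 3
rng = np.random.default_rng(0)

def phi(s, a, c=None):
    """Toy state-action features; the context-varying map simply rescales by c."""
    base = np.array([s, a, s * a, 1.0])
    return base if c is None else np.tanh(base * (1 + c))

# Model I: context-varying representations phi_c, common weights theta
theta = rng.standard_normal(d)
def q_model_1(s, a, c):
    return phi(s, a, c) @ theta

# Model II: common representation phi, context-varying weights theta_c
theta_c = rng.standard_normal((n_contexts, d))
def q_model_2(s, a, c):
    return phi(s, a) @ theta_c[c]

print(q_model_1(0.5, 1.0, 2), q_model_2(0.5, 1.0, 2))
```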
- A model-free feature selection technique of feature screening and random forest based recursive feature elimination [0.0]
We propose a model-free feature selection method for ultra-high dimensional data with a massive number of features.
We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions.
arXiv Detail & Related papers (2023-02-15T03:39:16Z)
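A minimal sketch of the screen-then-eliminate pipeline in the entry above, with a simple marginal-correlation screen standing in for the paper's model-free screening statistic; the dataset and all thresholds are illustrative.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

X, y = make_regression(n_samples=200, n_features=1000, n_informative=10,
                       random_state=0)

# Stage 1: screen down to the top-k features by marginal correlation with y
k = 50
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.argsort(corr)[-k:]

# Stage 2: recursive feature elimination with a random forest on the survivors
rfe = RFE(RandomForestRegressor(n_estimators=100, random_state=0),
          n_features_to_select=10, step=5).fit(X[:, keep], y)
selected = keep[rfe.support_]
```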
- Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications [0.0]
We show that combining different regression models can yield better results than selecting a single ('best') regression model.
We outline an efficient method that obtains an optimally weighted linear combination from a heterogeneous set of regression models.
arXiv Detail & Related papers (2022-06-22T09:11:14Z)
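A minimal sketch of exact weight optimization for the linear ensemble idea above, solved as unconstrained least squares on held-out predictions; the paper's exact formulation may add constraints on the weights, and the base models here are arbitrary stand-ins.
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

models = [Ridge().fit(X_tr, y_tr),
          KNeighborsRegressor().fit(X_tr, y_tr),
          DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)]

# Columns of P are each model's predictions on the validation set
P = np.column_stack([m.predict(X_val) for m in models])
w, *_ = np.linalg.lstsq(P, y_val, rcond=None)  # exact optimal linear weights
ensemble_pred = P @ w
```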
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
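A minimal analogue of the column-wise iterative imputation described above, using scikit-learn's IterativeImputer; unlike HyperImpute, this fixes a single estimator rather than automatically selecting a model per column, and the data and missingness rate are invented.
```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.uniform(size=X.shape) < 0.2] = np.nan  # 20% missing at random

# Each column is iteratively modeled from the others and re-imputed
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(X)
```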
- Machine learning with incomplete datasets using multi-objective optimization models [1.933681537640272]
We propose an online approach to handle missing values while a classification model is learnt.
We develop a multi-objective optimization model with two objective functions for imputation and model selection.
We use an evolutionary algorithm based on NSGA-II to find the optimal solutions.
arXiv Detail & Related papers (2020-12-04T03:44:33Z)
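A minimal NSGA-II sketch using the pymoo library, with two toy objectives standing in for the paper's imputation and model-selection objectives; the problem definition is purely illustrative.
```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

class TwoObjectives(ElementwiseProblem):
    def __init__(self):
        super().__init__(n_var=2, n_obj=2, xl=np.zeros(2), xu=np.ones(2))

    def _evaluate(self, x, out, *args, **kwargs):
        f1 = x[0] ** 2 + x[1] ** 2               # stand-in: imputation error
        f2 = (x[0] - 1) ** 2 + (x[1] - 1) ** 2   # stand-in: model error
        out["F"] = [f1, f2]

res = minimize(TwoObjectives(), NSGA2(pop_size=40), ("n_gen", 50),
               seed=1, verbose=False)
pareto_front = res.F  # non-dominated trade-offs between the two objectives
```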
- Applying Evolutionary Metaheuristics for Parameter Estimation of Individual-Based Models [0.0]
We introduce EvoPER, an R package for simplifying parameter estimation using evolutionary methods.
arXiv Detail & Related papers (2020-05-24T07:48:27Z)
- Semi-analytic approximate stability selection for correlated data in generalized linear models [3.42658286826597]
We propose a novel approximate inference algorithm that can conduct Stability Selection without repeated fitting.
The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory.
Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
arXiv Detail & Related papers (2020-03-19T10:43:12Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
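For context, the core multiplicative-weights (Hedge) update that this family of algorithms builds on; this is a generic illustration of the update rule, not the paper's Klivans-Meka-style graphical-model algorithm, and the losses here are random stand-ins.
```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1000, 20                      # rounds and number of experts/features
eta = np.sqrt(2 * np.log(n) / T)    # standard Hedge learning rate
w = np.ones(n) / n
losses = rng.uniform(size=(T, n))   # stand-in per-round losses in [0, 1]

total = 0.0
for t in range(T):
    total += w @ losses[t]          # loss of the weighted mixture
    w *= np.exp(-eta * losses[t])   # multiplicative update
    w /= w.sum()

regret = total - losses.sum(axis=0).min()  # grows only like sqrt(T log n)
```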