Symbolic Regression Driven by Training Data and Prior Knowledge
- URL: http://arxiv.org/abs/2004.11947v1
- Date: Fri, 24 Apr 2020 19:15:06 GMT
- Title: Symbolic Regression Driven by Training Data and Prior Knowledge
- Authors: J. Kubalík, E. Derner, R. Babuška
- Abstract summary: In symbolic regression, the search for analytic models is driven purely by the prediction error observed on the training data samples.
We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In symbolic regression, the search for analytic models is typically driven
purely by the prediction error observed on the training data samples. However,
when the data samples do not sufficiently cover the input space, the prediction
error does not provide sufficient guidance toward desired models. Standard
symbolic regression techniques then yield models that are partially incorrect,
for instance, in terms of their steady-state characteristics or local behavior.
If these properties were considered already during the search process, more
accurate and relevant models could be produced. We propose a multi-objective
symbolic regression approach that is driven by both the training data and the
prior knowledge of the properties the desired model should manifest. The
properties given in the form of formal constraints are internally represented
by a set of discrete data samples on which candidate models are exactly
checked. The proposed approach was experimentally evaluated on three test
problems with results clearly demonstrating its capability to evolve realistic
models that fit the training data well while complying with the prior knowledge
of the desired model characteristics at the same time. It outperforms standard
symbolic regression by several orders of magnitude in terms of the mean squared
deviation from a reference model.
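The core idea of the abstract, scoring candidate models on both training-data error and prior-knowledge constraints checked exactly on discrete samples, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the monotonicity prior, the two candidate models, and the constraint-sample grid are all hypothetical stand-ins.

```python
# Minimal sketch of multi-objective model scoring: candidates are evaluated
# both on training-data error and on violation of a prior-knowledge
# constraint, checked on discrete constraint samples outside the data range.
import numpy as np

def data_error(model, X, y):
    """Mean squared prediction error on the training samples."""
    return float(np.mean((model(X) - y) ** 2))

def constraint_violation(model, x_samples, eps=1e-4):
    """Violation of a hypothetical monotonicity prior f'(x) >= 0,
    estimated via finite differences at the constraint samples."""
    slopes = (model(x_samples + eps) - model(x_samples)) / eps
    return float(np.sum(np.maximum(0.0, -slopes)))  # penalize negative slopes

def dominates(a, b):
    """Pareto dominance (minimization) over the two objectives."""
    return all(u <= v for u, v in zip(a, b)) and any(u < v for u, v in zip(a, b))

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=50)        # data samples cover only [0, 1]
y = X + 0.01 * rng.normal(size=50)        # ground truth f(x) = x, monotone

good = lambda x: x                                       # monotone everywhere
bad = lambda x: x - 0.1 * np.maximum(0.0, x - 1.0) ** 2  # bends down far from data

x_check = np.linspace(0.0, 10.0, 21)      # discrete constraint samples
s_good = (data_error(good, X, y), constraint_violation(good, x_check))
s_bad = (data_error(bad, X, y), constraint_violation(bad, x_check))
assert dominates(s_good, s_bad)  # equal fit on data, but 'bad' breaks the prior
```

Both candidates fit the training region identically, so a purely error-driven search cannot separate them; the constraint samples outside the data range are what expose the second model's incorrect local behavior.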
Related papers
- On conditional diffusion models for PDE simulations [53.01911265639582]
We study score-based diffusion models for forecasting and assimilation of sparse observations.
We propose an autoregressive sampling approach that significantly improves performance in forecasting.
We also propose a new training strategy for conditional score-based models that achieves stable performance over a range of history lengths.
arXiv Detail & Related papers (2024-10-21T18:31:04Z) - Learning Augmentation Policies from A Model Zoo for Time Series Forecasting [58.66211334969299]
We introduce AutoTSAug, a learnable data augmentation method based on reinforcement learning.
By augmenting the marginal samples with a learnable policy, AutoTSAug substantially improves forecasting performance.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - A performance characteristic curve for model evaluation: the application
in information diffusion prediction [3.8711489380602804]
We propose a metric based on information entropy to quantify the randomness in diffusion data, then identify a scaling pattern between the randomness and the prediction accuracy of the model.
Data points obtained under different sequence lengths, system sizes, and randomness levels all collapse onto a single curve, capturing a model's inherent capability to make correct predictions.
The validity of the curve is tested by three prediction models in the same family, reaching conclusions in line with existing studies.
arXiv Detail & Related papers (2023-09-18T07:32:57Z) - Analysis of Interpolating Regression Models and the Double Descent
Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
arXiv Detail & Related papers (2023-04-17T09:44:33Z) - Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset.
We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method.
We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
arXiv Detail & Related papers (2022-06-07T20:47:07Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models fine-tuned with few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Sampling To Improve Predictions For Underrepresented Observations In
Imbalanced Data [0.0]
Data imbalance negatively impacts the predictive performance of models on underrepresented observations.
We propose sampling to adjust for this imbalance with the goal of improving the performance of models trained on historical production data.
We apply our methods on a large biopharmaceutical manufacturing data set from an advanced simulation of penicillin production.
arXiv Detail & Related papers (2021-11-17T12:16:54Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
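As a concrete touchpoint for one of the entries above ("Conformal prediction for the design problem"), here is a minimal split conformal prediction sketch in the standard exchangeable setting. It is not the design-shift method that paper introduces; the identity predictor and synthetic calibration data are assumptions for illustration only.

```python
# Minimal split conformal prediction sketch (standard exchangeable setting):
# calibrate a residual quantile on held-out data to obtain a prediction
# interval with finite-sample coverage.
import numpy as np

def conformal_interval(predict, X_cal, y_cal, x_new, alpha=0.1):
    """Return a (1 - alpha)-coverage interval for predict(x_new)."""
    scores = np.abs(y_cal - predict(X_cal))     # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1.0 - alpha)))   # conservative rank
    q = np.sort(scores)[min(k, n) - 1]          # calibrated quantile
    pred = float(predict(np.atleast_1d(x_new))[0])
    return pred - q, pred + q

rng = np.random.default_rng(1)
X_cal = rng.normal(size=200)
y_cal = X_cal + 0.1 * rng.normal(size=200)      # noisy linear data
lo, hi = conformal_interval(lambda x: x, X_cal, y_cal, 0.0)
assert lo < 0.0 < hi                            # interval brackets the prediction
```

The design-problem setting adds a distribution shift between calibration and test data, which is exactly what this plain split-conformal recipe does not handle and what the cited paper addresses.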
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.