Error Autocorrelation Objective Function for Improved System Modeling
- URL: http://arxiv.org/abs/2008.03582v2
- Date: Tue, 11 May 2021 15:34:38 GMT
- Title: Error Autocorrelation Objective Function for Improved System Modeling
- Authors: Anand Ramakrishnan, Warren B. Jackson, and Kent Evans
- Abstract summary: We introduce a "whitening" cost function, the Ljung-Box statistic, which not only minimizes the error but also minimizes the correlations between errors.
The results show significant improvement in generalization for recurrent neural networks (RNNs) (1d) and image autoencoders (2d).
- Score: 1.2760453906939444
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning models are trained to minimize the error between the model's
output and the actual values. The typical cost function, the Mean Squared Error
(MSE), arises from maximizing the log-likelihood of additive independent,
identically distributed Gaussian noise. However, minimizing MSE fails to
minimize the residuals' cross-correlations, leading to over-fitting and poor
extrapolation of the model outside the training set (generalization). In this
paper, we introduce a "whitening" cost function, the Ljung-Box statistic, which
not only minimizes the error but also minimizes the correlations between
errors, ensuring that the fits enforce compatibility with an independent and
identically distributed (i.i.d.) Gaussian noise model. The results show
significant improvement in generalization for recurrent neural networks (RNNs)
(1d) and image autoencoders (2d). Specifically, we look at temporal
correlations for system identification (system-id) in simulated and actual
mechanical systems, as well as spatial correlations in vision autoencoders, to
demonstrate that the whitening objective functions lead to much better
extrapolation, a property very desirable for reliable control systems.
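As a rough illustration of the idea described above, the sketch below (not the authors' code) adds a differentiable Ljung-Box penalty, Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k), computed from the sample autocorrelations rho_k of the residuals, on top of the usual MSE. The lag count `max_lag` and the weighting factor `lam` are illustrative assumptions, not values taken from the paper, and the residuals are treated as a single 1d sequence for simplicity.

```python
import torch


def ljung_box_penalty(residuals: torch.Tensor, max_lag: int = 10) -> torch.Tensor:
    """Differentiable Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k).

    Treats the residuals as one 1d sequence; assumes the sequence is longer
    than `max_lag`. Both simplifications are assumptions of this sketch.
    """
    r = residuals.flatten()
    n = r.numel()
    r = r - r.mean()
    denom = (r * r).sum() + 1e-12                 # lag-0 sum of squares
    q = torch.zeros((), dtype=r.dtype, device=r.device)
    for k in range(1, max_lag + 1):
        rho_k = (r[k:] * r[:-k]).sum() / denom    # sample autocorrelation at lag k
        q = q + rho_k ** 2 / (n - k)
    return n * (n + 2) * q


def whitening_loss(pred: torch.Tensor, target: torch.Tensor,
                   lam: float = 0.1, max_lag: int = 10) -> torch.Tensor:
    """MSE plus a Ljung-Box whitening term that penalizes correlated residuals."""
    resid = pred - target
    return torch.mean(resid ** 2) + lam * ljung_box_penalty(resid, max_lag)
```

For the 2d autoencoder case mentioned in the abstract, one would presumably apply the same penalty along the spatial dimensions of the residual image rather than along time; the paper's exact formulation may differ.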
Related papers
- Model aggregation: minimizing empirical variance outperforms minimizing empirical error [0.29008108937701327]
We propose a data-driven framework that aggregates predictions from diverse models into a single, more accurate output.
It is non-intrusive (treating models as black-box functions), model-agnostic, requires minimal assumptions, and can combine outputs from a wide range of models.
We show how it successfully integrates traditional solvers with machine learning models to improve both robustness and accuracy.
arXiv Detail & Related papers (2024-09-25T18:33:21Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation [10.793646707711442]
We present a framework for training generative models for density estimation.
We use the score-based diffusion model to generate labeled data.
Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
arXiv Detail & Related papers (2023-10-22T23:56:19Z)
- FeDXL: Provable Federated Learning for Deep X-Risk Optimization [105.17383135458897]
We tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing algorithms are applicable.
The challenges in designing an FL algorithm for X-risks lie in the non-decomposability of the objective over multiple machines and the interdependency between different machines.
arXiv Detail & Related papers (2022-10-26T00:23:36Z)
- Integrating Random Effects in Deep Neural Networks [4.860671253873579]
We propose to use the mixed models framework to handle correlated data in deep neural networks.
By treating the effects underlying the correlation structure as random effects, mixed models are able to avoid overfitted parameter estimates.
Our approach, which we call LMMNN, is demonstrated to improve performance over natural competitors in various correlation scenarios.
arXiv Detail & Related papers (2022-06-07T14:02:24Z)
- Automation for Interpretable Machine Learning Through a Comparison of Loss Functions to Regularisers [0.0]
This paper explores the use of the Fit to Median Error measure in machine learning regression automation.
It improves interpretability by regularising learnt input-output relationships to the conditional median.
Networks optimised for their Fit to Median Error are shown to approximate the ground truth more consistently.
arXiv Detail & Related papers (2021-06-07T08:50:56Z)
- Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning [0.0]
It is shown that minimizing deviance involves a trade-off between the integral of weighted differences of lower partial moments and the bias measured on a specific scale.
This new method to correct for bias adds an extra local GLM step to the analysis.
The convex order appears to be the natural tool to compare competing models.
arXiv Detail & Related papers (2021-03-05T12:40:30Z)
- On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z)
- Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, the divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Neural Control Variates [71.42768823631918]
We show that a set of neural networks can address the challenge of finding a good approximation of the integrand.
We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice.
Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)
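For the Neural Control Variates entry just above, the variance-minimizing idea can be illustrated with a classical, non-neural toy: estimate an integral by Monte Carlo and correct it with a cheap surrogate of known integral, scaled by the variance-minimizing coefficient c = Cov(f, g) / Var(g). The integrand `f`, the surrogate `g`, and its integral `G` below are arbitrary illustrative choices standing in for the learned approximations in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.exp(-x)   # integrand on [0, 1]; exact integral is 1 - 1/e
g = lambda x: 1.0 - x      # cheap surrogate (first-order Taylor expansion of f)
G = 0.5                    # exact integral of g over [0, 1]

x = rng.uniform(0.0, 1.0, size=50_000)
fx, gx = f(x), g(x)

# Variance-minimizing coefficient for the control variate.
cov = np.cov(fx, gx)
c = cov[0, 1] / cov[1, 1]

plain = fx.mean()                     # plain Monte Carlo estimate
cv = fx.mean() + c * (G - gx.mean())  # control-variate estimate (lower variance)
print(f"plain MC: {plain:.5f}  control variate: {cv:.5f}  exact: {1 - np.exp(-1):.5f}")
```

In the paper the surrogate is itself a learned network and the coefficient emerges from the derived loss; the toy only shows why minimizing the variance of the corrected estimator, rather than the plain error, is the relevant objective.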
This list is automatically generated from the titles and abstracts of the papers in this site.