Learning CHARME models with neural networks
- URL: http://arxiv.org/abs/2002.03237v2
- Date: Tue, 17 Nov 2020 17:54:08 GMT
- Title: Learning CHARME models with neural networks
- Authors: José G. Gómez García, Jalal Fadili, Christophe Chesneau
- Abstract summary: We consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts).
As an application, we develop a learning theory for the NN-based autoregressive functions of the model.
- Score: 1.5362025549031046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider a model called CHARME (Conditional Heteroscedastic
Autoregressive Mixture of Experts), a class of generalized mixture of nonlinear
nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on
the autoregressive and volatility functions, we prove that this model is
stationary, ergodic and $\tau$-weakly dependent. These conditions are much
weaker than those previously presented in the literature on this model.
Moreover, this result forms the theoretical basis for an asymptotic theory of
the underlying (non)parametric estimation, which we present for this model. As
an application, building on the universal approximation property of neural
networks (NNs), we develop a learning theory for the NN-based autoregressive
functions of the model, in which the strong consistency and asymptotic
normality of the considered estimator of the NN weights and biases are
guaranteed under weak conditions.
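For orientation, a minimal sketch of the CHARME recursion, assuming the standard $K$-expert AR-ARCH mixture formulation (the paper's exact notation, lag structure and regularity conditions may differ):
$$ X_t = \sum_{k=1}^{K} \xi_t^{(k)} \Big( f_k(X_{t-1},\dots,X_{t-p}) + g_k(X_{t-1},\dots,X_{t-p})\,\varepsilon_t \Big), $$
where $(\xi_t^{(1)},\dots,\xi_t^{(K)})$ is a hidden regime-selection process with exactly one component equal to one at each time $t$, the $f_k$ are the autoregressive (expert) functions, the $g_k$ are the volatility functions, and $(\varepsilon_t)$ is an i.i.d. innovation sequence. In the learning application, each autoregressive function $f_k$ is parameterized by a feed-forward neural network, and the Lipschitz-type conditions on the $f_k$ and $g_k$ underlie the stationarity, ergodicity and $\tau$-weak dependence of $(X_t)$.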
Related papers
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Algebraic and Statistical Properties of the Ordinary Least Squares Interpolator [3.4320157633663064]
We provide results for the minimum $\ell_2$-norm OLS interpolator.
We present statistical results such as an extension of the Gauss-Markov theorem.
We conduct simulations that further explore the properties of the OLS interpolator.
arXiv Detail & Related papers (2023-09-27T16:41:10Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which can then be applied in real time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Neural Frailty Machine: Beyond proportional hazard assumption in neural survival regressions [30.018173329118184]
We present neural frailty machine (NFM), a powerful and flexible neural modeling framework for survival regressions.
Two concrete models are derived under the framework that extend neural proportional hazard models and nonparametric hazard regression models.
We conduct experimental evaluations over $6$ benchmark datasets of different scales, showing that the proposed NFM models outperform state-of-the-art survival models in terms of predictive performance.
arXiv Detail & Related papers (2023-03-18T08:15:15Z)
- The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks [26.58848653965855]
We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations.
We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally.
arXiv Detail & Related papers (2022-10-07T21:14:09Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero-shot and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Stochastic normalizing flows as non-equilibrium transformations [62.997667081978825]
We show that normalizing flows provide a route to sample lattice field theories more efficiently than conventional Monte Carlo simulations.
We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
arXiv Detail & Related papers (2022-01-21T19:00:18Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization [34.235007566913396]
Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well.
An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve.
We provide a precise high-dimensional analysis of generalization with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks with gradient descent.
arXiv Detail & Related papers (2020-08-15T20:55:40Z)
- Equivariant online predictions of non-stationary time series [0.0]
We analyze the theoretical predictive properties of statistical methods under model misspecification.
We show that a specific class of dynamic models -- random walk dynamic linear models -- produce exact minimax predictive densities.
arXiv Detail & Related papers (2019-11-20T01:46:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.