On the Generalization and Approximation Capacities of Neural Controlled Differential Equations
- URL: http://arxiv.org/abs/2305.16791v4
- Date: Mon, 1 Jul 2024 19:29:57 GMT
- Title: On the Generalization and Approximation Capacities of Neural Controlled Differential Equations
- Authors: Linus Bleistein, Agathe Guilloux
- Abstract summary: Neural Controlled Differential Equations (NCDEs) are a state-of-the-art tool for supervised learning with irregularly sampled time series.
We show how classical approximation results on neural nets may transfer to NCDEs.
- Score: 0.3222802562733786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Controlled Differential Equations (NCDEs) are a state-of-the-art tool for supervised learning with irregularly sampled time series (Kidger, 2020). However, no theoretical analysis of their performance has been provided yet, and it remains unclear in particular how the irregularity of the time series affects their predictions. By merging the rich theory of controlled differential equations (CDEs) and Lipschitz-based measures of the complexity of deep neural nets, we take a first step towards the theoretical understanding of NCDEs. Our first result is a generalization bound for this class of predictors that depends on the regularity of the time series data. Second, we leverage the continuity of the flow of CDEs to provide a detailed analysis of both the sampling-induced bias and the approximation bias. For this last result, we show how classical approximation results on neural nets may transfer to NCDEs. Our theoretical results are validated through a series of experiments.
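To make the object of study concrete, here is a minimal sketch of a Neural CDE under an explicit-Euler discretization of the hidden dynamics z_{t_{k+1}} = z_{t_k} + f_theta(z_{t_k})(X_{t_{k+1}} - X_{t_k}). This is not the authors' implementation: the class and parameter names (NeuralCDE, CDEVectorField, the hidden width 64, etc.) are illustrative, and in practice one would typically use an interpolation scheme and solver such as those in Kidger's torchcde package. The hand-rolled Euler loop is only meant to show how the observation grid of the control path X enters the prediction.

```python
# Illustrative sketch (not the paper's code): explicit-Euler discretization of a
# Neural CDE,  z_{k+1} = z_k + f_theta(z_k) (x_{t_{k+1}} - x_{t_k}),  where the
# control path X is the (irregularly sampled) time series with time appended as
# an extra channel, following common NCDE practice.
import torch
import torch.nn as nn


class CDEVectorField(nn.Module):
    """Maps a hidden state z in R^h to a matrix in R^{h x d} (names are illustrative)."""

    def __init__(self, hidden_dim: int, control_dim: int):
        super().__init__()
        self.hidden_dim, self.control_dim = hidden_dim, control_dim
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 64),
            nn.Tanh(),
            nn.Linear(64, hidden_dim * control_dim),
        )

    def forward(self, z):
        return self.net(z).view(-1, self.hidden_dim, self.control_dim)


class NeuralCDE(nn.Module):
    def __init__(self, control_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        self.embed = nn.Linear(control_dim, hidden_dim)   # initial condition z_0 = phi(X_{t_0})
        self.field = CDEVectorField(hidden_dim, control_dim)
        self.readout = nn.Linear(hidden_dim, output_dim)  # prediction from the terminal state

    def forward(self, path):
        # path: (batch, T, control_dim), sampled on a possibly irregular grid.
        z = self.embed(path[:, 0])
        for k in range(path.shape[1] - 1):
            dx = path[:, k + 1] - path[:, k]              # increment of the control path
            z = z + torch.bmm(self.field(z), dx.unsqueeze(-1)).squeeze(-1)
        return self.readout(z)


# Toy usage: 3 observed channels plus a time channel, on an irregular grid.
times = torch.sort(torch.rand(20)).values                 # irregular observation times
values = torch.randn(8, 20, 3)                            # batch of 8 series
path = torch.cat([times.expand(8, -1).unsqueeze(-1), values], dim=-1)
model = NeuralCDE(control_dim=4, hidden_dim=16, output_dim=1)
print(model(path).shape)                                  # torch.Size([8, 1])
```

Note that the prediction depends on the sampled path only through its increments, which is one concrete way to see how the fineness and regularity of the sampling grid can affect the output, in line with the sampling-induced bias discussed in the abstract.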
Related papers
- Understanding Forgetting in Continual Learning with Linear Regression [21.8755265936716]
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently.
We provide a general theoretical analysis of forgetting in the linear regression model via Gradient Descent.
We demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting.
arXiv Detail & Related papers (2024-05-27T18:33:37Z)
- Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges [132.62934175555145]
Neural Collapse (NC) is a well-known phenomenon of deep neural networks in the terminal phase of training (TPT).
We propose a theoretical explanation for why continuing training can still lead to accuracy improvement on test set, even after the train accuracy has reached 100%.
We refer to this newly discovered property as "non-conservative generalization".
arXiv Detail & Related papers (2023-10-12T14:29:02Z)
- Variational Bayesian Neural Networks via Resolution of Singularities [1.2183405753834562]
We advocate for the importance of singular learning theory (SLT) as it pertains to the theory and practice of variational inference in Bayesian neural networks (BNNs).
We lay to rest some of the confusion surrounding discrepancies between downstream predictive performance measured via e.g., the test log predictive density, and the variational objective.
We use the SLT-corrected form for singular posterior distributions to inform the design of the variational family itself.
arXiv Detail & Related papers (2023-02-13T00:32:49Z)
- Theoretical analysis of deep neural networks for temporally dependent observations [1.6752182911522522]
We study theoretical properties of deep neural networks on modeling non-linear time series data.
Results are supported via various numerical simulation settings as well as an application to a macroeconomic data set.
arXiv Detail & Related papers (2022-10-20T18:56:37Z)
- On the generalization of learning algorithms that do not converge [54.122745736433856]
Generalization analyses of deep learning typically assume that the training converges to a fixed point.
Recent results indicate that in practice, the weights of deep neural networks optimized with gradient descent often oscillate indefinitely.
arXiv Detail & Related papers (2022-08-16T21:22:34Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Multi-scale Feature Learning Dynamics: Insights for Double Descent [71.91871020059857]
We study the phenomenon of "double descent" of the generalization error.
We find that double descent can be attributed to distinct features being learned at different scales.
arXiv Detail & Related papers (2021-12-06T18:17:08Z)
- Optimization Variance: Exploring Generalization Properties of DNNs [83.78477167211315]
The test error of a deep neural network (DNN) often demonstrates double descent.
We propose a novel metric, optimization variance (OV), to measure the diversity of model updates.
arXiv Detail & Related papers (2021-06-03T09:34:17Z)
- A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations (see the sketch after this list).
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
- Equivariant online predictions of non-stationary time series [0.0]
We analyze the theoretical predictive properties of statistical methods under model misspecification.
We show that a specific class of dynamic models -- random walk dynamic linear models -- produce exact minimax predictive densities.
arXiv Detail & Related papers (2019-11-20T01:46:24Z)
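As a complement to the Liquid Time-constant Networks entry above, the following is a minimal sketch, assuming the standard liquid time-constant formulation rather than anything stated on this page, of a cell built from linear first-order dynamical systems whose effective time constant is modulated by a learned nonlinearity. All names (LTCCell, log_tau, A) and sizes are illustrative.

```python
# Illustrative sketch (assumption, not this page's content): one step of a
# liquid time-constant cell.  Each hidden unit follows a linear first-order ODE
#     dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A,
# advanced here with the fused semi-implicit Euler update used in that line of work.
import torch
import torch.nn as nn


class LTCCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(input_dim + hidden_dim, hidden_dim)  # f(x, I) >= 0 via softplus
        self.A = nn.Parameter(torch.zeros(hidden_dim))             # learned target state
        self.log_tau = nn.Parameter(torch.zeros(hidden_dim))       # learned base time constant

    def forward(self, x, inputs, dt: float):
        f = nn.functional.softplus(self.gate(torch.cat([inputs, x], dim=-1)))
        tau = torch.exp(self.log_tau)
        # Fused semi-implicit Euler step: keeps the state stable and bounded.
        return (x + dt * f * self.A) / (1.0 + dt * (1.0 / tau + f))


# Toy usage on an irregularly sampled input sequence.
cell = LTCCell(input_dim=3, hidden_dim=8)
x = torch.zeros(4, 8)
for dt in (0.1, 0.05, 0.3):                 # irregular step sizes
    x = cell(x, torch.randn(4, 3), dt)
print(x.shape)                              # torch.Size([4, 8])
```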
This list is automatically generated from the titles and abstracts of the papers on this site.