Deep equilibrium models as estimators for continuous latent variables
- URL: http://arxiv.org/abs/2211.05943v1
- Date: Fri, 11 Nov 2022 01:21:34 GMT
- Title: Deep equilibrium models as estimators for continuous latent variables
- Authors: Russell Tsuchida and Cheng Soon Ong
- Abstract summary: We show explicit relationships between neural network architectures and statistical models.
We find that deep equilibrium models solve maximum a-posteriori (MAP) estimates for the latents and parameters of the transformation.
Our DEQ feature maps are end-to-end differentiable, enabling fine-tuning for downstream tasks.
- Score: 10.244213671349225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Principal Component Analysis (PCA) and its exponential family extensions have
three components: observations, latents and parameters of a linear
transformation. We consider a generalised setting where the canonical
parameters of the exponential family are a nonlinear transformation of the
latents. We show explicit relationships between particular neural network
architectures and the corresponding statistical models. We find that deep
equilibrium models -- a recently introduced class of implicit neural networks
-- solve maximum a-posteriori (MAP) estimates for the latents and parameters of
the transformation. Our analysis provides a systematic way to relate activation
functions, dropout, and layer structure, to statistical assumptions about the
observations, thus providing foundational principles for unsupervised DEQs. For
hierarchical latents, individual neurons can be interpreted as nodes in a deep
graphical model. Our DEQ feature maps are end-to-end differentiable, enabling
fine-tuning for downstream tasks.
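As a rough illustration of the abstract's central claim, the sketch below treats the simplest case: Gaussian observations with a linear map and a standard Gaussian prior on the latents. This is not the paper's construction. The function name `map_latent_fixed_point`, the noise variance `sigma2`, the prior precision `lam`, and the step-size rule are all assumptions made for this example. Under those assumptions, the fixed point of a single linear layer with identity activation is the MAP estimate of the latents, which can be checked against the closed-form ridge solution.

```python
import numpy as np

def map_latent_fixed_point(y, W, sigma2=1.0, lam=1.0, tol=1e-10, max_iter=10_000):
    """Iterate z <- z - eta * grad until the update is below tol.

    The objective is the negative log posterior of the illustrative model
    y ~ N(W z, sigma2 * I), z ~ N(0, I / lam), with W held fixed.
    """
    d = W.shape[1]
    A = W.T @ W / sigma2 + lam * np.eye(d)  # Hessian of the negative log posterior
    b = W.T @ y / sigma2
    eta = 1.0 / np.linalg.norm(A, 2)        # 1 / largest eigenvalue, so the map is a contraction
    z = np.zeros(d)
    for _ in range(max_iter):
        z_next = z - eta * (A @ z - b)      # one "layer" application: affine map of z
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Hypothetical data, used only to compare against the closed form.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 2))
y = rng.normal(size=5)

z_fp = map_latent_fixed_point(y, W)
z_closed = np.linalg.solve(W.T @ W + np.eye(2), W.T @ y)  # ridge solution for sigma2 = lam = 1
```

With a non-Gaussian exponential family, the gradient step would pick up a nonlinearity, which is where the abstract's link between activation functions and statistical assumptions comes from; this sketch does not attempt that case.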
Related papers
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
- Variational EP with Probabilistic Backpropagation for Bayesian Neural Networks [0.0]
I propose a novel approach for nonlinear logistic regression using a two-layer neural network (NN) model structure with hierarchical priors on the network weights.
I derive a computationally efficient algorithm, whose complexity scales similarly to an ensemble of independent sparse logistic models.
arXiv Detail & Related papers (2023-03-02T19:09:47Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Designing Universal Causal Deep Learning Models: The Case of Infinite-Dimensional Dynamical Systems from Stochastic Analysis [3.5450828190071655]
Causal operators (COs) play a central role in contemporary analysis.
There is still no canonical framework for designing Deep Learning (DL) models capable of approximating COs.
This paper proposes a "geometry-aware" solution to this open problem by introducing a DL model-design framework.
arXiv Detail & Related papers (2022-10-24T14:43:03Z)
- Deep learning and differential equations for modeling changes in individual-level latent dynamics between observation periods [0.0]
We propose an extension where different sets of differential equation parameters are allowed for observation sub-periods.
We derive prediction targets from individual dynamic models of resilience in the application.
Our approach successfully identifies individual-level parameters of dynamic models, which allows us to stably select predictors.
arXiv Detail & Related papers (2022-02-15T13:53:42Z)
- Modeling Implicit Bias with Fuzzy Cognitive Maps [0.0]
This paper presents a Fuzzy Cognitive Map model to quantify implicit bias in structured datasets.
We introduce a new reasoning mechanism equipped with a normalization-like transfer function that prevents neurons from saturating.
arXiv Detail & Related papers (2021-12-23T17:04:12Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, in which "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
- Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, NN training dynamics are approximately linear, similar to those of a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.