On the Influence of Enforcing Model Identifiability on Learning dynamics
of Gaussian Mixture Models
- URL: http://arxiv.org/abs/2206.08598v1
- Date: Fri, 17 Jun 2022 07:50:22 GMT
- Title: On the Influence of Enforcing Model Identifiability on Learning dynamics
of Gaussian Mixture Models
- Authors: Pascal Mattia Esser, Frank Nielsen
- Abstract summary: We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
- Score: 14.759688428864159
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common way to learn and analyze statistical models is to consider
operations in the model parameter space. But what happens if we optimize in the
parameter space and there is no one-to-one mapping between the parameter space
and the underlying statistical model space? Such cases frequently occur for
hierarchical models which include statistical mixtures or stochastic neural
networks, and these models are said to be singular. Singular models reveal
several important and well-studied problems in machine learning like the
decrease in convergence speed of learning trajectories due to attractor
behaviors. In this work, we propose a relative reparameterization technique of
the parameter space, which yields a general method for extracting regular
submodels from singular models. Our method enforces model identifiability
during training and we study the learning dynamics for gradient descent and
expectation maximization for Gaussian Mixture Models (GMMs) under relative
reparameterization, showing faster experimental convergence and an improved
manifold shape of the dynamics around the singularity. Extending the analysis
beyond GMMs, we furthermore analyze the Fisher information matrix under
relative reparameterization and its influence on the generalization error, and
show how the method can be applied to more complex models like deep neural
networks.
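To make the identifiability mechanism concrete, here is a minimal sketch (not the paper's actual construction): EM for a two-component 1-D GMM in which the components are re-sorted by mean after every M-step, removing the label-switching symmetry that makes the mixture non-identifiable. All data and settings are invented for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two overlapping 1-D Gaussian components.
x = np.concatenate([rng.normal(-1.0, 1.0, 300), rng.normal(1.5, 0.7, 200)])

# Mixture parameters: weights pi, means mu, standard deviations sigma.
pi = np.array([0.5, 0.5])
mu = np.array([-0.5, 0.5])
sigma = np.array([1.0, 1.0])

def gauss(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    r = pi * gauss(x[:, None], mu, sigma)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted maximum-likelihood updates.
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    # Identifiability constraint: keep components sorted by mean, so each
    # statistical mixture corresponds to exactly one parameter vector.
    order = np.argsort(mu)
    pi, mu, sigma = pi[order], mu[order], sigma[order]

print("weights:", pi, "means:", mu, "stds:", sigma)
```
Sorting is only the simplest way to pick one representative per mixture; it stands in for, but is not, the relative reparameterization studied in the paper.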
Related papers
- SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
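One rough, hypothetical reading of the zero-shot construction (the routing rule and all names below are invented for illustration, not the authors' code): take low-rank SVD factors of each fine-tuned model's weight delta as experts, and route an input by how strongly it projects onto each expert's input subspace.
```python
import numpy as np

rng = np.random.default_rng(5)
d = 16
W_base = rng.normal(size=(d, d)) / np.sqrt(d)
# Two "fine-tuned" variants of the base layer; their deltas become experts.
deltas = [rng.normal(size=(d, d)) * 0.1 for _ in range(2)]

def low_rank_factors(delta, r=2):
    # Truncated SVD: delta ~= B @ A with B (d x r) and A (r x d).
    u, s, vt = np.linalg.svd(delta)
    return u[:, :r] * s[:r], vt[:r]

experts = [low_rank_factors(dlt) for dlt in deltas]

def moe_forward(x):
    # Zero-shot routing (hypothetical): weight each expert by how much of
    # x lies in its input subspace, with no extra data or training.
    scores = np.array([np.linalg.norm(A @ x) for _, A in experts])
    gates = np.exp(scores) / np.exp(scores).sum()
    return W_base @ x + sum(g * (B @ (A @ x)) for g, (B, A) in zip(gates, experts))

print(moe_forward(rng.normal(size=d))[:4])
```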
- Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z)
- Data-Driven Model Selections of Second-Order Particle Dynamics via Integrating Gaussian Processes with Low-Dimensional Interacting Structures [0.9821874476902972]
We focus on the data-driven discovery of a general second-order particle-based model.
We present applications to modeling two real-world fish motion datasets.
arXiv Detail & Related papers (2023-11-01T23:45:15Z)
- Active-Learning-Driven Surrogate Modeling for Efficient Simulation of Parametric Nonlinear Systems [0.0]
In the absence of governing equations, we need to construct the parametric reduced-order surrogate model in a non-intrusive fashion.
Our work provides a non-intrusive optimality criterion to efficiently populate the parameter snapshots.
We propose an active-learning-driven surrogate model using kernel-based shallow neural networks.
arXiv Detail & Related papers (2023-06-09T18:01:14Z)
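A minimal sketch of such a loop under stand-in assumptions: a cheap analytic function plays the expensive parametric solver, a plain RBF/Gaussian-process model plays the kernel-based shallow network, and the GP predictive variance serves as the non-intrusive criterion for choosing the next parameter snapshot.
```python
import numpy as np

f = lambda p: np.sin(3 * p) + 0.5 * p        # stand-in for an expensive solver
grid = np.linspace(0.0, 2.0, 200)            # candidate parameter snapshots

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

X = np.array([0.0, 2.0])                     # start from the domain endpoints
for _ in range(6):
    K = rbf(X, X) + 1e-8 * np.eye(len(X))
    Ks = rbf(grid, X)
    # GP predictive variance = prior variance minus explained variance;
    # it is largest where the surrogate is most uncertain.
    var = 1.0 - np.einsum('ij,ij->i', Ks @ np.linalg.inv(K), Ks)
    X = np.append(X, grid[np.argmax(var)])   # greedily add that snapshot

K = rbf(X, X) + 1e-8 * np.eye(len(X))
pred = rbf(grid, X) @ np.linalg.solve(K, f(X))
print("max surrogate error:", np.abs(pred - f(grid)).max())
```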
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
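The recover-parameters-by-differentiation step reduces, in miniature, to gradient descent on a fit loss through a differentiable forward model. In the sketch below, a closed-form model and a hand-written gradient stand in for the trained network and automatic differentiation; the model, data, and learning rate are invented for illustration.
```python
import numpy as np

t = np.linspace(0.0, 10.0, 400)
def forward(theta):
    # Differentiable stand-in for the simulator / trained surrogate.
    a, w = theta
    return a * np.sin(w * t) * np.exp(-0.2 * t)

truth = np.array([1.3, 2.1])
data = forward(truth) + 0.05 * np.random.default_rng(2).normal(size=t.size)

def loss_grad(theta):
    # Gradient of the mean-squared fit loss, written out by hand; an
    # autodiff framework would produce this automatically.
    a, w = theta
    r = forward(theta) - data
    d_a = np.sin(w * t) * np.exp(-0.2 * t)
    d_w = a * t * np.cos(w * t) * np.exp(-0.2 * t)
    return 2.0 / t.size * np.array([(r * d_a).sum(), (r * d_w).sum()])

theta = np.array([1.0, 2.0])                 # initial guess near the truth
for _ in range(2000):
    theta -= 0.05 * loss_grad(theta)
print("recovered:", theta, "truth:", truth)
```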
- Neural Superstatistics for Bayesian Estimation of Dynamic Cognitive Models [2.7391842773173334]
We develop a simulation-based deep learning method for Bayesian inference, which can recover both time-varying and time-invariant parameters.
Our results show that the deep learning approach is very efficient in capturing the temporal dynamics of the model.
arXiv Detail & Related papers (2022-11-23T17:42:53Z)
- On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero and few-shot adaptation in low data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
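For orientation, the static linear baseline that the dynamic-scaling model generalizes fits in a few lines: center and whiten each view, then an SVD of the whitened cross-covariance yields the maximally correlated projections. The sketch below is only this classical CCA, not the paper's input-dependent model.
```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=(500, 2))                       # shared latent signal
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(500, 4))

def cca(X, Y, k=2, eps=1e-8):
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    def whiten(A):
        # A = U S V^T, so A @ (V / S) = U has orthonormal columns.
        u, s, vt = np.linalg.svd(A, full_matrices=False)
        return u, vt.T / (s + eps)
    Ux, Wx = whiten(X)
    Uy, Wy = whiten(Y)
    # Singular values of the whitened cross-covariance are the canonical
    # correlations; the singular vectors give the projection directions.
    u, corr, vt = np.linalg.svd(Ux.T @ Uy)
    return Wx @ u[:, :k], Wy @ vt.T[:, :k], corr[:k]

A, B, corr = cca(X, Y)
print("canonical correlations:", corr)
```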
- Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data [50.23363975709122]
We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing panel data.
We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem.
We then derive Evidence Based Lower Bounds for ME-NODE, and develop (efficient) training algorithms.
arXiv Detail & Related papers (2022-02-18T22:41:51Z)
- Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds [18.60761407945024]
In learning dynamics, singularities can act as attractors on the learning trajectory and, therefore, negatively influence the convergence speed of models.
We propose a general approach to circumvent the problem arising from singularities by using stratifolds.
We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.
arXiv Detail & Related papers (2021-12-07T14:42:45Z)
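The attractor effect is easy to reproduce on the textbook singular example w = a·b, where the stratum {a·b = 0} is singular: with the same learning rate and step budget, plain gradient descent started near it barely moves, while the regular one-parameter reparameterization c = a·b converges. All numbers below are illustrative.
```python
import numpy as np

target = 1.0
lr, steps = 0.01, 200

# Singular parameterization: output a*b, initialized near the stratum a*b = 0.
a = b = 1e-4
for _ in range(steps):
    r = a * b - target
    a, b = a - lr * 2 * r * b, b - lr * 2 * r * a
print("plain GD on (a, b):  a*b =", a * b)   # still far from 1.0

# Regular 1-D reparameterization c = a*b: same budget, near convergence.
c = 1e-8
for _ in range(steps):
    c -= lr * 2 * (c - target)
print("reparameterized:      c  =", c)
```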
- Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z)
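As a flavor of what a data-independent shape metric can look like, the sketch below fits a power-law exponent to the top eigenvalues of W^T W with a Hill estimator; heavier spectral tails (smaller exponents) have been linked to better-generalizing layers. The estimator and cutoff here are illustrative assumptions, not the paper's exact metrics.
```python
import numpy as np

def powerlaw_alpha(W, k=50):
    # Hill estimator over the k largest eigenvalues of W^T W:
    # alpha = 1 + k / sum(log(lam_i / lam_min)), a crude "shape" metric.
    lam = np.sort(np.linalg.svd(W, compute_uv=False) ** 2)[::-1][:k]
    return 1.0 + k / np.log(lam / lam[-1]).sum()

rng = np.random.default_rng(4)
W = rng.normal(size=(512, 512))              # a randomly initialized layer
print("alpha at random init:", powerlaw_alpha(W))
```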