Towards Modeling and Resolving Singular Parameter Spaces using
Stratifolds
- URL: http://arxiv.org/abs/2112.03734v1
- Date: Tue, 7 Dec 2021 14:42:45 GMT
- Title: Towards Modeling and Resolving Singular Parameter Spaces using
Stratifolds
- Authors: Pascal Mattia Esser, Frank Nielsen
- Abstract summary: In learning dynamics, singularities can act as attractors on the learning trajectory and, therefore, negatively influence the convergence speed of models.
We propose a general approach to circumvent the problem arising from singularities by using stratifolds.
We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.
- Score: 18.60761407945024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When analyzing parametric statistical models, a useful approach
consists in modeling the parameter space geometrically. However, even for very
simple and commonly used hierarchical models, such as statistical mixtures or
stochastic deep neural networks, the smoothness assumption of manifolds is
violated at singular points, which exhibit non-smooth neighborhoods in the
parameter space. These
singular models have been analyzed in the context of learning dynamics, where
singularities can act as attractors on the learning trajectory and, therefore,
negatively influence the convergence speed of models. We propose a general
approach to circumvent the problem arising from singularities by using
stratifolds, a concept from algebraic topology, to formally model singular
parameter spaces. We use the property that specific stratifolds are equipped
with a resolution method to construct a smooth manifold approximation of the
singular space. We empirically show that using (natural) gradient descent on
the smooth manifold approximation instead of the singular space allows us to
avoid the attractor behavior and therefore improve the convergence speed in
learning.
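To make the plateau phenomenon concrete, below is a minimal, self-contained toy
sketch in Python. It is an illustration under assumed toy choices (loss,
learning rate, initialization), not the paper's stratifold resolution: plain
gradient descent on the overparameterized loss L(a, b) = (ab - c*)^2 stalls
near the degenerate critical point at the origin, while descent in the smooth
"resolved" coordinate c = ab converges geometrically.

```python
# Toy illustration (assumed setup, not the paper's stratifold construction):
# the model output depends on the parameters (a, b) only through c = a * b,
# so the origin (a, b) = (0, 0) is a degenerate critical point of the loss
# L(a, b) = (a*b - c_star)^2, where gradients vanish and trajectories stall.

def run_singular(a0, b0, c_star, lr=0.05, steps=50):
    """Plain gradient descent in the singular parameterization (a, b)."""
    a, b = a0, b0
    for _ in range(steps):
        r = a * b - c_star
        a, b = a - lr * 2.0 * r * b, b - lr * 2.0 * r * a
    return (a * b - c_star) ** 2

def run_resolved(c0, c_star, lr=0.05, steps=50):
    """Gradient descent in the smooth, resolved coordinate c = a * b."""
    c = c0
    for _ in range(steps):
        c = c - lr * 2.0 * (c - c_star)  # gradient of (c - c_star)^2
    return (c - c_star) ** 2

if __name__ == "__main__":
    # Start near the degenerate point; the resolved run starts at the same
    # point in function space, c0 = a0 * b0.
    print("singular loss after 50 steps:", run_singular(1e-3, 1e-3, 1.0))
    print("resolved loss after 50 steps:", run_resolved(1e-6, 1.0))
```

On this toy problem the singular run is still on its plateau after 50 steps
(loss ~0.97), while the resolved run has essentially converged (loss ~3e-5),
mirroring the attractor-versus-resolution contrast described in the abstract.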
Related papers
- Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching.
We introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations.
We further propose to deploy topological densification when fine-tuning relative representations: a topological regularization loss that encourages clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Geometric Neural Diffusion Processes [55.891428654434634]
We extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling.
We show that with these conditions, the generative functional model admits the same symmetry.
arXiv Detail & Related papers (2023-07-11T16:51:38Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - On the Influence of Enforcing Model Identifiability on Learning Dynamics
of Gaussian Mixture Models [14.759688428864159]
We propose a technique for extracting submodels from singular models.
Our method enforces model identifiability during training.
We show how the method can be applied to more complex models like deep neural networks.
arXiv Detail & Related papers (2022-06-17T07:50:22Z) - Latent Space Model for Higher-order Networks and Generalized Tensor
Decomposition [18.07071669486882]
We introduce a unified framework, formulated as general latent space models, to study complex higher-order network interactions.
We formulate the relationship between the latent positions and the observed data via a generalized multilinear kernel as the link function.
We demonstrate the effectiveness of our method on synthetic data.
arXiv Detail & Related papers (2021-06-30T13:11:17Z) - Continuous normalizing flows on manifolds [0.342658286826597]
We describe how the recently introduced Neural ODEs and continuous normalizing flows can be extended to arbitrary smooth manifolds.
We propose a general methodology for parameterizing vector fields on these spaces and demonstrate how gradient-based learning can be performed (a minimal sketch of one such tangent-space parameterization appears after this list).
arXiv Detail & Related papers (2021-03-14T15:35:19Z) - OnsagerNet: Learning Stable and Interpretable Dynamics using a
Generalized Onsager Principle [19.13913681239968]
We learn stable and physically interpretable dynamical models using sampled trajectory data from physical processes based on a generalized Onsager principle.
We further apply this method to study Rayleigh-Bénard convection and learn Lorenz-like, low-dimensional autonomous reduced-order models.
arXiv Detail & Related papers (2020-09-06T07:30:59Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - On the minmax regret for statistical manifolds: the role of curvature [68.8204255655161]
Two-part codes and the minimum description length have been successful in delivering procedures to single out the best models.
We derive a sharper expression than the standard one given by the complexity, where the scalar curvature of the Fisher information metric plays a dominant role.
arXiv Detail & Related papers (2020-07-06T17:28:19Z) - Differentiable Segmentation of Sequences [2.1485350418225244]
We build on advances in learning continuous warping functions and propose a novel family of warping functions based on the two-sided power (TSP) distribution.
Our formulation includes the important class of segmented generalized linear models as a special case.
We use our approach to model the spread of COVID-19 with Poisson regression, apply it on a change point detection task, and learn classification models with concept drift.
arXiv Detail & Related papers (2020-06-23T15:51:48Z)
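As referenced in the continuous-normalizing-flows entry above, here is a hedged
sketch (an assumed, standard construction, not that paper's implementation) of
one way to parameterize a vector field on a smooth manifold: take an arbitrary
learnable map f: R^3 -> R^3 and project its output onto the tangent plane of
the unit sphere S^2 at each point, v(x) = (I - x x^T) f(x).

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))  # toy stand-in for a learnable map f(x) = W x

def tangent_field(x):
    """Project f(x) = W @ x onto the tangent space of the unit sphere at x."""
    f = W @ x
    return f - np.dot(x, f) * x  # (I - x x^T) f(x)

x = np.array([0.0, 0.0, 1.0])  # a point on S^2
v = tangent_field(x)
print(np.dot(x, v))  # ~0 up to floating point: v is tangent to S^2 at x
```

The projection guarantees that the resulting ODE flow stays on the sphere (up
to integration error), which is the basic requirement for defining continuous
normalizing flows on a manifold.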
This list is automatically generated from the titles and abstracts of the papers on this site.