Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
- URL: http://arxiv.org/abs/2406.06909v1
- Date: Tue, 11 Jun 2024 03:07:41 GMT
- Title: Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
- Authors: Lineghuan Meng, Chuang Wang,
- Abstract summary: An empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE)
Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs)
We analyze the fixed point locations and their stability of the ODEs unveiling several interesting findings.
- Score: 1.7597525104451157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs), reflecting the evolution of the model performance during the training process. We analyze the fixed point locations and their stability of the ODEs unveiling several interesting findings. First, only the hidden variable's second moment affects feature learnability at the state with uninformative initialization. Second, higher moments influence the probability of feature selection by controlling the attraction region, rather than affecting local stability. Finally, independent noises added in the data argumentation degrade performance but negatively correlated noise can reduces the variance of gradient estimation yielding better performance. Despite of the simplicity of the analyzed model, it exhibits a rich phenomena of training dynamics, paving a way to understand more complex mechanism behind practical large models.
Related papers
- Modeling Latent Neural Dynamics with Gaussian Process Switching Linear Dynamical Systems [2.170477444239546]
We develop an approach that balances these two objectives: the Gaussian Process Switching Linear Dynamical System (gpSLDS)
Our method builds on previous work modeling the latent state evolution via a differential equation whose nonlinear dynamics are described by a Gaussian process (GP-SDEs)
Our approach resolves key limitations of the rSLDS such as artifactual oscillations in dynamics near discrete state boundaries, while also providing posterior uncertainty estimates of the dynamics.
arXiv Detail & Related papers (2024-07-19T15:32:15Z) - Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach [4.379212829795889]
There exists a critical need for a modeling methodology with high accuracy while concurrently achieving physical analyzability.
This paper proposes an AI-based Koopman approach to model the unknown nonlinear platoon dynamics harnessing the power of AI.
arXiv Detail & Related papers (2024-06-20T19:35:21Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Deep Latent Force Models: ODE-based Process Convolutions for Bayesian
Deep Learning [0.0]
The deep latent force model (DLFM) is a deep Gaussian process with physics-informed kernels at each layer.
We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world time series data.
We find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models.
arXiv Detail & Related papers (2023-11-24T19:55:57Z) - Learning Space-Time Continuous Neural PDEs from Partially Observed
States [13.01244901400942]
We introduce a grid-independent model learning partial differential equations (PDEs) from noisy and partial observations on irregular grids.
We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel design encoder for improved data efficiency and grid independence.
arXiv Detail & Related papers (2023-07-09T06:53:59Z) - Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of ''student'' models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
arXiv Detail & Related papers (2023-05-27T21:25:55Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Learning Physics-Informed Neural Networks without Stacked
Back-propagation [82.26566759276105]
We develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks.
In particular, we parameterize the PDE solution by the Gaussian smoothed model and show that, derived from Stein's Identity, the second-order derivatives can be efficiently calculated without back-propagation.
Experimental results show that our proposed method can achieve competitive error compared to standard PINN training but is two orders of magnitude faster.
arXiv Detail & Related papers (2022-02-18T18:07:54Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear.
We show that it commonly arises in parameters of discrete multiplicative noise due to variance.
A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.