Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
- URL: http://arxiv.org/abs/2406.06909v1
- Date: Tue, 11 Jun 2024 03:07:41 GMT
- Title: Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit
- Authors: Lineghuan Meng, Chuang Wang
- Abstract summary: The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE).
Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs).
We analyze the fixed-point locations of the ODEs and their stability, unveiling several interesting findings.
- Score: 1.7597525104451157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations (ODEs), reflecting the evolution of the model performance during the training process. We analyze the fixed-point locations of the ODEs and their stability, unveiling several interesting findings. First, only the hidden variable's second moment affects feature learnability at the state with uninformative initialization. Second, higher moments influence the probability of feature selection by controlling the attraction region rather than affecting local stability. Finally, independent noises added in data augmentation degrade performance, but negatively correlated noise can reduce the variance of the gradient estimate, yielding better performance. Despite the simplicity of the analyzed model, it exhibits rich training dynamics, paving the way to understanding the more complex mechanisms behind practical large models.
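The abstract does not reproduce the reduced ODEs, so as an illustrative sketch only, here is the kind of fixed-point and stability analysis it describes, applied to a hypothetical scalar order-parameter equation dm/dt = (mu2 - lam) m - m^3 (m, mu2, and lam are assumed stand-ins for the overlap with the true feature, the hidden variable's second moment, and the L2 regularization strength; they are not the paper's equations):

```python
import math

# Hypothetical toy ODE, NOT the paper's closed ODE system:
#   dm/dt = (mu2 - lam) * m - m**3
def f_prime(m, mu2, lam):
    """Derivative of the right-hand side; a fixed point is stable iff f'(m) < 0."""
    return (mu2 - lam) - 3.0 * m**2

def fixed_points(mu2, lam):
    """Fixed points of the toy ODE, each paired with a local-stability flag."""
    pts = [0.0]
    if mu2 > lam:                      # informative fixed points exist only above threshold
        r = math.sqrt(mu2 - lam)
        pts += [r, -r]
    return [(m, f_prime(m, mu2, lam) < 0) for m in pts]

# Below threshold the uninformative state m = 0 is the only (stable) fixed point;
# above it, m = 0 destabilizes and two informative fixed points appear.
print(fixed_points(mu2=0.5, lam=1.0))  # [(0.0, True)]
print(fixed_points(mu2=2.0, lam=1.0))  # [(0.0, False), (1.0, True), (-1.0, True)]
```

Note how the stability of the uninformative state m = 0 depends only on the second-moment parameter mu2, echoing the abstract's first finding in this toy setting.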
Related papers
- No Equations Needed: Learning System Dynamics Without Relying on Closed-Form ODEs [56.78271181959529]
This paper proposes a conceptual shift to modeling low-dimensional dynamical systems by departing from the traditional two-step modeling process.
Instead of first discovering a closed-form equation and then analyzing it, our approach, direct semantic modeling, predicts the semantic representation of the dynamical system.
Our approach not only simplifies the modeling pipeline but also enhances the transparency and flexibility of the resulting models.
arXiv Detail & Related papers (2025-01-30T18:36:48Z) - Improving the Noise Estimation of Latent Neural Stochastic Differential Equations [4.64982780843177]
Latent neural stochastic differential equations (SDEs) have recently emerged as a promising approach for learning generative models from time-series data, but they tend to underestimate the noise in the data.
We investigate this underestimation in detail and propose a straightforward solution: including an explicit additional noise regularization term in the loss function.
We are able to learn a model that accurately captures the diffusion component of the data.
arXiv Detail & Related papers (2024-12-23T11:56:35Z) - Modeling Latent Neural Dynamics with Gaussian Process Switching Linear Dynamical Systems [2.170477444239546]
We develop an approach that balances these two objectives: the Gaussian Process Switching Linear Dynamical System (gpSLDS).
Our method builds on previous work modeling the latent state evolution via a differential equation whose nonlinear dynamics are described by a Gaussian process (GP-SDEs).
Our approach resolves key limitations of the rSLDS such as artifactual oscillations in dynamics near discrete state boundaries, while also providing posterior uncertainty estimates of the dynamics.
arXiv Detail & Related papers (2024-07-19T15:32:15Z) - Physically Analyzable AI-Based Nonlinear Platoon Dynamics Modeling During Traffic Oscillation: A Koopman Approach [4.379212829795889]
There is a critical need for a modeling methodology that achieves high accuracy while remaining physically analyzable.
This paper proposes an AI-based Koopman approach to model the unknown nonlinear platoon dynamics.
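The summary names a Koopman approach without giving details; as a hedged illustration of the underlying idea only, here is a minimal dynamic-mode-decomposition (DMD) sketch, the standard data-driven way to fit a linear Koopman-style operator from snapshot pairs (the two-dimensional state and A_true are hypothetical, not the paper's model):

```python
import numpy as np

# Generic DMD sketch: fit a linear operator A with x_{t+1} ≈ A x_t
# from snapshot pairs. Purely illustrative assumptions below.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1],
                   [0.0, 0.8]])            # hypothetical linear platoon dynamics
X0 = rng.normal(size=(2, 100))             # snapshots at time t (one per column)
X1 = A_true @ X0                           # snapshots one step later
A_fit = X1 @ np.linalg.pinv(X0)            # least-squares Koopman-style operator
assert np.allclose(A_fit, A_true)          # exact recovery for noiseless linear data
```

The least-squares fit is exact here only because the toy data are noiseless and linear; the paper's contribution is learning such a linearization for unknown nonlinear dynamics.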
arXiv Detail & Related papers (2024-06-20T19:35:21Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities for analyzing closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning [0.0]
The deep latent force model (DLFM) is a deep Gaussian process with physics-informed kernels at each layer.
We present empirical evidence of the capability of the DLFM to capture the dynamics present in highly nonlinear real-world time series data.
We find that the DLFM is capable of achieving comparable performance to a range of non-physics-informed probabilistic models.
arXiv Detail & Related papers (2023-11-24T19:55:57Z) - Knowledge Distillation Performs Partial Variance Reduction [93.6365393721122]
Knowledge distillation is a popular approach for enhancing the performance of ''student'' models.
The underlying mechanics behind knowledge distillation (KD) are still not fully understood.
We show that KD can be interpreted as a novel type of variance reduction mechanism.
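As a generic sketch of the distillation setup the paper analyzes (this is the textbook formulation, not the paper's variance-reduction derivation), the student minimizes a mix of the hard-label cross-entropy and a temperature-softened KL term toward the teacher:

```python
import numpy as np

# Generic knowledge-distillation loss sketch (standard formulation; the
# temperature T and mixing weight alpha are conventional hyperparameters).
def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                    # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """alpha * CE(hard label) + (1 - alpha) * T**2 * KL(teacher || student)."""
    p_s, p_t = softmax(student_logits, T), softmax(teacher_logits, T)
    ce = -np.log(softmax(student_logits)[label])
    kl = float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
    return float(alpha * ce + (1.0 - alpha) * T**2 * kl)

# When the student already matches the teacher, the distillation term vanishes.
print(kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0], label=0, alpha=0.0))  # 0.0
```

The paper's claim is about why this objective helps: the soft teacher targets act as a (partial) variance-reduction mechanism on the student's gradient estimates.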
arXiv Detail & Related papers (2023-05-27T21:25:55Z) - Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise mechanisms underlying its success remain unclear.
We show that multiplicative noise commonly arises in the parameter updates due to the variance of stochastic gradients, leading to heavy-tailed behavior.
A detailed analysis describes key factors, including step size and data, and state-of-the-art neural network models exhibit similar results.
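The heavy-tail phenomenon can be illustrated with a toy simulation (illustrative assumptions only, not the paper's setup): linear iterates driven by a random multiplicative factor develop far heavier tails than the same recursion with a fixed factor, even though both have finite variance:

```python
import numpy as np

# Toy comparison: x_{t+1} = a_t * x_t + b_t with random multiplier a_t
# versus x_{t+1} = 0.9 * x_t + b_t with a fixed factor.
rng = np.random.default_rng(1)
n = 100_000
a = 0.97 * rng.normal(size=n)        # zero-mean random multiplier, E[a^2] < 1
b = rng.normal(size=n)               # additive noise
x_mult = np.zeros(n)
x_add = np.zeros(n)
for t in range(1, n):
    x_mult[t] = a[t] * x_mult[t - 1] + b[t]   # multiplicative-noise iterates
    x_add[t] = 0.9 * x_add[t - 1] + b[t]      # fixed-factor baseline (Gaussian)

def kurtosis(x):
    """Pearson kurtosis: E[(x - mean)^4] / var^2; equals 3 for a Gaussian."""
    x = x - x.mean()
    return float(np.mean(x**4) / np.mean(x**2) ** 2)

print(kurtosis(x_add))    # near the Gaussian value 3
print(kurtosis(x_mult))   # much larger: heavy-tailed stationary distribution
```

Here the multiplicative recursion is a Kesten-type process: its stationary distribution has power-law tails whenever large enough multipliers occur with positive probability, which the excess kurtosis makes visible.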
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.