Related papers: Approaching Deep Learning through the Spectral Dynamics of Weights

URL: http://arxiv.org/abs/2408.11804v1
Date: Wed, 21 Aug 2024 17:48:01 GMT
Title: Approaching Deep Learning through the Spectral Dynamics of Weights
Authors: David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter
Abstract summary: We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to clarify and unify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale "grokking" to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers.
Score: 41.948042468042374
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose an empirical approach centered on the spectral dynamics of weights -- the behavior of singular values and vectors during optimization -- to unify and clarify several phenomena in deep learning. We identify a consistent bias in optimization across various experiments, from small-scale "grokking" to large-scale tasks like image classification with ConvNets, image generation with UNets, speech recognition with LSTMs, and language modeling with Transformers. We also demonstrate that weight decay enhances this bias beyond its role as a norm regularizer, even in practical systems. Moreover, we show that these spectral dynamics distinguish memorizing networks from generalizing ones, offering a novel perspective on this longstanding conundrum. Additionally, we leverage spectral dynamics to explore the emergence of well-performing sparse subnetworks (lottery tickets) and the structure of the loss surface through linear mode connectivity. Our findings suggest that spectral dynamics provide a coherent framework to better understand the behavior of neural networks across diverse settings.
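
Illustrative sketch (not from the paper): the abstract describes tracking singular values of weight matrices during optimization. The NumPy snippet below shows one way such tracking could look on a toy two-layer linear model trained by plain gradient descent. The toy task, the function names (`singular_values`, `effective_rank`, `train_and_track`), the hyperparameters, and the entropy-based effective-rank summary are all assumptions chosen for illustration, not the authors' experimental setup.

```python
import numpy as np


def singular_values(weight):
    """Return the singular values of a 2-D weight matrix, largest first."""
    return np.linalg.svd(weight, compute_uv=False)


def effective_rank(weight, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    singular values normalized to sum to one (an assumed summary statistic)."""
    s = singular_values(weight)
    p = s / (s.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


def train_and_track(steps=200, lr=0.05, weight_decay=0.0, seed=0, dim=16):
    """Fit W2 @ W1 to a rank-2 target with gradient descent and record the
    singular values of W1 after every step (hypothetical toy setup)."""
    rng = np.random.default_rng(seed)
    target = rng.normal(size=(dim, 2)) @ rng.normal(size=(2, dim)) / dim
    w1 = 0.1 * rng.normal(size=(dim, dim))
    w2 = 0.1 * rng.normal(size=(dim, dim))
    history = []
    for _ in range(steps):
        # Loss is 0.5 * ||W2 W1 - target||_F^2 plus an L2 penalty on weights.
        residual = w2 @ w1 - target
        grad_w1 = w2.T @ residual + weight_decay * w1
        grad_w2 = residual @ w1.T + weight_decay * w2
        w1 = w1 - lr * grad_w1
        w2 = w2 - lr * grad_w2
        history.append(singular_values(w1))
    return np.array(history), w1


if __name__ == "__main__":
    for wd in (0.0, 0.01):
        spectra, w1 = train_and_track(weight_decay=wd)
        print(f"weight_decay={wd}: top singular values {spectra[-1][:4]}, "
              f"effective rank {effective_rank(w1):.2f}")
```

Running the script with and without `weight_decay` prints the leading singular values and the effective rank of the first layer at the end of training, which gives a simple way to compare spectra across settings.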

Related papers

From SGD to Spectra: A Theory of Neural Network Weight Dynamics [0.0]
Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices.
arXiv Detail & Related papers (2025-07-17T01:06:39Z)
Weight-Space Linear Recurrent Neural Networks [0.5937476291232799]
WARP (Weight-space Adaptive Recurrent Prediction) is a powerful framework that unifies weight-space learning with linear recurrence. We show that WARP matches or surpasses state-of-the-art baselines on diverse classification tasks.
arXiv Detail & Related papers (2025-06-01T20:13:28Z)
Dynamic Spectral Backpropagation for Efficient Neural Network Training [0.0]
Dynamic Spectral Backpropagation (DSBP) enhances neural network training under resource constraints by projecting gradients onto principal eigenvectors. Five extensions are proposed to address challenges in robustness, few-shot learning, and hardware efficiency. DSBP outperforms Sharpness-Aware Minimization (SAM), Low-Rank Adaptation (LoRA), and Model-Agnostic Meta-Learning (MAML) on CIFAR-10, Fashion-MNIST, MedMNIST, and Tiny ImageNet.
arXiv Detail & Related papers (2025-05-29T11:47:50Z)
Machine Learning-Enhanced Characterisation of Structured Spectral Densities: Leveraging the Reaction Coordinate Mapping [41.94295877935867]
Spectral densities encode essential information about system-environment interactions in open-quantum systems. We leverage machine learning techniques to reconstruct key environmental features using the reaction coordinate mapping. For a dissipative spin-boson model with a structured spectral density expressed as a sum of Lorentzian peaks, we demonstrate that the time evolution of a system observable can be used by a neural network to classify the spectral density as comprising one, two, or three Lorentzian peaks.
arXiv Detail & Related papers (2025-01-13T17:02:04Z)
A Dynamical Systems-Inspired Pruning Strategy for Addressing Oversmoothing in Graph Neural Networks [18.185834696177654]
Oversmoothing in Graph Neural Networks (GNNs) poses a significant challenge as network depth increases. We identify the root causes of oversmoothing and propose DYNAMO-GAT. Our theoretical analysis reveals how DYNAMO-GAT disrupts the convergence to oversmoothed states.
arXiv Detail & Related papers (2024-12-10T07:07:06Z)
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks [5.851101657703105]
We take a first step towards theoretically characterizing the conditioning of the Gauss-Newton (GN) matrix in neural networks. We establish tight bounds on the condition number of the GN in deep linear networks of arbitrary depth and width. We expand the analysis to further architectural components, such as residual connections and convolutional layers.
arXiv Detail & Related papers (2024-11-04T14:56:48Z)
Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM [17.661231232206028]
Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention. We propose a novel SLAM framework for dynamic environments.
arXiv Detail & Related papers (2024-07-18T09:35:48Z)
Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy [75.15685966213832]
We analyze the rich directional structure of optimization trajectories represented by their pointwise parameters. We show that training only the scalar batchnorm parameters, starting partway through training, matches the performance of training the entire network.
arXiv Detail & Related papers (2024-03-12T07:32:47Z)
Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems [7.045072177165241]
We augment a piecewise-linear recurrent neural network (RNN) by a linear spline basis expansion. We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions.
arXiv Detail & Related papers (2022-07-06T09:43:03Z)
Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs). However, they often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias. In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
Neural Dynamic Mode Decomposition for End-to-End Modeling of Nonlinear Dynamics [49.41640137945938]
We propose a neural dynamic mode decomposition for estimating a lift function based on neural networks. With our proposed method, the forecast error is backpropagated through the neural networks and the spectral decomposition. Our experiments demonstrate the effectiveness of our proposed method in terms of eigenvalue estimation and forecast performance.
arXiv Detail & Related papers (2020-12-11T08:34:26Z)
Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)

This list is automatically generated from the titles and abstracts of the papers on this site.