Emergence of hierarchical modes from deep learning
- URL: http://arxiv.org/abs/2208.09859v1
- Date: Sun, 21 Aug 2022 09:53:32 GMT
- Title: Emergence of hierarchical modes from deep learning
- Authors: Chan Li and Haiping Huang
- Abstract summary: We propose mode decomposition learning, which interprets the weight matrices as a hierarchy of latent modes.
Mode decomposition learning points to a cheap and interpretable route to deep learning.
- Score: 2.0711789781518752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale deep neural networks are expensive to train, and the
training yields weight matrices that are difficult to interpret. Here, we
propose mode decomposition learning, which interprets the weight matrices as a
hierarchy of latent modes. These modes are akin to patterns in physics studies
of memory networks. Mode decomposition learning not only saves a significant
amount of training cost, but also explains the network performance through the
leading modes. The mode learning scheme yields a progressively more compact
latent space across the network hierarchy, and the minimal number of modes
grows only logarithmically with the network width. We also study mode
decomposition learning in an analytic online-learning setting, which reveals
multi-stage learning dynamics. The proposed mode decomposition learning
therefore points to a cheap and interpretable route to deep learning.
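The abstract does not spell out the parameterization, but the idea of training a layer through a small set of latent modes can be illustrated with a minimal sketch. The class below is an assumption-laden illustration, not the authors' code: each layer's weight is built as W = U diag(s) V^T from trainable left modes U, right modes V, and per-mode importance scores s.

```python
import torch
import torch.nn as nn

class ModeDecomposedLinear(nn.Module):
    """Linear layer parameterized by latent modes instead of a full weight:
    W = U @ diag(s) @ V.T, i.e. a sum over modes of s_mu * u_mu v_mu^T.
    Only U, V, and the importance vector s are trained."""
    def __init__(self, in_features: int, out_features: int, n_modes: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, n_modes) / out_features**0.5)
        self.V = nn.Parameter(torch.randn(in_features, n_modes) / in_features**0.5)
        self.s = nn.Parameter(torch.ones(n_modes))  # per-mode importance

    def weight(self) -> torch.Tensor:
        # materialize W only for inspection; training never needs the full matrix
        return self.U @ torch.diag(self.s) @ self.V.T

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # project onto right modes, scale by importance, expand with left modes
        return ((x @ self.V) * self.s) @ self.U.T

layer = ModeDecomposedLinear(in_features=784, out_features=256, n_modes=20)
x = torch.randn(32, 784)
print(layer(x).shape)                                  # torch.Size([32, 256])
print(layer.s.abs().sort(descending=True).values[:5])  # leading-mode importances
```

With n_modes far smaller than the layer widths, the trainable parameter count falls from in_features * out_features to roughly n_modes * (in_features + out_features), which is one concrete way to read the abstract's claims about reduced training cost and about explaining performance through a few leading modes.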
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of the surprising behaviors of deep learning, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning [26.07501953088188]
We study how unbalanced layer-specific initialization variances and learning rates determine the degree of feature learning.
Our analysis reveals that they conspire to influence the learning regime through a set of conserved quantities.
We provide evidence that this unbalanced rich regime drives feature learning in deep finite-width networks, promotes interpretability of early layers in CNNs, reduces the sample complexity of learning hierarchical data, and decreases the time to grokking in modular arithmetic.
arXiv Detail & Related papers (2024-06-10T10:42:37Z) - Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models [9.637088945386227]
Large language models (LLMs) often struggle with strict memory, latency, and power demands.
Various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis.
We propose Radial Networks, which perform token-level routing between layers guided by a trained router module; an illustrative routing sketch is given after this list.
arXiv Detail & Related papers (2024-04-07T09:52:31Z) - Spiking mode-based neural networks [2.5690340428649328]
Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits.
One drawback of training a large-scale spiking neural network is that updating all weights is quite expensive.
We propose a spiking mode-based training protocol, where the recurrent weight matrix is expressed as a Hopfield-like product of three matrices.
arXiv Detail & Related papers (2023-10-23T06:54:17Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement online in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - The Underlying Correlated Dynamics in Neural Training [6.385006149689549]
Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
arXiv Detail & Related papers (2022-12-18T08:34:11Z) - Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z) - Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures [4.836352379142503]
We propose a new deep learning architecture based on fast matrix multiplication via a Kronecker product decomposition; a worked sketch of the Kronecker trick is given after this list.
We show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources.
arXiv Detail & Related papers (2022-04-08T19:54:52Z) - Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
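To make the token-level layer routing mentioned in the Radial Networks entry above a little more concrete, here is a sketch under assumptions of our own (a hard top-1 router over a few feed-forward blocks); it is not the paper's architecture, and all names are invented for the example.

```python
import torch
import torch.nn as nn

class TokenLayerRouter(nn.Module):
    """Illustrative token-level routing: a small router scores candidate blocks
    per token, and each token is processed only by its top-scoring block, so
    compute grows with the number of tokens rather than with full depth."""
    def __init__(self, d_model: int, blocks: nn.ModuleList):
        super().__init__()
        self.blocks = blocks
        self.router = nn.Linear(d_model, len(blocks))  # trained router module

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (n_tokens, d_model); one routing decision per token.
        # Hard argmax keeps the sketch short; a trainable router would need a
        # differentiable (soft or auxiliary-loss) routing scheme.
        choice = self.router(tokens).argmax(dim=-1)
        out = torch.empty_like(tokens)
        for i, block in enumerate(self.blocks):
            mask = choice == i
            if mask.any():
                out[mask] = block(tokens[mask])
        return out

d_model = 64
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
    for _ in range(4)
)
print(TokenLayerRouter(d_model, blocks)(torch.randn(8, d_model)).shape)  # (8, 64)
```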
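For the Kronecker multi-layer entry above, the sketch below shows the standard identity that makes such layers cheap: a dense weight constrained to A ⊗ B can be applied without ever forming the Kronecker product. The shapes are made up for illustration; only the identity itself is standard.

```python
import numpy as np

# A hypothetical 784 -> 256 dense layer whose weight is constrained to W = A ⊗ B.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 28)) / np.sqrt(28)  # left Kronecker factor
B = rng.standard_normal((16, 28)) / np.sqrt(28)  # right Kronecker factor
x = rng.standard_normal(28 * 28)                 # input vector of length 784

# Naive path: materialize the full 256 x 784 Kronecker product.
y_naive = np.kron(A, B) @ x

# Fast path: with NumPy's row-major reshape, (A ⊗ B) x = vec(A X B^T),
# which only ever multiplies the small factors.
X = x.reshape(28, 28)
y_fast = (A @ X @ B.T).ravel()

assert np.allclose(y_naive, y_fast)
# Stored parameters: 2 * 16 * 28 = 896 instead of 256 * 784 = 200,704.
```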
This list is automatically generated from the titles and abstracts of the papers in this site.