Emergence of hierarchical modes from deep learning
- URL: http://arxiv.org/abs/2208.09859v1
- Date: Sun, 21 Aug 2022 09:53:32 GMT
- Title: Emergence of hierarchical modes from deep learning
- Authors: Chan Li and Haiping Huang
- Abstract summary: We propose mode decomposition learning, which interprets the weight matrices as a hierarchy of latent modes.
Mode decomposition learning points to a cheap and interpretable route to deep learning.
- Score: 2.0711789781518752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale deep neural networks are expensive to train, and the
training yields weight matrices that are difficult to interpret. Here, we
propose mode decomposition learning, which interprets the weight matrices as a
hierarchy of latent modes. These modes are akin to patterns in physics studies
of memory networks. Mode decomposition learning not only saves a significant
amount of training cost, but also explains the network performance through the
leading modes. The mode learning scheme yields a progressively more compact
latent space across the network hierarchy, and the minimal number of modes
grows only logarithmically with the network width. We also study mode
decomposition learning in an analytic online-learning setting, which reveals
multi-stage learning dynamics. The proposed mode decomposition learning
therefore points to a cheap and interpretable route to deep learning.
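The abstract does not spell out the parameterization, but the idea of training a layer through a small set of latent modes can be illustrated with a minimal sketch. The class below is an assumption-laden illustration, not the authors' code: each layer's weight is built as W = U diag(s) V^T from trainable left modes U, right modes V, and per-mode importance scores s.

```python
import torch
import torch.nn as nn

class ModeDecomposedLinear(nn.Module):
    """Linear layer parameterized by latent modes instead of a full weight:
    W = U @ diag(s) @ V.T, i.e. a sum over modes of s_mu * u_mu v_mu^T.
    Only U, V, and the importance vector s are trained."""
    def __init__(self, in_features: int, out_features: int, n_modes: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, n_modes) / out_features**0.5)
        self.V = nn.Parameter(torch.randn(in_features, n_modes) / in_features**0.5)
        self.s = nn.Parameter(torch.ones(n_modes))  # per-mode importance

    def weight(self) -> torch.Tensor:
        # materialize W only for inspection; training never needs the full matrix
        return self.U @ torch.diag(self.s) @ self.V.T

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # project onto right modes, scale by importance, expand with left modes
        return ((x @ self.V) * self.s) @ self.U.T

layer = ModeDecomposedLinear(in_features=784, out_features=256, n_modes=20)
x = torch.randn(32, 784)
print(layer(x).shape)                                  # torch.Size([32, 256])
print(layer.s.abs().sort(descending=True).values[:5])  # leading-mode importances
```

With n_modes far smaller than the layer widths, the trainable parameter count falls from in_features * out_features to roughly n_modes * (in_features + out_features), which is one concrete way to read the abstract's claims about reduced training cost and about explaining performance through a few leading modes.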
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of the surprising behaviors of deep learning, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning [26.07501953088188]
We study how unbalanced layer-specific initialization variances and learning rates determine the degree of feature learning.
Our analysis reveals that they conspire to influence the learning regime through a set of conserved quantities.
We provide evidence that this unbalanced rich regime drives feature learning in deep finite-width networks, promotes interpretability of early layers in CNNs, reduces the sample complexity of learning hierarchical data, and decreases the time to grokking in modular arithmetic.
arXiv Detail & Related papers (2024-06-10T10:42:37Z) - Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models [9.637088945386227]
Large language models (LLMs) often struggle with strict memory, latency, and power demands.
Various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis.
We propose Radial Networks, which perform token-level routing between layers guided by a trained router module; an illustrative routing sketch is given after this list.
arXiv Detail & Related papers (2024-04-07T09:52:31Z) - Spiking mode-based neural networks [2.5690340428649328]
Spiking neural networks play an important role in brain-like neuromorphic computations and in studying working mechanisms of neural circuits.
One drawback of training a large-scale spiking neural network is that updating all weights is quite expensive.
We propose a spiking mode-based training protocol, where the recurrent weight matrix is expressed as a Hopfield-like product of three matrices.
arXiv Detail & Related papers (2023-10-23T06:54:17Z) - Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns the optimal source placement online in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z) - The Underlying Correlated Dynamics in Neural Training [6.385006149689549]
Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
arXiv Detail & Related papers (2022-12-18T08:34:11Z) - Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z) - Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures [4.836352379142503]
We propose a new deep learning architecture based on fast matrix multiplication via a Kronecker product decomposition; a worked sketch of the Kronecker trick is given after this list.
We show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources.
arXiv Detail & Related papers (2022-04-08T19:54:52Z) - Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
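To make the token-level layer routing mentioned in the Radial Networks entry above a little more concrete, here is a sketch under assumptions of our own (a hard top-1 router over a few feed-forward blocks); it is not the paper's architecture, and all names are invented for the example.

```python
import torch
import torch.nn as nn

class TokenLayerRouter(nn.Module):
    """Illustrative token-level routing: a small router scores candidate blocks
    per token, and each token is processed only by its top-scoring block, so
    compute grows with the number of tokens rather than with full depth."""
    def __init__(self, d_model: int, blocks: nn.ModuleList):
        super().__init__()
        self.blocks = blocks
        self.router = nn.Linear(d_model, len(blocks))  # trained router module

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (n_tokens, d_model); one routing decision per token.
        # Hard argmax keeps the sketch short; a trainable router would need a
        # differentiable (soft or auxiliary-loss) routing scheme.
        choice = self.router(tokens).argmax(dim=-1)
        out = torch.empty_like(tokens)
        for i, block in enumerate(self.blocks):
            mask = choice == i
            if mask.any():
                out[mask] = block(tokens[mask])
        return out

d_model = 64
blocks = nn.ModuleList(
    nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
    for _ in range(4)
)
print(TokenLayerRouter(d_model, blocks)(torch.randn(8, d_model)).shape)  # (8, 64)
```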
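For the Kronecker multi-layer entry above, the sketch below shows the standard identity that makes such layers cheap: a dense weight constrained to A ⊗ B can be applied without ever forming the Kronecker product. The shapes are made up for illustration; only the identity itself is standard.

```python
import numpy as np

# A hypothetical 784 -> 256 dense layer whose weight is constrained to W = A ⊗ B.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 28)) / np.sqrt(28)  # left Kronecker factor
B = rng.standard_normal((16, 28)) / np.sqrt(28)  # right Kronecker factor
x = rng.standard_normal(28 * 28)                 # input vector of length 784

# Naive path: materialize the full 256 x 784 Kronecker product.
y_naive = np.kron(A, B) @ x

# Fast path: with NumPy's row-major reshape, (A ⊗ B) x = vec(A X B^T),
# which only ever multiplies the small factors.
X = x.reshape(28, 28)
y_fast = (A @ X @ B.T).ravel()

assert np.allclose(y_naive, y_fast)
# Stored parameters: 2 * 16 * 28 = 896 instead of 256 * 784 = 200,704.
```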
This list is automatically generated from the titles and abstracts of the papers in this site.