A Projection Algorithm for the Unitary Weights
- URL: http://arxiv.org/abs/2102.10052v1
- Date: Fri, 19 Feb 2021 17:33:17 GMT
- Title: A Projection Algorithm for the Unitary Weights
- Authors: Hao-Yuan Chang (University of California, Los Angeles)
- Abstract summary: Unitary neural networks are promising alternatives for solving the exploding and vanishing activation/gradient problem.
They often require longer training time due to the additional unitary constraints on their weight matrices.
Here we show a novel algorithm using a backpropagation technique with Lie algebra for computing approximated unitary weights from their pre-trained, non-unitary counterparts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Unitary neural networks are promising alternatives for solving the exploding
and vanishing activation/gradient problem without the need for explicit
normalization that reduces the inference speed. However, they often require
longer training time due to the additional unitary constraints on their weight
matrices. Here we show a novel algorithm using a backpropagation technique with
Lie algebra for computing approximated unitary weights from their pre-trained,
non-unitary counterparts. The unitary networks initialized with these
approximations can reach the desired accuracies much faster, mitigating their
training time penalties while maintaining inference speedups. Our approach will
be instrumental in the adaptation of unitary networks, especially for those
neural architectures where pre-trained weights are freely available.
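The abstract describes recovering approximately unitary weights from pre-trained, non-unitary ones by optimizing in the Lie algebra with backpropagation. The sketch below illustrates that general idea in PyTorch for the real (orthogonal) special case: parameterize U = exp(A) with A skew-symmetric, so U stays exactly orthogonal, and backpropagate through the matrix exponential to pull U toward a pre-trained matrix W0. This is a minimal illustration under these assumptions, not the paper's exact algorithm; the function name, loop length, and learning rate are placeholders.

```python
# Minimal sketch (not the paper's exact algorithm): approximate a pre-trained,
# non-unitary matrix W0 by an orthogonal matrix (the real special case of the
# unitary group) via gradient descent in the Lie algebra of skew-symmetric
# matrices, so that U = exp(A) remains exactly orthogonal throughout.
import torch


def project_to_orthogonal(W0: torch.Tensor, steps: int = 500, lr: float = 1e-1) -> torch.Tensor:
    """Return an orthogonal U = exp(X - X^T) that approximately minimizes ||U - W0||_F."""
    n = W0.shape[0]
    X = torch.zeros(n, n, requires_grad=True)   # unconstrained parameters
    opt = torch.optim.Adam([X], lr=lr)
    for _ in range(steps):
        A = X - X.T                             # skew-symmetric generator
        U = torch.matrix_exp(A)                 # exp of skew-symmetric => orthogonal
        loss = ((U - W0) ** 2).sum()            # squared Frobenius distance to W0
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Note: exp of a skew-symmetric matrix has determinant +1, so this sketch
    # only reaches rotations; the fit is limited if det(W0's closest orthogonal) < 0.
    with torch.no_grad():
        return torch.matrix_exp(X - X.T)


if __name__ == "__main__":
    torch.manual_seed(0)
    W0 = torch.randn(8, 8)                      # stand-in for a pre-trained weight matrix
    U = project_to_orthogonal(W0)
    print(torch.dist(U.T @ U, torch.eye(8)))    # ~0: U is (numerically) orthogonal
```

The key design point, as in the abstract, is that the constraint is enforced by construction (optimizing in the Lie algebra) rather than by re-normalizing the weights after each step.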
Related papers
- Simmering: Sufficient is better than optimal for training neural networks [0.0]
We introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely "good enough".
We show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset.
Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms.
arXiv Detail & Related papers (2024-10-25T18:02:08Z) - Preconditioners for the Stochastic Training of Implicit Neural
Representations [30.92757082348805]
Implicit neural representations have emerged as a powerful technique for encoding complex continuous multidimensional signals as neural networks.
We propose training using diagonal preconditioners, showcasing their effectiveness across various signal modalities.
arXiv Detail & Related papers (2024-02-13T20:46:37Z) - Speed Limits for Deep Learning [67.69149326107103]
Recent advancement in thermodynamics allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network.
We provide analytical expressions for these speed limits for linear and linearizable neural networks.
Remarkably, given some plausible scaling assumptions on the NTK spectra and the spectral decomposition of the labels, learning is optimal in a scaling sense.
arXiv Detail & Related papers (2023-07-27T06:59:46Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Low-rank lottery tickets: finding efficient low-rank neural networks via
matrix differential equations [2.3488056916440856]
We propose a novel algorithm to find efficient low-rank networks.
These networks are determined and adapted already during the training phase.
Our method automatically and dynamically adapts the ranks during training to achieve a desired approximation accuracy.
arXiv Detail & Related papers (2022-05-26T18:18:12Z) - LCS: Learning Compressible Subspaces for Adaptive Network Compression at
Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z) - Fast semidefinite programming with feedforward neural networks [0.0]
We propose to solve feasibility semidefinite programs using artificial neural networks.
We train the network without having to exactly solve the semidefinite program even once.
We demonstrate that the trained neural network gives decent accuracy, while showing orders of magnitude increase in speed compared to a traditional solver.
arXiv Detail & Related papers (2020-11-11T14:01:34Z) - Training highly effective connectivities within neural networks with
randomly initialized, fixed weights [4.56877715768796]
We introduce a novel way of training a network by flipping the signs of the weights.
We obtain good results even with weights of constant magnitude, or when the weights are drawn from highly asymmetric distributions.
arXiv Detail & Related papers (2020-06-30T09:41:18Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
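As a rough illustration of the last entry above (constraining fine-tuning to a small sphere centred on the initial pre-trained weights), the hedged sketch below projects the parameters back onto an L2 ball around their initial values after each optimizer step. The radius, helper name, and dummy objective are assumptions for the example, not the referenced paper's actual procedure.

```python
# Hedged sketch: keep fine-tuned parameters within an L2 ball of radius `radius`
# around the pre-trained weights by projecting after each update.
# Illustrative only; not the referenced paper's exact algorithm.
import torch


def project_to_ball(params, init_params, radius: float) -> None:
    """Project each parameter tensor back onto the L2 ball around its initial value."""
    with torch.no_grad():
        for p, p0 in zip(params, init_params):
            delta = p - p0
            norm = delta.norm()
            if norm > radius:
                p.copy_(p0 + delta * (radius / norm))


# Usage inside an (assumed) fine-tuning loop:
model = torch.nn.Linear(16, 4)                       # stand-in for a pre-trained model
init = [p.detach().clone() for p in model.parameters()]
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):
    loss = model(torch.randn(32, 16)).pow(2).mean()  # dummy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    project_to_ball(model.parameters(), init, radius=0.5)
```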