An Analysis of Alternating Direction Method of Multipliers for
Feed-forward Neural Networks
- URL: http://arxiv.org/abs/2009.02825v1
- Date: Sun, 6 Sep 2020 22:13:54 GMT
- Title: An Analysis of Alternating Direction Method of Multipliers for
Feed-forward Neural Networks
- Authors: Seyedeh Niusha Alavi Foumani, Ce Guo, Wayne Luk
- Abstract summary: The motivation behind this approach was to develop a method of training neural networks that is scalable and can be parallelised.
We achieved 6.9% and 6.8% better accuracy compared to SGD and Adam respectively, with a four-layer neural network with a hidden size of 28.
- Score: 2.8747398859585376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we present a hardware-compatible neural network training
algorithm based on the alternating direction method of multipliers (ADMM)
and iterative least-squares methods. The motivation behind this approach was to
develop a method of training neural networks that is scalable and can be
parallelised. These characteristics make the algorithm suitable for hardware
implementation. We achieved 6.9\% and 6.8\% better accuracy compared to
SGD and Adam respectively, with a four-layer neural network with a hidden size of
28 on the HIGGS dataset. Likewise, we observed 21.0\% and 2.2\% accuracy
improvements compared to SGD and Adam respectively, on the IRIS dataset with a
three-layer neural network with a hidden size of 8. At the same time, matrix
inversion, which is challenging to implement in hardware, is avoided
in this method. We assessed the impact of avoiding matrix inversion on ADMM
accuracy and observed that matrix inversion can safely be replaced with
iterative least-squares methods while maintaining the desired performance. Moreover, the
computational complexity of the implemented method is polynomial in the
dimensions of the input dataset and the hidden size of the network.
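To make the central idea concrete, the sketch below shows how the matrix inversion in a ridge-regularised, ADMM-style weight update can be replaced by an iterative least-squares solver. This is a minimal illustration rather than the authors' implementation: the use of scipy.sparse.linalg.lsqr, the regularisation parameter rho, and the matrix shapes are assumptions made for the example.

```python
# Minimal sketch (not the paper's exact algorithm): the weight update
#   min_W ||W A - Z||^2 + rho ||W||^2
# has the closed form W = Z A^T (A A^T + rho I)^{-1}, which requires a matrix
# inversion.  The same problem can be solved with an iterative least-squares
# method (here LSQR), avoiding the explicit inversion.
import numpy as np
from scipy.sparse.linalg import lsqr

def weight_update_closed_form(A, Z, rho=1e-3):
    """Closed-form ridge solution; requires inverting an n x n matrix."""
    n = A.shape[0]
    return Z @ A.T @ np.linalg.inv(A @ A.T + rho * np.eye(n))

def weight_update_iterative(A, Z, rho=1e-3, iters=50):
    """Same problem solved row by row with LSQR; no explicit inversion."""
    # Row i of W minimises ||A^T w - z_i||^2 + rho ||w||^2.
    rows = [lsqr(A.T, z, damp=np.sqrt(rho), iter_lim=iters)[0] for z in Z]
    return np.vstack(rows)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((28, 256))  # layer inputs: hidden size 28, 256 samples (assumed shapes)
    Z = rng.standard_normal((28, 256))  # targets for the layer's pre-activations
    W_inv = weight_update_closed_form(A, Z)
    W_lsqr = weight_update_iterative(A, Z)
    print("max |difference|:", np.abs(W_inv - W_lsqr).max())
```

On a well-conditioned problem of this size the two updates should agree closely, which mirrors the abstract's observation that the inversion can be replaced by an iterative least-squares method without losing accuracy.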
Related papers
- Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization [0.0]
We present Double Sparse Factorization (DSF), where we factorize each weight matrix into two sparse matrices.
Our method achieves state-of-the-art results, enabling unprecedented sparsification of neural networks.
arXiv Detail & Related papers (2024-09-27T15:48:39Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without a large computational overhead.
We demonstrate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Layer-wise Adaptive Step-Sizes for Stochastic First-Order Methods for Deep Learning [8.173034693197351]
We propose a new per-layer adaptive step-size procedure for first-order optimization methods in deep learning.
The proposed approach exploits the layer-wise curvature information contained in the diagonal blocks of the Hessian in deep neural networks (DNNs) to compute adaptive step-sizes (i.e., LRs) for each layer.
Numerical experiments show that SGD with momentum and AdamW combined with the proposed per-layer step-sizes are able to choose effective LR schedules.
arXiv Detail & Related papers (2023-05-23T04:12:55Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z)
- Adam in Private: Secure and Fast Training of Deep Neural Networks with Adaptive Moment Estimation [6.342794803074475]
We propose a framework that allows efficient evaluation of full-fledged state-of-the-art machine learning algorithms.
This is in contrast to most prior works, which substitute ML algorithms with approximated "MPC-friendly" variants.
We obtain secure training that outperforms state-of-the-art three-party systems.
arXiv Detail & Related papers (2021-06-04T01:40:09Z)
- Partitioning sparse deep neural networks for scalable training and inference [8.282177703075453]
State-of-the-art deep neural networks (DNNs) have significant computational and data management requirements.
Sparsification and pruning methods are shown to be effective in removing a large fraction of connections in DNNs.
The resulting sparse networks present unique challenges to further improve the computational efficiency of training and inference in deep learning.
arXiv Detail & Related papers (2021-04-23T20:05:52Z)
- Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions while achieving comparable error bounds both in theory and practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z)
- Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that exploit the difference between the present and past gradients.
We have tested ensembles of networks and their fusion with ResNet50 trained with gradient descent.
arXiv Detail & Related papers (2021-03-26T18:55:08Z)
- MOPS-Net: A Matrix Optimization-driven Network for Task-Oriented 3D Point Cloud Downsampling [86.42733428762513]
MOPS-Net is a novel interpretable, matrix optimization-driven deep learning method for task-oriented 3D point cloud downsampling.
We show that MOPS-Net can achieve favorable performance against state-of-the-art deep learning-based methods over various tasks.
arXiv Detail & Related papers (2020-05-01T14:01:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.