Stochastic Block-ADMM for Training Deep Networks
- URL: http://arxiv.org/abs/2105.00339v1
- Date: Sat, 1 May 2021 19:56:13 GMT
- Title: Stochastic Block-ADMM for Training Deep Networks
- Authors: Saeed Khorram, Xiao Fu, Mohamad H. Danesh, Zhongang Qi, Li Fuxin
- Abstract summary: We propose Block-ADMM as an approach to train deep neural networks in batch and online settings.
Our method works by splitting neural networks into an arbitrary number of blocks and utilizing auxiliary variables to connect these blocks.
We prove the convergence of our proposed method and justify its capabilities through experiments in supervised and weakly-supervised settings.
- Score: 16.369102155752824
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose Stochastic Block-ADMM as an approach to train deep
neural networks in batch and online settings. Our method works by splitting
neural networks into an arbitrary number of blocks and utilizes auxiliary
variables to connect these blocks while optimizing with stochastic gradient
descent. This allows training deep networks with non-differentiable constraints
where conventional backpropagation is not applicable. An application of this is
supervised feature disentangling, where our proposed DeepFacto inserts a
non-negative matrix factorization (NMF) layer into the network. Since
backpropagation only needs to be performed within each block, our approach
alleviates vanishing gradients and provides potentials for parallelization. We
prove the convergence of our proposed method and justify its capabilities
through experiments in supervised and weakly-supervised settings.
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise
Training of Neural Networks [9.718519843862937]
We introduce a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize sub-neural networks separately.
Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations.
arXiv Detail & Related papers (2023-12-20T08:02:33Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as that in FF.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Tunable Subnetwork Splitting for Model-parallelism of Neural Network
Training [12.755664985045582]
We propose a Tunable Subnetwork Splitting Method (TSSM) to tune the decomposition of deep neural networks.
Our proposed TSSM can achieve significant speedup without observable loss of training accuracy.
arXiv Detail & Related papers (2020-09-09T01:05:12Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Improving the Backpropagation Algorithm with Consequentialism Weight
Updates over Mini-Batches [0.40611352512781856]
We show that it is possible to consider a multi-layer neural network as a stack of adaptive filters.
We introduce a better algorithm by predicting then emending the adverse consequences of the actions that take place in BP even before they happen.
Our experiments show the usefulness of our algorithm in the training of deep neural networks.
arXiv Detail & Related papers (2020-03-11T08:45:36Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.