Coordinate descent on the orthogonal group for recurrent neural network
training
- URL: http://arxiv.org/abs/2108.00051v1
- Date: Fri, 30 Jul 2021 19:27:11 GMT
- Title: Coordinate descent on the orthogonal group for recurrent neural network
training
- Authors: Estelle Massart and Vinayak Abrol
- Abstract summary: We show that the algorithm rotates two columns of the recurrent matrix, an operation that can be efficiently implemented as a multiplication by a Givens matrix.
Experiments on a benchmark recurrent neural network training problem are presented to demonstrate the effectiveness of the proposed algorithm.
- Score: 9.886326127330337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to use stochastic Riemannian coordinate descent on the orthogonal
group for recurrent neural network training. The algorithm rotates successively
two columns of the recurrent matrix, an operation that can be efficiently
implemented as a multiplication by a Givens matrix. In the case when the
coordinate is selected uniformly at random at each iteration, we prove the
convergence of the proposed algorithm under standard assumptions on the loss
function, stepsize and minibatch noise. In addition, we numerically demonstrate
that the Riemannian gradient in recurrent neural network training has an
approximately sparse structure. Leveraging this observation, we propose a
faster variant of the proposed algorithm that relies on the Gauss-Southwell
rule. Experiments on a benchmark recurrent neural network training problem are
presented to demonstrate the effectiveness of the proposed algorithm.
Related papers
- Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks [19.185059111021854]
We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent.
We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy.
arXiv Detail & Related papers (2024-10-29T14:28:49Z) - Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks [15.074950361970194]
We provide a unified analysis for a family of algorithms that encompasses IRLS, the recently proposed linlin-RFM algorithm, and the alternating diagonal neural networks.
We show that, with appropriately chosen reweighting policy, a handful of sparse structures can achieve favorable performance.
We also show that leveraging this in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
arXiv Detail & Related papers (2024-06-04T20:37:17Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolleds and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - A Recursively Recurrent Neural Network (R2N2) Architecture for Learning
Iterative Algorithms [64.3064050603721]
We generalize Runge-Kutta neural network to a recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z) - An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural
Network with Group Sparsity [13.27709100571336]
A leaky ReLU network with a group regularization term has been widely used in the recent years.
We show that there is a lack of approaches to compute a stationary point deterministically.
We propose an inexact augmented Lagrangian algorithm for solving the new model.
arXiv Detail & Related papers (2022-05-11T11:53:15Z) - Scalable computation of prediction intervals for neural networks via
matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose a new approach for the regularization of neural networks by the local Rademacher complexity called LocalDrop.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z) - Attentive Gaussian processes for probabilistic time-series generation [4.94950858749529]
We propose a computationally efficient attention-based network combined with the Gaussian process regression to generate real-valued sequence.
We develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch.
The algorithm has been proved to converge and shows comparable, if not better, quality of the found solution.
arXiv Detail & Related papers (2021-02-10T01:19:15Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm based on the verification problem in an iterative manner and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.