High-dimensional Neural Feature Design for Layer-wise Reduction of
Training Cost
- URL: http://arxiv.org/abs/2003.13058v2
- Date: Fri, 21 Aug 2020 21:16:00 GMT
- Title: High-dimensional Neural Feature Design for Layer-wise Reduction of
Training Cost
- Authors: Alireza M. Javid, Arun Venkitaraman, Mikael Skoglund, and Saikat
Chatterjee
- Abstract summary: We design a ReLU-based multilayer neural network by mapping the feature vectors to a higher dimensional space in every layer.
We show that the proposed architecture is norm-preserving and provides an invertible feature vector.
- Score: 47.34374677507664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design a ReLU-based multilayer neural network by mapping the feature
vectors to a higher dimensional space in every layer. We design the weight
matrices in every layer to ensure a reduction of the training cost as the
number of layers increases. Linear projection to the target in the higher
dimensional space leads to a lower training cost if a convex cost is minimized.
An $\ell_2$-norm convex constraint is used in the minimization to reduce the
generalization error and avoid overfitting. The regularization hyperparameters
of the network are derived analytically to guarantee a monotonic decrease of
the training cost, thereby eliminating the need for cross-validation
to find the regularization hyperparameter in each layer. We show that the
proposed architecture is norm-preserving and provides an invertible feature
vector, and therefore, can be used to reduce the training cost of any other
learning method which employs linear projection to estimate the target.
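One way to see how a ReLU layer can be both norm-preserving and invertible is the stacked $[V; -V]$ construction sketched below. This is an illustrative construction under our own assumptions (a weight matrix with orthonormal columns), not necessarily the paper's exact weight design:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 8                      # input dim, expanded dim (m >= d)

# V with orthonormal columns (V.T @ V = I), so ||V @ x|| = ||x||
V, _ = np.linalg.qr(rng.standard_normal((m, d)))
W = np.vstack([V, -V])           # layer weights: maps R^d -> R^(2m)

x = rng.standard_normal(d)
y = np.maximum(W @ x, 0.0)       # ReLU activation

# Norm preservation: for each i, exactly one of (Vx)_i and -(Vx)_i
# survives the ReLU, so ||y||^2 = ||V @ x||^2 = ||x||^2.
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))

# Invertibility: ReLU(z) - ReLU(-z) = z, so x = V.T @ (y[:m] - y[m:]).
x_rec = V.T @ (y[:m] - y[m:])
assert np.allclose(x_rec, x)
```

Because the feature vector can be inverted exactly, no information is lost across layers, which is what allows a linear projection to the target at any depth.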
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
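For context, deep unrolling can be sketched in a few lines: each iteration of a truncated solver becomes one network layer, and quantities such as the per-layer step sizes become trainable parameters. A minimal sketch for least squares, with the step sizes fixed to 1/L rather than learned (illustrative only, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

# Each "layer" of the unrolled network is one gradient step on
# f(x) = 0.5 * ||A x - b||^2; the per-layer step sizes eta[k] would be
# the trainable parameters (fixed to 1/L here for illustration).
K = 8                                # truncation depth = number of layers
L = np.linalg.norm(A.T @ A, 2)       # Lipschitz constant of grad f
eta = np.full(K, 1.0 / L)

x, losses = np.zeros(10), []
for k in range(K):                   # forward pass = K unrolled layers
    x = x - eta[k] * (A.T @ (A @ x - b))
    losses.append(0.5 * np.linalg.norm(A @ x - b) ** 2)
```

With the conservative 1/L step the objective decreases monotonically across layers; the descent constraints studied in the paper aim to preserve this behavior when the step sizes are learned.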
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions.
Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z) - Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to their magnitudes on-the-fly.
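The soft-shrinkage idea can be sketched as follows; this is a simplified rule of our own, not the exact ISS-P schedule. Rather than zeroing the smallest-magnitude weights outright, decay them by a factor so pruning decisions remain reversible:

```python
import numpy as np

def soft_shrink_step(w, sparsity=0.3, shrink=0.9):
    """One soft-shrinkage step (a sketch of the idea, not ISS-P's exact
    rule): multiply the smallest-magnitude fraction of the weights by
    `shrink` instead of hard-zeroing them."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return w
    thresh = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(w) <= thresh                 # "unimportant" weights
    out = w.copy()
    out[mask] *= shrink                        # decay, don't delete
    return out

rng = np.random.default_rng(2)
w = rng.standard_normal((4, 4))
w1 = soft_shrink_step(w)
```

Applied repeatedly during training, the decayed weights either fade toward zero or recover if they turn out to matter, which is what makes the sparse structure dynamic.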
arXiv Detail & Related papers (2023-03-16T21:06:13Z) - Sketchy: Memory-efficient Adaptive Regularization with Frequent
Directions [22.09320263962004]
We find the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace.
We describe a generic method for reducing memory and compute requirements of maintaining a matrix preconditioner.
We show extensions of our work to Shampoo, resulting in a method competitive in quality with Shampoo and Adam, yet requiring only sub-linear memory for tracking second moments.
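The Frequent Directions primitive underlying this can be sketched as follows; this is a minimal per-row variant that shrinks after every insertion, whereas practical implementations batch the shrinkage:

```python
import numpy as np

def frequent_directions(G, ell):
    """Frequent Directions sketch: maintain an (ell x d) matrix B with
    B.T @ B approximating G.T @ G using only O(ell * d) memory."""
    n, d = G.shape
    B = np.zeros((ell, d))
    for g in G:
        B[-1] = g                    # last row is zero after each shrink
        U, s, Vt = np.linalg.svd(B, full_matrices=False)
        # subtract the smallest squared singular value from all of them
        s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = s[:, None] * Vt
    return B

rng = np.random.default_rng(3)
G = rng.standard_normal((50, 10))    # e.g. a stream of gradients
B = frequent_directions(G, ell=5)

# FD guarantee: B.T @ B never overestimates G.T @ G
# (their difference is positive semidefinite).
eigs = np.linalg.eigvalsh(G.T @ G - B.T @ B)
```

Maintaining such a sketch of the gradient covariance is what gives the sub-linear memory for second moments mentioned above.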
arXiv Detail & Related papers (2023-02-07T21:50:06Z) - Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO).
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z) - Error-Correcting Neural Networks for Two-Dimensional Curvature
Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Łojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Training Over-parameterized Models with Non-decomposable Objectives [46.62273918807789]
We propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices.
Our losses are calibrated, and can be further improved with distilled labels from a teacher model.
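For context, the classical logit-adjustment idea that these losses extend can be sketched as follows; this is the standard baseline technique, not the paper's new cost-matrix losses:

```python
import numpy as np

def logit_adjusted_ce(logits, y, priors, tau=1.0):
    """Logit-adjusted cross-entropy: add tau * log(prior) to each logit
    before the softmax, which penalizes over-predicting frequent classes
    under class imbalance."""
    adj = logits + tau * np.log(priors)
    adj = adj - adj.max(axis=1, keepdims=True)   # numerical stability
    log_probs = adj - np.log(np.exp(adj).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
y = np.array([0, 2])
uniform = np.ones(3) / 3
```

With uniform priors the adjustment is a per-row constant shift, so the loss reduces to the standard cross-entropy; a general cost matrix, as in the paper, replaces this single per-class offset.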
arXiv Detail & Related papers (2021-07-09T19:29:33Z) - A block coordinate descent optimizer for classification problems
exploiting convexity [0.0]
We introduce a coordinate descent method for deep linear networks on classification tasks that exploits convexity of the cross-entropy loss in the weights of the final linear layer.
By alternating between a second-order method that finds globally optimal parameters for the linear output layer and gradient descent applied to the hidden layers, we ensure an optimal fit of the adaptive basis to the data throughout training.
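The alternating scheme can be sketched with a least-squares head in place of cross-entropy, so the inner solve has a closed form (ridge regression); dimensions, step size, and regularization below are our own illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 8))        # inputs
T = rng.standard_normal((100, 3))        # targets (illustrative)
W1 = 0.1 * rng.standard_normal((8, 16))  # hidden-layer weights
lam, lr = 1e-3, 1e-3

losses = []
for it in range(50):
    H = np.maximum(X @ W1, 0.0)          # adaptive basis (ReLU features)
    # (1) globally optimal output layer: ridge regression, closed form
    W2 = np.linalg.solve(H.T @ H + lam * np.eye(16), H.T @ T)
    R = H @ W2 - T
    losses.append(0.5 * np.sum(R**2) / len(X))
    # (2) one gradient-descent step on the hidden layer, W2 held fixed
    G = (R @ W2.T) * (H > 0) / len(X)    # backprop through the ReLU
    W1 -= lr * (X.T @ G)
```

Re-solving the output layer exactly after every hidden-layer step is what keeps the basis optimally fitted throughout training.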
arXiv Detail & Related papers (2020-06-17T19:49:06Z) - Sparse aNETT for Solving Inverse Problems with Deep Learning [2.5234156040689237]
We propose a sparse reconstruction framework (aNETT) for solving inverse problems.
We train an autoencoder network $D \circ E$ with $E$ acting as a nonlinear sparsifying transform.
Numerical results are presented for sparse-view CT.
arXiv Detail & Related papers (2020-04-20T18:43:13Z) - On the Global Convergence of Training Deep Linear ResNets [104.76256863926629]
We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training $L$-hidden-layer linear residual networks (ResNets).
We prove that for training deep residual networks with certain linear transformations at input and output layers, both GD and SGD can converge to the global minimum of the training loss.
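The setting can be reproduced in miniature: gradient descent on a zero-initialized deep linear ResNet drives the training loss to (near) zero on a linear teacher. All sizes and the learning rate below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, L = 100, 5, 3
X = rng.standard_normal((n, d))
W_true = np.eye(d) + 0.3 * rng.standard_normal((d, d))
Y = X @ W_true                             # targets from a linear teacher
As = [np.zeros((d, d)) for _ in range(L)]  # residual blocks: h -> h + h @ A_l

def forward(X, As):
    hs = [X]
    for A in As:
        hs.append(hs[-1] + hs[-1] @ A)
    return hs

lr, losses = 0.05, []
for step in range(1000):
    hs = forward(X, As)
    R = hs[-1] - Y
    losses.append(0.5 * np.sum(R**2) / n)
    delta = R / n                          # dLoss / d(output)
    for l in range(L - 1, -1, -1):         # backprop through the blocks
        gA = hs[l].T @ delta               # gradient w.r.t. A_l
        delta = delta @ (np.eye(d) + As[l]).T
        As[l] -= lr * gA
```

The zero (identity-map) initialization of the residual blocks is the favorable regime: the network starts as the identity and GD steadily contracts the residual.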
arXiv Detail & Related papers (2020-03-02T18:34:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.