Automated Sizing and Training of Efficient Deep Autoencoders using
Second Order Algorithms
- URL: http://arxiv.org/abs/2308.06221v2
- Date: Wed, 13 Dec 2023 13:46:12 GMT
- Title: Automated Sizing and Training of Efficient Deep Autoencoders using
Second Order Algorithms
- Authors: Kanishka Tyagi, Chinmay Rane, Michael Manry
- Abstract summary: We propose a multi-step training method for generalized linear classifiers.
validation error is minimized by pruning of unnecessary inputs.
desired outputs are improved via a method similar to the Ho-Kashyap rule.
- Score: 0.46040036610482665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a multi-step training method for designing generalized linear
classifiers. First, an initial multi-class linear classifier is found through
regression. Then validation error is minimized by pruning of unnecessary
inputs. Simultaneously, desired outputs are improved via a method similar to
the Ho-Kashyap rule. Next, the output discriminants are scaled to be net
functions of sigmoidal output units in a generalized linear classifier. We then
develop a family of batch training algorithm for the multi layer perceptron
that optimizes its hidden layer size and number of training epochs. Next, we
combine pruning with a growing approach. Later, the input units are scaled to
be the net function of the sigmoidal output units that are then feed into as
input to the MLP. We then propose resulting improvements in each of the deep
learning blocks thereby improving the overall performance of the deep
architecture. We discuss the principles and formulation regarding learning
algorithms for deep autoencoders. We investigate several problems in deep
autoencoders networks including training issues, the theoretical, mathematical
and experimental justification that the networks are linear, optimizing the
number of hidden units in each layer and determining the depth of the deep
learning model. A direct implication of the current work is the ability to
construct fast deep learning models using desktop level computational
resources. This, in our opinion, promotes our design philosophy of building
small but powerful algorithms. Performance gains are demonstrated at each step.
Using widely available datasets, the final network's ten fold testing error is
shown to be less than that of several other linear, generalized linear
classifiers, multi layer perceptron and deep learners reported in the
literature.
Related papers
- Distributive Pre-Training of Generative Modeling Using Matrix-Product
States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Unsupervised Learning of Initialization in Deep Neural Networks via
Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z) - Fundamental Limits of Two-layer Autoencoders, and Achieving Them with
Gradient Methods [91.54785981649228]
This paper focuses on non-linear two-layer autoencoders trained in the challenging proportional regime.
Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods.
For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders.
arXiv Detail & Related papers (2022-12-27T12:37:34Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA)
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Algorithms for Efficiently Learning Low-Rank Neural Networks [12.916132936159713]
We study algorithms for learning low-rank neural networks.
We present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network.
We propose a novel low-rank framework for training low-rank $textitdeep$ networks.
arXiv Detail & Related papers (2022-02-02T01:08:29Z) - Generalized Learning Vector Quantization for Classification in
Randomized Neural Networks and Hyperdimensional Computing [4.4886210896619945]
We propose a modified RVFL network that avoids computationally expensive matrix operations during training.
The proposed approach achieved state-of-the-art accuracy on a collection of datasets from the UCI Machine Learning Repository.
arXiv Detail & Related papers (2021-06-17T21:17:17Z) - Solving Sparse Linear Inverse Problems in Communication Systems: A Deep
Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z) - An Effective and Efficient Initialization Scheme for Training
Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
A proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network.
Our model requires a much less number of communication rounds and still a number of communication rounds in theory.
Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.