Unsupervised Learning of Initialization in Deep Neural Networks via
Maximum Mean Discrepancy
- URL: http://arxiv.org/abs/2302.04369v1
- Date: Wed, 8 Feb 2023 23:23:28 GMT
- Title: Unsupervised Learning of Initialization in Deep Neural Networks via
Maximum Mean Discrepancy
- Authors: Cheolhyoung Lee, Kyunghyun Cho
- Abstract summary: We propose an unsupervised algorithm to find good initialization for input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
- Score: 74.34895342081407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent success of stochastic gradient descent in deep learning,
it is often difficult to train a deep neural network with an inappropriate
choice of its initial parameters. Even if training is successful, it has been
known that the initial parameter configuration may negatively impact
generalization. In this paper, we propose an unsupervised algorithm to find
good initialization for input data, given that a downstream task is d-way
classification. We first notice that each parameter configuration in the
parameter space corresponds to one particular downstream task of d-way
classification. We then conjecture that the success of learning is directly
related to how diverse downstream tasks are in the vicinity of the initial
parameters. We thus design an algorithm that encourages small perturbation to
the initial parameter configuration leads to a diverse set of d-way
classification tasks. In other words, the proposed algorithm ensures a solution
to any downstream task to be near the initial parameter configuration. We
empirically evaluate the proposed algorithm on various tasks derived from MNIST
with a fully connected network. In these experiments, we observe that our
algorithm improves average test accuracy across most of these tasks, and that
such improvement is greater when the number of labelled examples is small.
Related papers
- Unrolled denoising networks provably learn optimal Bayesian inference [54.79172096306631]
We prove the first rigorous learning guarantees for neural networks based on unrolling approximate message passing (AMP)
For compressed sensing, we prove that when trained on data drawn from a product prior, the layers of the network converge to the same denoisers used in Bayes AMP.
arXiv Detail & Related papers (2024-09-19T17:56:16Z) - Rethinking PGD Attack: Is Sign Function Necessary? [131.6894310945647]
We present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance.
We propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments.
arXiv Detail & Related papers (2023-12-03T02:26:58Z) - Towards Better Out-of-Distribution Generalization of Neural Algorithmic
Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA)
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Robust Learning of Parsimonious Deep Neural Networks [0.0]
We propose a simultaneous learning and pruning algorithm capable of identifying and eliminating irrelevant structures in a neural network.
We derive a novel hyper-prior distribution over the prior parameters that is crucial for their optimal selection.
We evaluate the proposed algorithm on the MNIST data set and commonly used fully connected and convolutional LeNet architectures.
arXiv Detail & Related papers (2022-05-10T03:38:55Z) - On the Explicit Role of Initialization on the Convergence and Implicit
Bias of Overparametrized Linear Networks [1.0323063834827415]
We present a novel analysis of single-hidden-layer linear networks trained under gradient flow.
We show that the squared loss converges exponentially to its optimum.
We derive a novel non-asymptotic upper-bound on the distance between the trained network and the min-norm solution.
arXiv Detail & Related papers (2021-05-13T15:13:51Z) - Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient combined nonvolutionity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z) - Resolving learning rates adaptively by locating Stochastic Non-Negative
Associated Gradient Projection Points using line searches [0.0]
Learning rates in neural network training are currently determined a priori to training using expensive manual or automated tuning.
This study proposes gradient-only line searches to resolve the learning rate for neural network training algorithms.
arXiv Detail & Related papers (2020-01-15T03:08:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.