Stacked unsupervised learning with a network architecture found by
supervised meta-learning
- URL: http://arxiv.org/abs/2206.02716v1
- Date: Mon, 6 Jun 2022 16:17:20 GMT
- Title: Stacked unsupervised learning with a network architecture found by
supervised meta-learning
- Authors: Kyle Luther and H. Sebastian Seung
- Abstract summary: Stacked unsupervised learning seems more biologically plausible than backpropagation.
But SUL has fallen far short of backpropagation in practical applications.
We show an SUL algorithm that can perform completely unsupervised clustering of MNIST digits.
- Score: 4.209801809583906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stacked unsupervised learning (SUL) seems more biologically plausible than
backpropagation, because learning is local to each layer. But SUL has fallen
far short of backpropagation in practical applications, undermining the idea
that SUL can explain how brains learn. Here we show an SUL algorithm that can
perform completely unsupervised clustering of MNIST digits with accuracy
comparable to unsupervised algorithms based on backpropagation. Our
algorithm is exceeded only by self-supervised methods requiring training data
augmentation by geometric distortions. The only prior knowledge in our
unsupervised algorithm is implicit in the network architecture. Multiple
convolutional "energy layers" contain a sum-of-squares nonlinearity, inspired
by "energy models" of primary visual cortex. Convolutional kernels are learned
with a fast minibatch implementation of the K-Subspaces algorithm. High
accuracy requires preprocessing with an initial whitening layer,
representations that are less sparse during inference than learning, and
rescaling for gain control. The hyperparameters of the network architecture are
found by supervised meta-learning, which optimizes unsupervised clustering
accuracy. We regard such dependence of unsupervised learning on prior knowledge
implicit in network architecture as biologically plausible, and analogous to
the dependence of brain architecture on evolutionary history.
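For illustration, here is a minimal sketch (not the authors' code) of one convolutional "energy layer" as described in the abstract: each output channel pools the sum of squares of a small group of learned kernel responses, echoing energy models of V1 complex cells. The module name, the subspace grouping, the square-root output, and all shapes are assumptions for this example; the minibatch K-Subspaces learning of the kernels and the whitening, sparsity, and gain-control steps are not reproduced here.

```python
import torch
import torch.nn as nn

class EnergyLayer(nn.Module):
    """Hypothetical sum-of-squares "energy" layer: groups of convolutional
    responses are squared and pooled within each group (subspace)."""
    def __init__(self, in_channels, n_subspaces, subspace_dim, kernel_size):
        super().__init__()
        # n_subspaces groups of subspace_dim kernels each.
        self.conv = nn.Conv2d(in_channels, n_subspaces * subspace_dim,
                              kernel_size, bias=False)
        self.n_subspaces = n_subspaces
        self.subspace_dim = subspace_dim

    def forward(self, x):
        r = self.conv(x)                                   # (B, S*D, H, W)
        b, _, h, w = r.shape
        r = r.view(b, self.n_subspaces, self.subspace_dim, h, w)
        energy = (r ** 2).sum(dim=2)                       # sum of squares per subspace
        return energy.sqrt()                               # square root is an assumption

# Example: one such layer applied to an MNIST-sized batch.
layer = EnergyLayer(in_channels=1, n_subspaces=16, subspace_dim=4, kernel_size=5)
features = layer(torch.randn(8, 1, 28, 28))                # (8, 16, 24, 24)
```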
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; and (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs a label distribution at each cascaded block, so it does not require generating additional negative samples.
In our framework, each block can be trained independently, so it can be easily deployed on parallel acceleration systems.
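A hedged sketch of the idea described above: each cascaded block carries its own classifier head producing a label distribution and is trained with a purely local cross-entropy loss, with gradients stopped between blocks. The block architecture, head, and training loop below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CascadedBlock(nn.Module):
    """One block with a local prediction head (illustrative only)."""
    def __init__(self, in_ch, out_ch, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                                  nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(out_ch, n_classes))

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)          # features for the next block, local logits

blocks = [CascadedBlock(1, 32, 10), CascadedBlock(32, 64, 10)]
optims = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in blocks]
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    """x: images, y: integer labels. Each block is updated by its own local loss."""
    for block, opt in zip(blocks, optims):
        h, logits = block(x)
        opt.zero_grad()
        loss_fn(logits, y).backward()   # purely local objective, no negative samples
        opt.step()
        x = h.detach()                  # stop gradients: blocks train independently
```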
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Information Flow in Deep Neural Networks [0.6922389632860545]
There is no comprehensive theoretical understanding of how deep neural networks work or are structured.
Deep networks are often seen as black boxes with unclear interpretations and reliability.
This work aims to apply principles and techniques from information theory to deep learning models to increase our theoretical understanding and design better algorithms.
arXiv Detail & Related papers (2022-02-10T23:32:26Z) - Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z) - Auto-tuning of Deep Neural Networks by Conflicting Layer Removal [0.0]
We introduce a novel methodology to identify layers that decrease the test accuracy of trained models.
Conflicting layers are detected as early as the beginning of training.
We show that around 60% of the layers of trained residual networks can be completely removed from the architecture.
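As a simplified illustration only (the paper detects conflicting layers as early as the beginning of training, which is not reproduced here), one could probe a trained residual network by skipping one residual block at a time and flagging blocks whose removal does not reduce test accuracy. The `model.blocks` attribute, the `evaluate` callable, and the tolerance are assumptions for this sketch.

```python
import copy
import torch.nn as nn

def find_removable_blocks(model, evaluate, tol=0.0):
    """Assumes model.blocks is an nn.ModuleList of residual blocks and
    evaluate(model) returns test accuracy. Returns indices of blocks whose
    removal does not drop accuracy by more than tol."""
    baseline = evaluate(model)
    removable = []
    for i in range(len(model.blocks)):
        probe = copy.deepcopy(model)
        probe.blocks[i] = nn.Identity()      # skip block i entirely
        if evaluate(probe) >= baseline - tol:
            removable.append(i)
    return removable
```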
arXiv Detail & Related papers (2021-03-07T11:51:55Z) - Incremental Learning via Rate Reduction [26.323357617265163]
Current deep learning architectures suffer from catastrophic forgetting, a failure to retain knowledge of previously learned classes when incrementally trained on new classes.
We propose utilizing an alternative "white box" architecture derived from the principle of rate reduction, where each layer of the network is explicitly computed without backpropagation.
Under this paradigm, we demonstrate that, given a pre-trained network and new data classes, our approach can provably construct a new network that emulates joint training with all past and new classes.
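For context, below is a small sketch of the coding-rate quantities that rate-reduction ("white box") constructions are typically built around; whether this exact form matches the paper's per-layer computation is an assumption, and the explicit layer-by-layer network construction is not shown.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z Z^T) for a d x n feature matrix Z."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (n_j / n) * R(Z_j): expand features globally
    while compressing each class; labels is a length-n integer array."""
    n = Z.shape[1]
    per_class = sum((np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
                    for c in np.unique(labels))
    return coding_rate(Z, eps) - per_class
```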
arXiv Detail & Related papers (2020-11-30T07:23:55Z) - MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement
Learning Agents [0.0]
An alternative way of training an artificial neural network is to treat each unit in the network as a reinforcement learning agent.
We propose a novel algorithm called MAP propagation to significantly reduce the variance of this approach.
Our work thus allows for the broader application of the teams of agents in deep reinforcement learning.
arXiv Detail & Related papers (2020-10-15T17:17:39Z) - Understanding Self-supervised Learning with Dual Deep Networks [74.92916579635336]
We propose a novel framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks.
We prove that in each SGD update of SimCLR with various loss functions, the weights at each layer are updated by a covariance operator.
To further study what role the covariance operator plays and which features are learned in such a process, we model the data generation and augmentation processes through a hierarchical latent tree model (HLTM).
arXiv Detail & Related papers (2020-10-01T17:51:49Z) - LoCo: Local Contrastive Representation Learning [93.98029899866866]
We show that by overlapping local blocks stacked on top of each other, we effectively increase the decoder depth and allow upper blocks to implicitly send feedback to lower blocks.
This simple design closes the performance gap between local learning and end-to-end contrastive learning algorithms for the first time.
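A rough, hedged sketch of the overlapping-blocks idea (not the paper's implementation): adjacent local learning units share one stage, so the shared stage also receives gradient from the unit above it, acting as implicit feedback to the lower block. The stage definitions and the stand-in `local_loss` callable are assumptions.

```python
import torch
import torch.nn as nn

# Three stages; local unit i trains stages i and i+1 together, so each middle
# stage receives gradient from two local losses (its own unit's and the one above).
stages = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU()),
])
opt = torch.optim.SGD(stages.parameters(), lr=0.1)

def train_step(x, local_loss):
    """local_loss(features) -> scalar; stands in for a local contrastive objective."""
    opt.zero_grad()
    h = x
    for i in range(len(stages) - 1):
        out = stages[i + 1](stages[i](h))    # two-stage overlapping unit
        local_loss(out).backward()           # gradient reaches both stages of the unit
        h = stages[i](h).detach()            # next unit starts below the overlap
    opt.step()
```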
arXiv Detail & Related papers (2020-08-04T05:41:29Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale data with a deep neural network as the predictive model.
In theory, our algorithm requires a much smaller number of communication rounds.
Experiments on several datasets demonstrate the effectiveness of our algorithm and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
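For context on the objective being optimized in the entry above, here is a generic differentiable AUC surrogate (a pairwise squared hinge over positive/negative score pairs). This is only an illustrative surrogate; the paper's contribution is a communication-efficient distributed min-max algorithm, which is not reproduced here.

```python
import torch

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """scores: (N,) real-valued model outputs; labels: (N,) tensor with values 0/1.
    Penalizes positive-negative pairs whose score gap falls below the margin."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)      # all positive/negative pairs
    return torch.clamp(margin - diff, min=0).pow(2).mean()
```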
This list is automatically generated from the titles and abstracts of the papers on this site.