Topological Gradient-based Competitive Learning
- URL: http://arxiv.org/abs/2008.09477v1
- Date: Fri, 21 Aug 2020 13:44:38 GMT
- Title: Topological Gradient-based Competitive Learning
- Authors: Pietro Barbiero, Gabriele Ciravegna, Vincenzo Randazzo, Giansalvo
Cirrincione
- Abstract summary: This work presents a novel comprehensive theory that bridges competitive learning with gradient-based learning.
We fully demonstrate the theoretical equivalence of two novel gradient-based competitive layers.
Preliminary experiments show how the dual approach, trained on the transpose of the input matrix, leads to a faster convergence rate and higher training accuracy in both low- and high-dimensional scenarios.
- Score: 1.6752712949948443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topological learning is a broad research area aimed at uncovering the mutual
spatial relationships between the elements of a set. Some of the most common
and oldest approaches involve the use of unsupervised competitive neural
networks. However, these methods are not based on gradient optimization, which
has been proven to provide striking results in feature extraction, also in
unsupervised learning. Unfortunately, by focusing mostly on algorithmic
efficiency and accuracy, deep clustering techniques are composed of overly
complex feature extractors, while using trivial algorithms in their top layer.
The aim of this work is to present a novel comprehensive theory that bridges
competitive learning with gradient-based learning, thus allowing the use of
extremely powerful deep neural networks for feature extraction and projection,
combined with the remarkable flexibility and expressiveness of competitive
learning. In this paper we fully demonstrate the theoretical equivalence of two
novel gradient-based competitive layers. Preliminary experiments show how the
dual approach, trained on the transpose of the input matrix, i.e. $X^T$, leads
to a faster convergence rate and higher training accuracy in both low- and
high-dimensional scenarios.
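To make the idea concrete, here is a minimal, hedged sketch (not the authors' implementation) of a gradient-based competitive layer: the prototypes are trainable weights optimised by plain gradient descent on a quantisation loss, and the `dual` flag simply runs the same objective on $X^T$ as a rough stand-in for the paper's dual layer. All names and hyper-parameters are illustrative assumptions.

```python
import torch

def competitive_loss(data, prototypes):
    """Quantisation error: mean squared distance from each input row to its
    nearest prototype (the 'winning' unit of classical competitive learning)."""
    dists = torch.cdist(data, prototypes)          # (n_rows, n_prototypes)
    return (dists.min(dim=1).values ** 2).mean()

def fit_competitive_layer(X, n_prototypes=8, epochs=300, lr=0.05, dual=False):
    """Fit prototypes by gradient descent. With dual=True the same objective is
    run on X^T, a rough stand-in for the transpose-based dual approach."""
    data = X.t() if dual else X
    W = torch.randn(n_prototypes, data.shape[1], requires_grad=True)
    opt = torch.optim.SGD([W], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        competitive_loss(data, W).backward()       # gradient flows only to the winner
        opt.step()
    return W.detach()

# Toy usage: two Gaussian blobs in 2-D; the two prototypes drift to the blob centres.
X = torch.cat([torch.randn(100, 2) - 3.0, torch.randn(100, 2) + 3.0])
prototypes = fit_competitive_layer(X, n_prototypes=2)
```

The point of the sketch is only that a classical winner-take-all update can be recovered from autograd on a quantisation loss; the paper's actual dual-layer construction and its equivalence proof differ in detail.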
Related papers
- Active Learning of Deep Neural Networks via Gradient-Free Cutting Planes [40.68266398473983]
In this work, we investigate an active learning scheme via a novel cutting-plane method for ReLU networks of arbitrary depth.
We demonstrate that these algorithms can be extended to deep neural networks despite their non-linear convergence.
We exemplify the effectiveness of our proposed active learning method against popular deep active learning baselines via both data experiments and classification on real datasets.
arXiv Detail & Related papers (2024-10-03T02:11:35Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems (a minimal sketch of this block-local training idea appears after this list).
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - On the Convergence of Distributed Stochastic Bilevel Optimization
Algorithms over a Network [55.56019538079826]
Bilevel optimization has been applied to a wide variety of machine learning models.
Most existing algorithms are restricted to the single-machine setting, so they are incapable of handling distributed data.
We develop novel decentralized bilevel optimization algorithms based on a gradient tracking communication mechanism and two different gradient estimators.
arXiv Detail & Related papers (2022-06-30T05:29:52Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate that TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z) - Gradient-based Competitive Learning: Theory [1.6752712949948443]
This paper introduces a novel perspective in this area by combining gradient-based and competitive learning.
The theory is based on the intuition that neural networks are able to learn topological structures by working directly on the transpose of the input matrix.
The proposed approach has great potential as it can be generalized to a vast selection of topological learning tasks.
arXiv Detail & Related papers (2020-09-06T19:00:51Z) - Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm based on partitioning the verification problem in an iterative manner and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and provide theoretical insights into three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
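As referenced in the Cascaded Forward (CaFo) entry above, here is a minimal, hedged sketch of the block-local training idea: each block carries its own classifier head and optimiser, is trained on the detached output of the previous block, and no gradient crosses block boundaries. The MLP blocks, shapes, and hyper-parameters are illustrative assumptions, not the CaFo reference implementation.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """One cascaded block: a feature body plus a block-local label predictor."""
    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.head = nn.Linear(d_hidden, n_classes)    # per-block label distribution

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

def train_cascade(X, y, dims=(784, 256, 128), n_classes=10, epochs=20, lr=1e-3):
    """Train each block independently on the detached features of the previous one."""
    blocks, feats = [], X
    for d_in, d_hidden in zip(dims[:-1], dims[1:]):
        block = LocalBlock(d_in, d_hidden, n_classes)
        opt = torch.optim.Adam(block.parameters(), lr=lr)
        for _ in range(epochs):                       # block-local objective only
            opt.zero_grad()
            _, logits = block(feats)
            nn.functional.cross_entropy(logits, y).backward()
            opt.step()
        with torch.no_grad():                         # detach: no gradient crosses blocks
            feats, _ = block(feats)
        blocks.append(block)
    return blocks

# Toy usage with random data standing in for flattened 28x28 images.
blocks = train_cascade(torch.randn(256, 784), torch.randint(0, 10, (256,)))
```

Because each block optimises only its own head, the blocks can in principle be trained in a pipelined or parallel fashion, which is the property the summary above highlights.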