Tourbillon: a Physically Plausible Neural Architecture
- URL: http://arxiv.org/abs/2107.06424v1
- Date: Tue, 13 Jul 2021 22:51:42 GMT
- Title: Tourbillon: a Physically Plausible Neural Architecture
- Authors: Mohammadamin Tavakoli, Pierre Baldi, Peter Sadowski
- Abstract summary: Tourbillon is a new architecture that addresses backpropagation limitations.
We show that Tourbillon can achieve comparable performance to models trained with backpropagation.
- Score: 8.7660229706359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a physical neural system, backpropagation is faced with a number of
obstacles including: the need for labeled data, the violation of the locality
learning principle, the need for symmetric connections, and the lack of
modularity. Tourbillon is a new architecture that addresses all these
limitations. At its core, it consists of a stack of circular autoencoders
followed by an output layer. The circular autoencoders are trained in
self-supervised mode by recirculation algorithms and the top layer in
supervised mode by stochastic gradient descent, with the option of propagating
error information through the entire stack using non-symmetric connections.
While the Tourbillon architecture is meant primarily to address physical
constraints, and not to improve current engineering applications of deep
learning, we demonstrate its viability on standard benchmark datasets including
MNIST, Fashion MNIST, and CIFAR10. We show that Tourbillon can achieve
comparable performance to models trained with backpropagation and outperform
models that are trained with other physically plausible algorithms, such as
feedback alignment.
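Below is a minimal sketch of the self-supervised part of this scheme: one circular autoencoder block trained locally by recirculation, and a small stack trained greedily with no error backpropagated across blocks. The layer sizes, logistic activation, learning rate, and random data are illustrative assumptions, and the supervised output layer (trained by SGD) and the optional non-symmetric error propagation described above are omitted.

```python
# Hedged sketch of recirculation learning for a stack of circular autoencoders.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CircularAutoencoder:
    """One circular block: visible -> hidden -> visible, trained locally by recirculation."""

    def __init__(self, n_vis, n_hid, lr=0.05):
        self.We = rng.normal(0.0, 0.1, (n_hid, n_vis))  # encoder weights (visible -> hidden)
        self.Wd = rng.normal(0.0, 0.1, (n_vis, n_hid))  # decoder weights (hidden -> visible), untied
        self.lr = lr

    def recirculate(self, x0):
        h1 = sigmoid(self.We @ x0)   # first pass around the loop
        x2 = sigmoid(self.Wd @ h1)   # reconstruction of the visible layer
        h3 = sigmoid(self.We @ x2)   # recirculated hidden activity
        # Local, Hebbian-style updates: each weight sees only its pre- and post-synaptic activities.
        self.Wd += self.lr * np.outer(x0 - x2, h1)
        self.We += self.lr * np.outer(h1 - h3, x2)
        return h1, float(np.mean((x0 - x2) ** 2))

# Greedy, self-supervised training of a small stack: each block's hidden code is the
# next block's visible input, so no error is backpropagated across blocks.
stack = [CircularAutoencoder(64, 32), CircularAutoencoder(32, 16)]
for step in range(1000):
    x = rng.random(64)               # stand-in for an input pattern (e.g., an image patch)
    for block in stack:
        x, err = block.recirculate(x)
```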
Related papers
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple permutation-matching algorithms can be used to align and merge independently trained networks in weight space.
We present the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
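As a rough illustration of merging modulo permutation symmetries, the sketch below permutes the hidden units of one single-hidden-layer network to best match a reference network (via a linear assignment on weight similarity) before averaging the two in weight space. Shapes and the similarity score are illustrative assumptions; the paper proposes several matching algorithms and handles deep architectures.

```python
# Hedged toy sketch of permutation alignment before weight-space merging.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_merge(W1_a, W2_a, W1_b, W2_b):
    """Permute model B's hidden units to match model A, then average the two models.

    W1: (hidden, in) input-side weights; W2: (out, hidden) output-side weights.
    """
    # Similarity between hidden units of A and B, using both incoming and outgoing weights.
    sim = W1_a @ W1_b.T + W2_a.T @ W2_b
    _, col = linear_sum_assignment(-sim)                 # assignment maximizing total similarity
    W1_b, W2_b = W1_b[col], W2_b[:, col]                 # apply the permutation to B
    return 0.5 * (W1_a + W1_b), 0.5 * (W2_a + W2_b)      # merge in weight space

# Sanity check on a permuted copy of the same model: alignment undoes the permutation.
rng = np.random.default_rng(0)
W1_a, W2_a = rng.normal(size=(32, 64)), rng.normal(size=(10, 32))
perm = rng.permutation(32)
W1_m, W2_m = align_and_merge(W1_a, W2_a, W1_a[perm], W2_a[:, perm])
print(np.allclose(W1_m, W1_a), np.allclose(W2_m, W2_a))  # True True
```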
arXiv Detail & Related papers (2022-09-11T10:44:27Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Path Development Network with Finite-dimensional Lie Group Representation [3.9983665898166425]
We propose a novel, trainable path development layer, which exploits representations of sequential data through finite-dimensional Lie groups.
Our proposed layer, analogous to recurrent neural networks (RNN), possesses an explicit, simple recurrent unit that alleviates the gradient issues.
Empirical results on a range of datasets show that the development layer consistently and significantly outperforms signature features on accuracy and dimensionality.
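A minimal sketch of the path-development idea follows, assuming unconstrained trainable matrices and a plain matrix exponential: the layer summarizes a path by a product of exponentials driven by its increments, i.e., an explicit recurrence on a matrix group. Dimensions are illustrative; the paper works with specific finite-dimensional Lie groups.

```python
# Hedged sketch of a path development layer (toy, unconstrained Lie-algebra matrices).
import torch
import torch.nn as nn

class PathDevelopment(nn.Module):
    """Summarize a path by a product of matrix exponentials driven by its increments."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # One trainable matrix (a Lie-algebra element) per input channel.
        self.A = nn.Parameter(0.1 * torch.randn(input_dim, hidden_dim, hidden_dim))

    def forward(self, x):                              # x: (batch, time, input_dim)
        dx = x[:, 1:] - x[:, :-1]                      # path increments
        z = torch.eye(self.A.shape[-1]).expand(x.shape[0], -1, -1).clone()
        for t in range(dx.shape[1]):
            # Map the increment into the (assumed) Lie algebra, exponentiate, update the state.
            m = torch.einsum('bd,dij->bij', dx[:, t], self.A)
            z = z @ torch.linalg.matrix_exp(m)
        return z.flatten(1)                            # (batch, hidden_dim * hidden_dim) features

feats = PathDevelopment(3, 4)(torch.randn(8, 50, 3))   # features for 8 length-50 paths in R^3
```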
arXiv Detail & Related papers (2022-04-02T02:01:00Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) approach for face parsing.
Specifically, DML-CSR designs a multi-task model comprising face parsing, binary edge detection, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
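A toy sketch of the routing idea, under assumed module and scoring choices: tokens are softly routed among a small set of function modules according to a learned compatibility between token features and per-function signature vectors. The real Neural Interpreters architecture is considerably richer (typed attention, iterated function application); this only illustrates end-to-end learned routing.

```python
# Hedged toy sketch of learned soft routing of tokens through function modules.
import torch
import torch.nn as nn

class SoftFunctionRouting(nn.Module):
    """Tokens are softly routed among a small set of function modules (toy illustration)."""

    def __init__(self, dim, n_functions, temperature=0.5):
        super().__init__()
        self.signatures = nn.Parameter(torch.randn(n_functions, dim))  # one learned signature per function
        self.type_proj = nn.Linear(dim, dim)                           # infers a "type" code per token
        self.functions = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_functions)
        ])
        self.temperature = temperature

    def forward(self, tokens):                                         # tokens: (batch, seq, dim)
        types = self.type_proj(tokens)
        scores = torch.einsum('bsd,fd->bsf', types, self.signatures) / self.temperature
        weights = scores.softmax(dim=-1)                               # learned routing weights
        outputs = torch.stack([f(tokens) for f in self.functions], dim=-1)  # (batch, seq, dim, fn)
        return torch.einsum('bsdf,bsf->bsd', outputs, weights)

out = SoftFunctionRouting(dim=64, n_functions=4)(torch.randn(2, 10, 64))
```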
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
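A simplified sketch of this heuristic follows, assuming a tiny two-layer network with explicit weight tensors: one positive scale per layer is learned so that the loss after a single simulated SGD step on the scaled weights is as small as possible. The actual method also enforces a gradient-norm constraint, supports Adam, and applies to arbitrary architectures; data and hyperparameters here are stand-ins.

```python
# Hedged, simplified sketch of learning per-layer initialization scales.
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(32, 10), torch.randn(1, 32)        # raw initial weights of a tiny 2-layer net
log_s = torch.zeros(2, requires_grad=True)              # one learnable log-scale per layer
meta_opt = torch.optim.Adam([log_s], lr=1e-2)
eta = 0.1                                               # step size of the simulated inner SGD step

def forward(x, w1, w2):
    return torch.relu(x @ w1.T) @ w2.T

for _ in range(200):
    x1, y1 = torch.randn(64, 10), torch.randn(64, 1)    # stand-in data batches
    x2, y2 = torch.randn(64, 10), torch.randn(64, 1)
    s1, s2 = log_s.exp()
    w1, w2 = s1 * W1, s2 * W2                           # scaled initialization
    loss1 = ((forward(x1, w1, w2) - y1) ** 2).mean()
    g1, g2 = torch.autograd.grad(loss1, (w1, w2), create_graph=True)
    # Loss after one simulated SGD step on the scaled weights; minimized w.r.t. the scales.
    loss2 = ((forward(x2, w1 - eta * g1, w2 - eta * g2) - y2) ** 2).mean()
    meta_opt.zero_grad(); loss2.backward(); meta_opt.step()

print("learned layer scales:", log_s.exp().detach())
```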
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Edge Federated Learning Via Unit-Modulus Over-The-Air Computation (Extended Version) [64.76619508293966]
This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning.
It simultaneously uploads local model parameters and updates global model parameters via analog beamforming.
We demonstrate the implementation of UM-AirComp in a vehicle-to-everything autonomous driving simulation platform.
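A toy sketch of the over-the-air computation principle with phase-only (unit-modulus) transmit weights, under an assumed flat-fading channel model: all devices transmit their local parameters simultaneously, and the superposition received at the server already forms a channel-weighted aggregate in a single transmission. This only illustrates the idea, not the paper's system model or optimization.

```python
# Hedged toy simulation of over-the-air aggregation with unit-modulus transmit weights.
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 1000                                          # number of devices, model dimension
local_params = rng.normal(size=(K, D))                  # each device's local model parameters
h = rng.normal(size=K) + 1j * rng.normal(size=K)        # (assumed) flat-fading uplink channels

w = np.exp(-1j * np.angle(h))                           # unit-modulus weights: phase alignment only
received = np.sum((h * w)[:, None] * local_params, axis=0)   # simultaneous analog superposition
global_update = received.real / np.abs(h).sum()         # |h|-weighted average of the local models
```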
arXiv Detail & Related papers (2021-01-28T15:10:22Z) - Learning to Encode Position for Transformer with Continuous Dynamical Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded representations along the position index with a continuous dynamical system.
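A minimal sketch of this idea, assuming a small MLP for the dynamics and a simple fixed-step Euler integrator: position encodings are produced by integrating a learned dynamical system along the position index rather than read from a fixed sinusoidal table. The network, step size, and dimensions are illustrative assumptions.

```python
# Hedged sketch of position encodings generated by a learned continuous dynamical system.
import torch
import torch.nn as nn

class DynamicalPositionEncoding(nn.Module):
    """Position encodings obtained by integrating a learned dynamical system over positions."""

    def __init__(self, dim, hidden=64, step=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.p0 = nn.Parameter(torch.zeros(dim))           # learned initial encoding p(0)
        self.step = step

    def forward(self, seq_len):
        p, encodings = self.p0, []
        for i in range(seq_len):
            encodings.append(p)
            t = torch.full((1,), float(i))
            p = p + self.step * self.f(torch.cat([p, t]))  # Euler step along the position axis
        return torch.stack(encodings)                      # (seq_len, dim), added to token embeddings

pos = DynamicalPositionEncoding(dim=32)(seq_len=128)
```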
arXiv Detail & Related papers (2020-03-13T00:41:41Z) - Follow the Neurally-Perturbed Leader for Adversarial Training [0.0]
We propose a novel follow-the-neurally-perturbed-leader algorithm for zero-sum training to mixed equilibrium.
We validate our theoretical results by applying this training algorithm to games with convex and non-convex losses as well as generative adversarial architectures.
We customize the implementation of this algorithm for adversarial imitation learning applications.
arXiv Detail & Related papers (2020-02-16T00:09:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.