Tourbillon: a Physically Plausible Neural Architecture
- URL: http://arxiv.org/abs/2107.06424v1
- Date: Tue, 13 Jul 2021 22:51:42 GMT
- Title: Tourbillon: a Physically Plausible Neural Architecture
- Authors: Mohammadamin Tavakoli, Pierre Baldi, Peter Sadowski
- Abstract summary: Tourbillon is a new architecture that addresses backpropagation limitations.
We show that Tourbillon can achieve comparable performance to models trained with backpropagation.
- Score: 8.7660229706359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a physical neural system, backpropagation is faced with a number of
obstacles including: the need for labeled data, the violation of the locality
learning principle, the need for symmetric connections, and the lack of
modularity. Tourbillon is a new architecture that addresses all these
limitations. At its core, it consists of a stack of circular autoencoders
followed by an output layer. The circular autoencoders are trained in
self-supervised mode by recirculation algorithms and the top layer in
supervised mode by stochastic gradient descent, with the option of propagating
error information through the entire stack using non-symmetric connections.
While the Tourbillon architecture is meant primarily to address physical
constraints, and not to improve current engineering applications of deep
learning, we demonstrate its viability on standard benchmark datasets including
MNIST, Fashion MNIST, and CIFAR10. We show that Tourbillon can achieve
comparable performance to models trained with backpropagation and outperform
models that are trained with other physically plausible algorithms, such as
feedback alignment.
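Below is a minimal sketch of the self-supervised part of this scheme: one circular autoencoder block trained locally by recirculation, and a small stack trained greedily with no error backpropagated across blocks. The layer sizes, logistic activation, learning rate, and random data are illustrative assumptions, and the supervised output layer (trained by SGD) and the optional non-symmetric error propagation described above are omitted.

```python
# Hedged sketch of recirculation learning for a stack of circular autoencoders.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CircularAutoencoder:
    """One circular block: visible -> hidden -> visible, trained locally by recirculation."""

    def __init__(self, n_vis, n_hid, lr=0.05):
        self.We = rng.normal(0.0, 0.1, (n_hid, n_vis))  # encoder weights (visible -> hidden)
        self.Wd = rng.normal(0.0, 0.1, (n_vis, n_hid))  # decoder weights (hidden -> visible), untied
        self.lr = lr

    def recirculate(self, x0):
        h1 = sigmoid(self.We @ x0)   # first pass around the loop
        x2 = sigmoid(self.Wd @ h1)   # reconstruction of the visible layer
        h3 = sigmoid(self.We @ x2)   # recirculated hidden activity
        # Local, Hebbian-style updates: each weight sees only its pre- and post-synaptic activities.
        self.Wd += self.lr * np.outer(x0 - x2, h1)
        self.We += self.lr * np.outer(h1 - h3, x2)
        return h1, float(np.mean((x0 - x2) ** 2))

# Greedy, self-supervised training of a small stack: each block's hidden code is the
# next block's visible input, so no error is backpropagated across blocks.
stack = [CircularAutoencoder(64, 32), CircularAutoencoder(32, 16)]
for step in range(1000):
    x = rng.random(64)               # stand-in for an input pattern (e.g., an image patch)
    for block in stack:
        x, err = block.recirculate(x)
```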
Related papers
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple permutation-matching algorithms can be used to align and merge independently trained networks in weight space.
We present the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained models.
We also discuss shortcomings in the linear mode connectivity hypothesis.
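As a rough illustration of merging modulo permutation symmetries, the sketch below permutes the hidden units of one single-hidden-layer network to best match a reference network (via a linear assignment on weight similarity) before averaging the two in weight space. Shapes and the similarity score are illustrative assumptions; the paper proposes several matching algorithms and handles deep architectures.

```python
# Hedged toy sketch of permutation alignment before weight-space merging.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_merge(W1_a, W2_a, W1_b, W2_b):
    """Permute model B's hidden units to match model A, then average the two models.

    W1: (hidden, in) input-side weights; W2: (out, hidden) output-side weights.
    """
    # Similarity between hidden units of A and B, using both incoming and outgoing weights.
    sim = W1_a @ W1_b.T + W2_a.T @ W2_b
    _, col = linear_sum_assignment(-sim)                 # assignment maximizing total similarity
    W1_b, W2_b = W1_b[col], W2_b[:, col]                 # apply the permutation to B
    return 0.5 * (W1_a + W1_b), 0.5 * (W2_a + W2_b)      # merge in weight space

# Sanity check on a permuted copy of the same model: alignment undoes the permutation.
rng = np.random.default_rng(0)
W1_a, W2_a = rng.normal(size=(32, 64)), rng.normal(size=(10, 32))
perm = rng.permutation(32)
W1_m, W2_m = align_and_merge(W1_a, W2_a, W1_a[perm], W2_a[:, perm])
print(np.allclose(W1_m, W1_a), np.allclose(W2_m, W2_a))  # True True
```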
arXiv Detail & Related papers (2022-09-11T10:44:27Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Path Development Network with Finite-dimensional Lie Group Representation [3.9983665898166425]
We propose a novel, trainable path development layer, which exploits representations of sequential data through finite-dimensional Lie groups.
Our proposed layer, analogous to recurrent neural networks (RNN), possesses an explicit, simple recurrent unit that alleviates the gradient issues.
Empirical results on a range of datasets show that the development layer consistently and significantly outperforms signature features on accuracy and dimensionality.
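A minimal sketch of the path-development idea follows, assuming unconstrained trainable matrices and a plain matrix exponential: the layer summarizes a path by a product of exponentials driven by its increments, i.e., an explicit recurrence on a matrix group. Dimensions are illustrative; the paper works with specific finite-dimensional Lie groups.

```python
# Hedged sketch of a path development layer (toy, unconstrained Lie-algebra matrices).
import torch
import torch.nn as nn

class PathDevelopment(nn.Module):
    """Summarize a path by a product of matrix exponentials driven by its increments."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # One trainable matrix (a Lie-algebra element) per input channel.
        self.A = nn.Parameter(0.1 * torch.randn(input_dim, hidden_dim, hidden_dim))

    def forward(self, x):                              # x: (batch, time, input_dim)
        dx = x[:, 1:] - x[:, :-1]                      # path increments
        z = torch.eye(self.A.shape[-1]).expand(x.shape[0], -1, -1).clone()
        for t in range(dx.shape[1]):
            # Map the increment into the (assumed) Lie algebra, exponentiate, update the state.
            m = torch.einsum('bd,dij->bij', dx[:, t], self.A)
            z = z @ torch.linalg.matrix_exp(m)
        return z.flatten(1)                            # (batch, hidden_dim * hidden_dim) features

feats = PathDevelopment(3, 4)(torch.randn(8, 50, 3))   # features for 8 length-50 paths in R^3
```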
arXiv Detail & Related papers (2022-04-02T02:01:00Z) - Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing [71.19528222206088]
We propose a novel Decoupled Multi-task Learning with Cyclical Self-Regulation (DML-CSR) approach for face parsing.
Specifically, DML-CSR designs a multi-task model comprising face parsing, binary edge detection, and category edge detection.
Our method achieves the new state-of-the-art performance on the Helen, CelebA-HQ, and LapaMask datasets.
arXiv Detail & Related papers (2022-03-28T02:12:30Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner.
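A toy sketch of the routing idea, under assumed module and scoring choices: tokens are softly routed among a small set of function modules according to a learned compatibility between token features and per-function signature vectors. The real Neural Interpreters architecture is considerably richer (typed attention, iterated function application); this only illustrates end-to-end learned routing.

```python
# Hedged toy sketch of learned soft routing of tokens through function modules.
import torch
import torch.nn as nn

class SoftFunctionRouting(nn.Module):
    """Tokens are softly routed among a small set of function modules (toy illustration)."""

    def __init__(self, dim, n_functions, temperature=0.5):
        super().__init__()
        self.signatures = nn.Parameter(torch.randn(n_functions, dim))  # one learned signature per function
        self.type_proj = nn.Linear(dim, dim)                           # infers a "type" code per token
        self.functions = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_functions)
        ])
        self.temperature = temperature

    def forward(self, tokens):                                         # tokens: (batch, seq, dim)
        types = self.type_proj(tokens)
        scores = torch.einsum('bsd,fd->bsf', types, self.signatures) / self.temperature
        weights = scores.softmax(dim=-1)                               # learned routing weights
        outputs = torch.stack([f(tokens) for f in self.functions], dim=-1)  # (batch, seq, dim, fn)
        return torch.einsum('bsdf,bsf->bsd', outputs, weights)

out = SoftFunctionRouting(dim=64, n_functions=4)(torch.randn(2, 10, 64))
```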
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
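A simplified sketch of this heuristic follows, assuming a tiny two-layer network with explicit weight tensors: one positive scale per layer is learned so that the loss after a single simulated SGD step on the scaled weights is as small as possible. The actual method also enforces a gradient-norm constraint, supports Adam, and applies to arbitrary architectures; data and hyperparameters here are stand-ins.

```python
# Hedged, simplified sketch of learning per-layer initialization scales.
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(32, 10), torch.randn(1, 32)        # raw initial weights of a tiny 2-layer net
log_s = torch.zeros(2, requires_grad=True)              # one learnable log-scale per layer
meta_opt = torch.optim.Adam([log_s], lr=1e-2)
eta = 0.1                                               # step size of the simulated inner SGD step

def forward(x, w1, w2):
    return torch.relu(x @ w1.T) @ w2.T

for _ in range(200):
    x1, y1 = torch.randn(64, 10), torch.randn(64, 1)    # stand-in data batches
    x2, y2 = torch.randn(64, 10), torch.randn(64, 1)
    s1, s2 = log_s.exp()
    w1, w2 = s1 * W1, s2 * W2                           # scaled initialization
    loss1 = ((forward(x1, w1, w2) - y1) ** 2).mean()
    g1, g2 = torch.autograd.grad(loss1, (w1, w2), create_graph=True)
    # Loss after one simulated SGD step on the scaled weights; minimized w.r.t. the scales.
    loss2 = ((forward(x2, w1 - eta * g1, w2 - eta * g2) - y2) ** 2).mean()
    meta_opt.zero_grad(); loss2.backward(); meta_opt.step()

print("learned layer scales:", log_s.exp().detach())
```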
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Edge Federated Learning Via Unit-Modulus Over-The-Air Computation (Extended Version) [64.76619508293966]
This paper proposes a unit-modulus over-the-air computation (UM-AirComp) framework to facilitate efficient edge federated learning.
It simultaneously uploads local model parameters and updates global model parameters via analog beamforming.
We demonstrate the implementation of UM-AirComp in a vehicle-to-everything autonomous driving simulation platform.
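A toy sketch of the over-the-air computation principle with phase-only (unit-modulus) transmit weights, under an assumed flat-fading channel model: all devices transmit their local parameters simultaneously, and the superposition received at the server already forms a channel-weighted aggregate in a single transmission. This only illustrates the idea, not the paper's system model or optimization.

```python
# Hedged toy simulation of over-the-air aggregation with unit-modulus transmit weights.
import numpy as np

rng = np.random.default_rng(0)
K, D = 8, 1000                                          # number of devices, model dimension
local_params = rng.normal(size=(K, D))                  # each device's local model parameters
h = rng.normal(size=K) + 1j * rng.normal(size=K)        # (assumed) flat-fading uplink channels

w = np.exp(-1j * np.angle(h))                           # unit-modulus weights: phase alignment only
received = np.sum((h * w)[:, None] * local_params, axis=0)   # simultaneous analog superposition
global_update = received.real / np.abs(h).sum()         # |h|-weighted average of the local models
```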
arXiv Detail & Related papers (2021-01-28T15:10:22Z) - Learning to Encode Position for Transformer with Continuous Dynamical Model [88.69870971415591]
We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models.
We model the evolution of the encoded representations along the position index with a continuous dynamical system.
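A minimal sketch of this idea, assuming a small MLP for the dynamics and a simple fixed-step Euler integrator: position encodings are produced by integrating a learned dynamical system along the position index rather than read from a fixed sinusoidal table. The network, step size, and dimensions are illustrative assumptions.

```python
# Hedged sketch of position encodings generated by a learned continuous dynamical system.
import torch
import torch.nn as nn

class DynamicalPositionEncoding(nn.Module):
    """Position encodings obtained by integrating a learned dynamical system over positions."""

    def __init__(self, dim, hidden=64, step=0.1):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.p0 = nn.Parameter(torch.zeros(dim))           # learned initial encoding p(0)
        self.step = step

    def forward(self, seq_len):
        p, encodings = self.p0, []
        for i in range(seq_len):
            encodings.append(p)
            t = torch.full((1,), float(i))
            p = p + self.step * self.f(torch.cat([p, t]))  # Euler step along the position axis
        return torch.stack(encodings)                      # (seq_len, dim), added to token embeddings

pos = DynamicalPositionEncoding(dim=32)(seq_len=128)
```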
arXiv Detail & Related papers (2020-03-13T00:41:41Z) - Follow the Neurally-Perturbed Leader for Adversarial Training [0.0]
We propose a novel follow-the-neurally-perturbed-leader algorithm for zero-sum training to mixed equilibrium.
We validate our theoretical results by applying this training algorithm to games with convex and non-convex losses as well as generative adversarial architectures.
We customize the implementation of this algorithm for adversarial imitation learning applications.
arXiv Detail & Related papers (2020-02-16T00:09:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.