Training Neural Networks with Internal State, Unconstrained
Connectivity, and Discrete Activations
- URL: http://arxiv.org/abs/2312.14359v1
- Date: Fri, 22 Dec 2023 01:19:08 GMT
- Title: Training Neural Networks with Internal State, Unconstrained
Connectivity, and Discrete Activations
- Authors: Alexander Grushin
- Abstract summary: True intelligence may require the ability of a machine learning model to manage internal state.
We argue that we have not yet discovered the most effective algorithms for training such models.
We present one attempt to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights.
- Score: 66.53734987585244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's most powerful machine learning approaches are typically designed to
train stateless architectures with predefined layers and differentiable
activation functions. While these approaches have led to unprecedented
successes in areas such as natural language processing and image recognition,
the trained models are also susceptible to making mistakes that a human would
not. In this paper, we take the view that true intelligence may require the
ability of a machine learning model to manage internal state, but that we have
not yet discovered the most effective algorithms for training such models. We
further postulate that such algorithms might not necessarily be based on
gradient descent over a deep architecture, but rather, might work best with an
architecture that has discrete activations and few initial topological
constraints (such as multiple predefined layers). We present one attempt in our
ongoing efforts to design such a training algorithm, applied to an architecture
with binary activations and only a single matrix of weights, and show that it
is able to form useful representations of natural language text, but is also
limited in its ability to leverage large quantities of training data. We then
provide ideas for improving the algorithm and for designing other training
algorithms for similar architectures. Finally, we discuss potential benefits
that could be gained if an effective training algorithm is found, and suggest
experiments for evaluating whether these benefits exist in practice.
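As an illustration only, the following minimal sketch shows what a stateful architecture with binary activations and a single weight matrix might look like; the update rule, dimensions, and thresholding below are assumptions for exposition, not the training algorithm proposed in the paper.

```python
import numpy as np

# Illustrative sketch only: a stateful unit with binary activations and a
# single weight matrix, loosely in the spirit of the abstract. The update
# rule, dimensions, and threshold are assumptions, not the paper's method.

rng = np.random.default_rng(0)

n_inputs, n_units = 32, 64
W = rng.normal(size=(n_units, n_inputs + n_units))  # the single weight matrix

def step(x, state, W, threshold=0.0):
    """One update: combine input and previous binary state, threshold to {0, 1}."""
    pre_activation = W @ np.concatenate([x, state])
    return (pre_activation > threshold).astype(np.float64)

# Run the unit over a short input sequence, carrying internal state forward.
state = np.zeros(n_units)
for _ in range(5):
    x = rng.integers(0, 2, size=n_inputs).astype(np.float64)
    state = step(x, state, W)
print(state[:10])
```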
Related papers
- Task Agnostic Architecture for Algorithm Induction via Implicit Composition [10.627575117586417]
Recent generative AI, especially Transformer-based models, shows potential as an architecture capable of constructing algorithms for a wide range of domains.
This position paper explores the development of such a unified architecture and proposes a theoretical framework for how it could be constructed.
Our exploration examines the current capabilities and limitations of Transformer-based and other methods for efficient and correct algorithm composition.
arXiv Detail & Related papers (2024-04-03T04:31:09Z)
- Towards a population-informed approach to the definition of data-driven models for structural dynamics [0.0]
A population-based scheme is followed here and two different machine-learning algorithms from the meta-learning domain are used.
The algorithms seem to perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest.
arXiv Detail & Related papers (2023-07-19T09:45:41Z)
- A Generalist Neural Algorithmic Learner [18.425083543441776]
We build a single graph neural network processor capable of learning to execute a wide range of algorithms.
We show that it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime.
arXiv Detail & Related papers (2022-09-22T16:41:33Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
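For context, the sketch below shows generic Q-learning with linear function approximation, where Q(s, a) is approximated by a dot product of features and weights and updated with a temporal-difference step; it is a textbook illustration only and does not reproduce the exploration variant or analysis of the cited paper.

```python
import numpy as np

# Textbook sketch of Q-learning with linear function approximation:
# Q(s, a) ~ phi(s, a) . w, updated with a temporal-difference step.
# The exploration variant and error analysis of the cited paper are not shown.

def q_learning_step(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the weight vector w."""
    q_sa = phi(s, a) @ w
    q_next = max(phi(s_next, b) @ w for b in actions)  # greedy bootstrap target
    td_error = r + gamma * q_next - q_sa
    return w + alpha * td_error * phi(s, a)

# Toy feature map (illustrative): one-hot over (state, action) pairs.
n_states, n_actions = 4, 2
def phi(s, a):
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

w = np.zeros(n_states * n_actions)
w = q_learning_step(w, phi, s=0, a=1, r=1.0, s_next=2, actions=range(n_actions))
```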
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, improves both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Fast Object Segmentation Learning with Kernel-based Methods for Robotics [21.48920421574167]
Object segmentation is a key component in the visual system of a robot that performs tasks like grasping and object manipulation.
We propose a novel architecture for object segmentation that overcomes this problem and provides comparable performance in a fraction of the time required by state-of-the-art methods.
Our approach is validated on the YCB-Video dataset which is widely adopted in the computer vision and robotics community.
arXiv Detail & Related papers (2020-11-25T15:07:39Z) - Learned Greedy Method (LGM): A Novel Neural Architecture for Sparse
Coding and Beyond [24.160276545294288]
We propose an unfolded version of a greedy pursuit algorithm for the same goal.
A key feature of our Learned Greedy Method (LGM) is its ability to accommodate a dynamic number of unfolded layers.
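As background, the sketch below shows a classical (non-learned) greedy pursuit, orthogonal matching pursuit, of the kind that LGM unfolds into trainable layers; it is a standard baseline illustration, not the learned method of the cited paper.

```python
import numpy as np

# Background sketch: classical orthogonal matching pursuit (OMP), the kind of
# greedy pursuit that LGM unfolds into a learnable architecture. This is the
# textbook algorithm, not the learned method from the cited paper.

def omp(D, y, n_nonzeros):
    """Greedily pick atoms of dictionary D (columns) to approximate signal y."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzeros):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # Least-squares fit on the chosen support, then update the residual.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = y - D @ x
    return x

D = np.random.default_rng(1).normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)          # normalize dictionary atoms
y = D[:, 3] * 2.0 - D[:, 7] * 0.5       # signal with a 2-sparse representation
x_hat = omp(D, y, n_nonzeros=2)
```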
arXiv Detail & Related papers (2020-10-14T13:17:02Z) - Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z) - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)