Training Neural Networks with Internal State, Unconstrained
Connectivity, and Discrete Activations
- URL: http://arxiv.org/abs/2312.14359v1
- Date: Fri, 22 Dec 2023 01:19:08 GMT
- Title: Training Neural Networks with Internal State, Unconstrained
Connectivity, and Discrete Activations
- Authors: Alexander Grushin
- Abstract summary: True intelligence may require the ability of a machine learning model to manage internal state.
We argue that we have not yet discovered the most effective algorithms for training such models.
We present one attempt to design such a training algorithm, applied to an architecture with binary activations and only a single matrix of weights.
- Score: 66.53734987585244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's most powerful machine learning approaches are typically designed to
train stateless architectures with predefined layers and differentiable
activation functions. While these approaches have led to unprecedented
successes in areas such as natural language processing and image recognition,
the trained models are also susceptible to making mistakes that a human would
not. In this paper, we take the view that true intelligence may require the
ability of a machine learning model to manage internal state, but that we have
not yet discovered the most effective algorithms for training such models. We
further postulate that such algorithms might not necessarily be based on
gradient descent over a deep architecture, but rather, might work best with an
architecture that has discrete activations and few initial topological
constraints (such as multiple predefined layers). We present one attempt in our
ongoing efforts to design such a training algorithm, applied to an architecture
with binary activations and only a single matrix of weights, and show that it
is able to form useful representations of natural language text, but is also
limited in its ability to leverage large quantities of training data. We then
provide ideas for improving the algorithm and for designing other training
algorithms for similar architectures. Finally, we discuss potential benefits
that could be gained if an effective training algorithm is found, and suggest
experiments for evaluating whether these benefits exist in practice.
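As an illustration only, the following minimal sketch shows what a stateful architecture with binary activations and a single weight matrix might look like; the update rule, dimensions, and thresholding below are assumptions for exposition, not the training algorithm proposed in the paper.

```python
import numpy as np

# Illustrative sketch only: a stateful unit with binary activations and a
# single weight matrix, loosely in the spirit of the abstract. The update
# rule, dimensions, and threshold are assumptions, not the paper's method.

rng = np.random.default_rng(0)

n_inputs, n_units = 32, 64
W = rng.normal(size=(n_units, n_inputs + n_units))  # the single weight matrix

def step(x, state, W, threshold=0.0):
    """One update: combine input and previous binary state, threshold to {0, 1}."""
    pre_activation = W @ np.concatenate([x, state])
    return (pre_activation > threshold).astype(np.float64)

# Run the unit over a short input sequence, carrying internal state forward.
state = np.zeros(n_units)
for _ in range(5):
    x = rng.integers(0, 2, size=n_inputs).astype(np.float64)
    state = step(x, state, W)
print(state[:10])
```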
Related papers
- Task Agnostic Architecture for Algorithm Induction via Implicit Composition [10.627575117586417]
Recent generative AI, especially Transformer-based models, shows potential as an architecture capable of constructing algorithms for a wide range of domains.
This position paper explores the development of such a unified architecture and proposes a theoretical framework for how it could be constructed.
Our exploration examines the current capabilities and limitations of Transformer-based and other methods for efficient and correct algorithm composition.
arXiv Detail & Related papers (2024-04-03T04:31:09Z)
- Towards a population-informed approach to the definition of data-driven models for structural dynamics [0.0]
A population-based scheme is followed here and two different machine-learning algorithms from the meta-learning domain are used.
The algorithms seem to perform as intended and outperform a traditional machine-learning algorithm at approximating the quantities of interest.
arXiv Detail & Related papers (2023-07-19T09:45:41Z)
- A Generalist Neural Algorithmic Learner [18.425083543441776]
We build a single graph neural network processor capable of learning to execute a wide range of algorithms.
We show that it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime.
arXiv Detail & Related papers (2022-09-22T16:41:33Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
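For context, the sketch below shows generic Q-learning with linear function approximation, where Q(s, a) is approximated by a dot product of features and weights and updated with a temporal-difference step; it is a textbook illustration only and does not reproduce the exploration variant or analysis of the cited paper.

```python
import numpy as np

# Textbook sketch of Q-learning with linear function approximation:
# Q(s, a) ~ phi(s, a) . w, updated with a temporal-difference step.
# The exploration variant and error analysis of the cited paper are not shown.

def q_learning_step(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the weight vector w."""
    q_sa = phi(s, a) @ w
    q_next = max(phi(s_next, b) @ w for b in actions)  # greedy bootstrap target
    td_error = r + gamma * q_next - q_sa
    return w + alpha * td_error * phi(s, a)

# Toy feature map (illustrative): one-hot over (state, action) pairs.
n_states, n_actions = 4, 2
def phi(s, a):
    v = np.zeros(n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

w = np.zeros(n_states * n_actions)
w = q_learning_step(w, phi, s=0, a=1, r=1.0, s_next=2, actions=range(n_actions))
```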
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, improves both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Fast Object Segmentation Learning with Kernel-based Methods for Robotics [21.48920421574167]
Object segmentation is a key component in the visual system of a robot that performs tasks like grasping and object manipulation.
We propose a novel architecture for object segmentation that overcomes this problem and provides comparable performance in a fraction of the time required by state-of-the-art methods.
Our approach is validated on the YCB-Video dataset which is widely adopted in the computer vision and robotics community.
arXiv Detail & Related papers (2020-11-25T15:07:39Z) - Learned Greedy Method (LGM): A Novel Neural Architecture for Sparse
Coding and Beyond [24.160276545294288]
We propose an unfolded version of a greedy pursuit algorithm for the same goal.
A key feature of our Learned Greedy Method (LGM) is its ability to accommodate a dynamic number of unfolded layers.
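As background, the sketch below shows a classical (non-learned) greedy pursuit, orthogonal matching pursuit, of the kind that LGM unfolds into trainable layers; it is a standard baseline illustration, not the learned method of the cited paper.

```python
import numpy as np

# Background sketch: classical orthogonal matching pursuit (OMP), the kind of
# greedy pursuit that LGM unfolds into a learnable architecture. This is the
# textbook algorithm, not the learned method from the cited paper.

def omp(D, y, n_nonzeros):
    """Greedily pick atoms of dictionary D (columns) to approximate signal y."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzeros):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # Least-squares fit on the chosen support, then update the residual.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = y - D @ x
    return x

D = np.random.default_rng(1).normal(size=(20, 50))
D /= np.linalg.norm(D, axis=0)          # normalize dictionary atoms
y = D[:, 3] * 2.0 - D[:, 7] * 0.5       # signal with a 2-sparse representation
x_hat = omp(D, y, n_nonzeros=2)
```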
arXiv Detail & Related papers (2020-10-14T13:17:02Z) - Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
arXiv Detail & Related papers (2020-06-09T07:22:01Z) - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)