Related papers: It's Hard for Neural Networks To Learn the Game of Life

It's Hard for Neural Networks To Learn the Game of Life

URL: http://arxiv.org/abs/2009.01398v1
Date: Thu, 3 Sep 2020 00:47:08 GMT
Title: It's Hard for Neural Networks To Learn the Game of Life
Authors: Jacob M. Springer, Garrett T. Kenyon
Abstract summary: Recent findings suggest that neural networks rely on lucky random initial weights of "lottery tickets" that converge quickly to a solution. We examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life. We find that networks of this architecture trained on this task rarely converge.
Score: 4.061135251278187
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Efforts to improve the learning abilities of neural networks have focused mostly on the role of optimization methods rather than on weight initializations. Recent findings, however, suggest that neural networks rely on lucky random initial weights of subnetworks called "lottery tickets" that converge quickly to a solution. To investigate how weight initializations affect performance, we examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life, the update rules of which can be implemented efficiently in a 2n+1 layer convolutional network. We find that networks of this architecture trained on this task rarely converge. Rather, networks require substantially more parameters to consistently converge. In addition, near-minimal architectures are sensitive to tiny changes in parameters: changing the sign of a single weight can cause the network to fail to learn. Finally, we observe a critical value d_0 such that training minimal networks with examples in which cells are alive with probability d_0 dramatically increases the chance of convergence to a solution. We conclude that training convolutional neural networks to learn the input/output function represented by n steps of Game of Life exhibits many characteristics predicted by the lottery ticket hypothesis, namely, that the size of the networks required to learn this function are often significantly larger than the minimal network required to implement the function.

Related papers

When less is more: evolving large neural networks from small ones [0.0]
We study feed-forward neural networks that are small and dynamic, whose nodes can be added (or subtracted) during training. A single neuronal weight in the network controls the network's size, while the weight itself is optimized by the same gradient-descent algorithm. We train and evaluate such Nimble Neural Networks on nonlinear regression and classification tasks where they outperform the corresponding static networks.
arXiv Detail & Related papers (2025-01-29T21:56:38Z)
Automatic Construction of Pattern Classifiers Capable of Continuous Incremental Learning and Unlearning Tasks Based on Compact-Sized Probabilistic Neural Network [0.0]
This paper proposes a novel approach to pattern classification using a probabilistic neural network model. The strategy is based on a compact-sized probabilistic neural network capable of continuous incremental learning and unlearning tasks.
arXiv Detail & Related papers (2025-01-01T05:02:53Z)
ReLU Neural Networks with Linear Layers are Biased Towards Single- and Multi-Index Models [9.96121040675476]
This manuscript explores how properties of functions learned by neural networks of depth greater than two layers affect predictions. Our framework considers a family of networks of varying depths that all have the same capacity but different representation costs.
arXiv Detail & Related papers (2023-05-24T22:10:12Z)
Adaptive Neural Networks Using Residual Fitting [2.546014024559691]
We present a network-growth method that searches for explainable error in the network's residuals and grows the network if sufficient error is detected. Within these tasks, the growing network can often achieve better performance than small networks that do not grow.
arXiv Detail & Related papers (2023-01-13T19:52:30Z)
Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. We also give precise bounds on the sizes of the neural networks required to represent any function in the class. We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers. We show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network. In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks. We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures. In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity. We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis. By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner. This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
Prior knowledge distillation based on financial time series [0.8756822885568589]
We propose to use neural networks to represent indicators and train a large network constructed of smaller networks as feature layers. In numerical experiments, we find that our algorithm is faster and more accurate than traditional methods on real financial datasets.
arXiv Detail & Related papers (2020-06-16T15:26:06Z)
Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs. Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.