Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization
- URL: http://arxiv.org/abs/2107.12580v1
- Date: Tue, 27 Jul 2021 03:50:31 GMT
- Title: Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization
- Authors: Chiyuan Zhang, Maithra Raghu, Jon Kleinberg, Samy Bengio
- Abstract summary: We introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization.
- Score: 40.21297628440919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The successes of deep learning critically rely on the ability of neural
networks to output meaningful predictions on unseen data -- generalization. Yet
despite its criticality, there remain fundamental open questions on how neural
networks generalize. How much do neural networks rely on memorization -- seeing
highly similar training examples -- and how much are they capable of
human-intelligence-style reasoning -- identifying abstract rules underlying
the data? In this paper we introduce Pointer Value Retrieval (PVR) tasks, a
novel benchmark that explores the limits of neural network generalization. While
PVR tasks can consist of visual as well as symbolic inputs, each with varying
levels of difficulty, they all have a simple underlying rule. One part of the
PVR task input acts as a pointer, giving the location of a different part of
the input, which forms the value (and output). We demonstrate that this task
structure provides a rich testbed for understanding generalization, with our
empirical study showing large variations in neural network performance based on
dataset size, task complexity and model architecture. The interaction of
position, values and the pointer rule also allows the development of nuanced
tests of generalization, by introducing distribution shift and increasing
functional complexity. These reveal both subtle failures and surprising
successes, suggesting many promising directions of exploration on this
benchmark.
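The abstract states the pointer-value rule only in words; the sketch below illustrates it concretely for the symbolic case. It is a minimal, hypothetical construction (the function names, the 0-indexed pointer convention, and the ten value slots are illustrative choices, not taken from the paper), showing how a pointer digit selects one of the value digits as the label, and how holding out chosen pointer/value combinations could introduce the kind of distribution shift the abstract mentions.

```python
import random

def make_pvr_example(num_value_slots=10, rng=random):
    """Generate one symbolic Pointer Value Retrieval (PVR) example.

    Illustrative convention (an assumption, not the paper's exact spec):
    the input is a pointer digit followed by `num_value_slots` value digits;
    the pointer selects one of the value positions, and the digit stored
    there is the label.
    """
    values = [rng.randint(0, 9) for _ in range(num_value_slots)]
    pointer = rng.randint(0, num_value_slots - 1)
    x = [pointer] + values   # the model sees the pointer plus all values
    y = values[pointer]      # label = value at the pointed-to position
    return x, y

def holdout_split(examples, held_out_pairs):
    """Route examples with chosen (pointer, label) pairs to a shifted test
    set, so those combinations never appear in training -- one way to probe
    generalization under distribution shift."""
    train, test = [], []
    for x, y in examples:
        (test if (x[0], y) in held_out_pairs else train).append((x, y))
    return train, test

if __name__ == "__main__":
    data = [make_pvr_example() for _ in range(10000)]
    train, shifted_test = holdout_split(data, held_out_pairs={(0, 7), (3, 4)})
    print(len(train), "training examples,", len(shifted_test), "shifted test examples")
```

Under such a split, a model that merely memorizes pointer/value co-occurrences seen in training would fail on the held-out combinations, whereas a model that has learned the retrieval rule itself would not; this is the flavor of nuanced generalization test the abstract alludes to.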
Related papers
- Simple and Effective Transfer Learning for Neuro-Symbolic Integration [50.592338727912946]
Neuro-Symbolic Integration (NeSy) combines neural approaches with symbolic reasoning.
Most of these methods exploit a neural network to map perceptions to symbols and a logical reasoner to predict the output of the downstream task.
They suffer from several issues, including slow convergence, learning difficulties with complex perception tasks, and convergence to local minima.
This paper proposes a simple yet effective method to ameliorate these problems.
arXiv Detail & Related papers (2024-02-21T15:51:01Z)
- DISCOVER: Making Vision Networks Interpretable via Competition and Dissection [11.028520416752325]
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-07T21:57:23Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Analyzing Representations inside Convolutional Neural Networks [8.803054559188048]
We propose a framework to categorize the concepts a network learns based on the way it clusters a set of input examples.
This framework is unsupervised and can work without any labels for input features.
We extensively evaluate the proposed method and demonstrate that it produces human-understandable and coherent concepts.
arXiv Detail & Related papers (2020-12-23T07:10:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.