Dynamic Inference with Neural Interpreters
- URL: http://arxiv.org/abs/2110.06399v1
- Date: Tue, 12 Oct 2021 23:22:45 GMT
- Title: Dynamic Inference with Neural Interpreters
- Authors: Nasim Rahaman, Muhammad Waleed Gondal, Shruti Joshi, Peter Gehler,
Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf
- Abstract summary: We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
- Score: 72.90231306252007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern neural network architectures can leverage large amounts of data to
generalize well within the training distribution. However, they are less
capable of systematic generalization to data drawn from unseen but related
distributions, a feat that is hypothesized to require compositional reasoning
and reuse of knowledge. In this work, we present Neural Interpreters, an
architecture that factorizes inference in a self-attention network as a system
of modules, which we call \emph{functions}. Inputs to the model are routed
through a sequence of functions in a way that is end-to-end learned. The
proposed architecture can flexibly compose computation along width and depth,
and lends itself well to capacity extension after training. To demonstrate the
versatility of Neural Interpreters, we evaluate it in two distinct settings:
image classification and visual abstract reasoning on Raven Progressive
Matrices. In the former, we show that Neural Interpreters perform on par with
the vision transformer using fewer parameters, while being transferable to a
new task in a sample-efficient manner. In the latter, we find that Neural
Interpreters are competitive with respect to the state-of-the-art in terms of
systematic generalization.
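As a rough illustration of the routing idea in the abstract, here is a minimal PyTorch sketch in which tokens are soft-routed through a small set of learned "functions" by matching an inferred token type against per-function signature vectors. All names, hyperparameters, and the truncation scheme are assumptions made for this sketch, not the authors' implementation (which shares function parameters conditioned on per-function codes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionRouting(nn.Module):
    """Soft-routes tokens through a set of learned 'functions' (a sketch)."""

    def __init__(self, dim=64, num_functions=4, signature_dim=16, truncation=0.5):
        super().__init__()
        # One learnable signature per function; a token is routed to the
        # functions whose signatures match its inferred "type".
        self.signatures = nn.Parameter(torch.randn(num_functions, signature_dim))
        self.type_inference = nn.Linear(dim, signature_dim)
        # Each function is an independent MLP in this sketch; the paper
        # instead shares parameters conditioned on a per-function code.
        self.functions = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_functions)
        )
        self.truncation = truncation

    def forward(self, x):  # x: (batch, tokens, dim)
        types = F.normalize(self.type_inference(x), dim=-1)
        sigs = F.normalize(self.signatures, dim=-1)
        # Cosine compatibility, truncated so each token only reaches
        # sufficiently compatible functions.
        compat = torch.einsum("bts,fs->btf", types, sigs)
        weights = torch.where(compat > self.truncation, compat,
                              torch.zeros_like(compat))
        weights = weights / weights.sum(-1, keepdim=True).clamp_min(1e-6)
        outputs = torch.stack([f(x) for f in self.functions], dim=2)  # (b,t,f,d)
        return (weights.unsqueeze(-1) * outputs).sum(dim=2)

tokens = torch.randn(2, 10, 64)
print(FunctionRouting()(tokens).shape)  # torch.Size([2, 10, 64])
```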
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined heuristics.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
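The recipe in the entry above admits a compact sketch: a small network predicts a pose, the input is mapped back to a canonical pose, and an unconstrained predictor operates on the result. The 2D point-cloud setting, layer sizes, and pooling below are illustrative assumptions, not the paper's construction (which also makes the canonicalizer itself equivariant so that exact equivariance holds).

```python
import torch
import torch.nn as nn

class CanonicalizedClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Canonicalization function: a small network predicting a pose angle.
        self.canon = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
        # Prediction function: unconstrained, since it only ever sees
        # canonicalized inputs.
        self.predict = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                     nn.Linear(64, num_classes))

    def forward(self, pts):  # pts: (batch, n_points, 2)
        theta = self.canon(pts).mean(dim=1)  # pooled angle, (batch, 1)
        c, s = torch.cos(theta), torch.sin(theta)
        # Undo the predicted rotation (row-vector convention).
        rot = torch.stack([torch.cat([c, -s], dim=-1),
                           torch.cat([s, c], dim=-1)], dim=1)  # (batch, 2, 2)
        canonical = pts @ rot
        return self.predict(canonical).mean(dim=1)  # pool per-point logits

pts = torch.randn(2, 50, 2)
print(CanonicalizedClassifier()(pts).shape)  # torch.Size([2, 10])
```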
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks [4.153804257347222]
We present Agglomerator, a framework capable of providing a representation of part-whole hierarchies from visual cues.
We evaluate our method on common datasets, such as SmallNORB, MNIST, FashionMNIST, CIFAR-10, and CIFAR-100.
arXiv Detail & Related papers (2022-03-07T10:56:13Z)
- Modeling Structure with Undirected Neural Networks [20.506232306308977]
We propose undirected neural networks, a flexible framework for specifying computations that can be performed in any order.
We demonstrate the effectiveness of undirected neural architectures, both unstructured and structured, on a range of tasks.
arXiv Detail & Related papers (2022-02-08T10:06:51Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Graph Kernel Neural Networks [53.91024360329517]
We propose to use graph kernels, i.e., kernel functions that compute an inner product on graphs, to extend the standard convolution operator to the graph domain.
This allows us to define an entirely structural model that does not require computing the embedding of the input graph.
Our architecture allows plugging in any type of graph kernel and has the added benefit of providing some interpretability.
arXiv Detail & Related papers (2021-12-14T14:48:08Z)
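A hedged sketch of the idea in the entry above: represent an input graph by its kernel evaluations against a set of learnable prototype graphs, so no explicit embedding of the input graph is computed. The simple propagation kernel and all sizes below are assumptions made for this sketch; the paper works with richer graph kernels.

```python
import torch
import torch.nn as nn

def propagation_kernel(adj_a, feat_a, adj_b, feat_b, steps=2):
    """k(G, G') = inner product of mean node features after `steps` rounds of
    neighborhood aggregation; an explicit feature map, hence a valid kernel."""
    for _ in range(steps):
        feat_a = adj_a @ feat_a
        feat_b = adj_b @ feat_b
    return feat_a.mean(dim=0) @ feat_b.mean(dim=0)

class GraphKernelLayer(nn.Module):
    def __init__(self, num_prototypes=8, num_nodes=6, feat_dim=4):
        super().__init__()
        # Learnable prototype graphs: soft adjacency plus node features.
        self.proto_adj = nn.Parameter(torch.rand(num_prototypes, num_nodes, num_nodes))
        self.proto_feat = nn.Parameter(torch.randn(num_prototypes, num_nodes, feat_dim))

    def forward(self, adj, feat):
        # Symmetrize and squash the prototype adjacencies into (0, 1).
        sym = torch.sigmoid(self.proto_adj + self.proto_adj.transpose(1, 2))
        # The graph's representation is its kernel value against each prototype.
        return torch.stack([propagation_kernel(adj, feat, sym[i], self.proto_feat[i])
                            for i in range(sym.shape[0])])

adj = (torch.rand(5, 5) > 0.5).float()
adj = 0.5 * (adj + adj.T)
print(GraphKernelLayer()(adj, torch.randn(5, 4)).shape)  # torch.Size([8])
```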
- Discrete-Valued Neural Communication [85.3675647398994]
We show that restricting the transmitted information among components to discrete representations is a beneficial bottleneck.
Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation.
We extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication.
arXiv Detail & Related papers (2021-07-06T03:09:25Z)
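The quantization mechanism in the entry above can be sketched compactly: split a message vector into heads, snap each head to its nearest entry in a codebook shared across heads, and train through the discretization with a straight-through estimator. Codebook size and head count below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadVectorQuantizer(nn.Module):
    def __init__(self, dim=64, heads=4, codebook_size=32):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        # One codebook shared across all heads (and, in the paper's setting,
        # across the communicating components).
        self.codebook = nn.Embedding(codebook_size, self.head_dim)

    def forward(self, x):  # x: (batch, dim)
        b = x.shape[0]
        parts = x.view(b, self.heads, self.head_dim)
        # Nearest codebook entry per head.
        dists = torch.cdist(parts.reshape(-1, self.head_dim), self.codebook.weight)
        codes = dists.argmin(dim=-1)
        quantized = self.codebook(codes).view(b, self.heads, self.head_dim)
        # Straight-through estimator: gradients bypass the argmin.
        quantized = parts + (quantized - parts).detach()
        return quantized.view(b, -1), codes.view(b, self.heads)

x = torch.randn(2, 64)
out, codes = MultiHeadVectorQuantizer()(x)
print(out.shape, codes.shape)  # torch.Size([2, 64]) torch.Size([2, 4])
```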
- It's FLAN time! Summing feature-wise latent representations for interpretability [0.0]
We propose a novel class of structurally constrained neural networks, which we call FLANs (Feature-wise Latent Additive Networks).
FLANs process each input feature separately, computing for each of them a representation in a common latent space.
These feature-wise latent representations are then simply summed, and the aggregated representation is used for prediction.
arXiv Detail & Related papers (2021-06-18T12:19:33Z)
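The FLAN construction in the entry above reduces to a short sketch: one encoder per input feature into a common latent space, a sum over the per-feature latents, and a prediction head on the aggregate. Layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FLAN(nn.Module):
    def __init__(self, num_features=8, latent_dim=16, num_classes=2):
        super().__init__()
        # One small encoder per scalar feature, all mapping into a shared
        # latent space.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, latent_dim))
            for _ in range(num_features)
        )
        self.head = nn.Linear(latent_dim, num_classes)

    def forward(self, x):  # x: (batch, num_features)
        latents = [enc(x[:, i:i + 1]) for i, enc in enumerate(self.encoders)]
        # Summing (rather than concatenating) keeps each addend attributable
        # to a single input feature, which is the interpretability hook.
        aggregated = torch.stack(latents, dim=0).sum(dim=0)
        return self.head(aggregated)

print(FLAN()(torch.randn(4, 8)).shape)  # torch.Size([4, 2])
```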
- Adaptive Explainable Neural Networks (AxNNs) [8.949704905866888]
We develop a new framework called Adaptive Explainable Neural Networks (AxNN) for achieving the dual goals of good predictive performance and model interpretability.
For predictive performance, we build a structured neural network made up of ensembles of generalized additive model networks and additive index models.
For interpretability, we show how to decompose the results of AxNN into main effects and higher-order interaction effects.
arXiv Detail & Related papers (2020-04-05T23:40:57Z)
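The structured predictor in the entry above suggests the following sketch: a sum of per-feature subnetworks (main effects) plus ridge functions applied to learned linear projections (an additive index model capturing interactions), so the output decomposes additively. This follows the summary only; the actual AxNN framework, including its ensembling and explanation stages, differs.

```python
import torch
import torch.nn as nn

def small_net(in_dim):
    return nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, 1))

class AxNNSketch(nn.Module):
    def __init__(self, num_features=6, num_indices=3):
        super().__init__()
        # Main effects: one subnetwork per raw feature (a GAM-style part).
        self.main_effects = nn.ModuleList(small_net(1) for _ in range(num_features))
        # Additive index part: ridge functions on learned linear projections.
        self.projections = nn.Linear(num_features, num_indices, bias=False)
        self.ridge_fns = nn.ModuleList(small_net(1) for _ in range(num_indices))

    def forward(self, x):  # x: (batch, num_features)
        main = sum(f(x[:, i:i + 1]) for i, f in enumerate(self.main_effects))
        z = self.projections(x)
        interactions = sum(g(z[:, j:j + 1]) for j, g in enumerate(self.ridge_fns))
        # The output decomposes additively into per-feature and per-index
        # terms, which is what makes the effects readable off the model.
        return main + interactions

print(AxNNSketch()(torch.randn(4, 6)).shape)  # torch.Size([4, 1])
```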
This list is automatically generated from the titles and abstracts of the papers on this site.