Determinantal Point Process Attention Over Grid Cell Code Supports Out
of Distribution Generalization
- URL: http://arxiv.org/abs/2305.18417v3
- Date: Tue, 23 Jan 2024 10:50:06 GMT
- Title: Determinantal Point Process Attention Over Grid Cell Code Supports Out
of Distribution Generalization
- Authors: Shanka Subhra Mondal, Steven Frankland, Taylor Webb, and Jonathan D.
Cohen
- Abstract summary: We identify properties of processing in the brain that may contribute to strong generalization performance.
We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code.
This provides both an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance, and a potential means for improving such capabilities in artificial neural networks.
- Score: 5.422292504420425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have made tremendous gains in emulating human-like
intelligence, and have been used increasingly as ways of understanding how the
brain may solve the complex computational problems on which this relies.
However, these networks still fall short of the strong forms of generalization
of which humans are capable, and therefore fail to provide insight into how the
brain supports them. One such case is out-of-distribution (OOD) generalization --
successful performance on test examples that lie outside the distribution of the training
set. Here, we identify properties of processing in the brain that may
contribute to this ability. We describe a two-part algorithm that draws on
specific features of neural computation to achieve OOD generalization, and
provide a proof of concept by evaluating performance on two challenging
cognitive tasks. First we draw on the fact that the mammalian brain represents
metric spaces using grid cell code (e.g., in the entorhinal cortex): abstract
representations of relational structure, organized in recurring motifs that
cover the representational space. Second, we propose an attentional mechanism
that operates over the grid cell code using a Determinantal Point Process (DPP),
which we call DPP attention (DPP-A) -- a transformation that ensures maximum
sparseness in the coverage of that space. We show that a loss function that
combines standard task-optimized error with DPP-A can exploit the recurring
motifs in the grid cell code, and can be integrated with common architectures
to achieve strong OOD generalization performance on analogy and arithmetic
tasks. This provides both an interpretation of how the grid cell code in the
mammalian brain may contribute to generalization performance, and at the same
time a potential means for improving such capabilities in artificial neural
networks.
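To make the combined objective concrete, the sketch below gives one plausible reading of it: a standard task loss plus a DPP-style diversity term, computed as the log-determinant of a similarity kernel over attention-weighted grid-code items. The toy sinusoidal grid code, the kernel construction, the weighting scheme, and all names (grid_code, dpp_log_det, combined_loss, lam) are illustrative assumptions, not the authors' implementation.

# Minimal sketch, not the authors' code: the sinusoidal "grid code", the
# item-by-item kernel, and the weighting scheme are illustrative assumptions.
import numpy as np

def grid_code(positions, freqs=(1, 2, 4, 8), phases=8):
    """Toy multi-frequency, multi-phase encoding of positions on a 1D metric
    space, loosely mimicking recurring grid-cell-like motifs."""
    feats = []
    for f in freqs:
        for p in range(phases):
            offset = 2.0 * np.pi * p / phases
            feats.append(np.sin(2.0 * np.pi * f * positions + offset))
            feats.append(np.cos(2.0 * np.pi * f * positions + offset))
    return np.stack(feats, axis=-1)                 # (n_items, n_units)

def dpp_log_det(embeddings, attn_weights):
    """Log-determinant of an attention-weighted item-by-item kernel.
    Under a DPP, det(L) is large when the attended items are diverse,
    i.e. when attention spreads sparsely over the representational space."""
    weighted = embeddings * attn_weights[:, None]   # scale each item's row
    L = weighted @ weighted.T                       # (n_items, n_items) kernel
    L = L + 1e-6 * np.eye(L.shape[0])               # numerical stabilizer
    _, logdet = np.linalg.slogdet(L)
    return logdet

def combined_loss(task_loss, embeddings, attn_weights, lam=0.1):
    """Standard task-optimized error plus a DPP-A style diversity term
    (negative log-det, so minimizing the loss rewards diverse coverage)."""
    return task_loss - lam * dpp_log_det(embeddings, attn_weights)

# Usage: ten positions encoded by the toy grid code, uniform attention.
Z = grid_code(np.arange(10) / 10.0)
w = np.full(Z.shape[0], 1.0 / Z.shape[0])
print(combined_loss(task_loss=0.42, embeddings=Z, attn_weights=w))

The sketch only conveys the shape of the objective; in the paper the diversity pressure is applied over the grid cell code itself, with the attention weights produced by the upstream network rather than fixed by hand.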
Related papers
- Gradient-based inference of abstract task representations for generalization in neural networks [5.794537047184604]
We show that gradients backpropagated through a neural network to a task representation layer are an efficient way to infer current task demands.
We demonstrate that gradient-based inference provides higher learning efficiency and generalization to novel tasks and limits.
arXiv Detail & Related papers (2024-07-24T15:28:08Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We demonstrate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- DISCOVER: Making Vision Networks Interpretable via Competition and Dissection [11.028520416752325]
This work contributes to post-hoc interpretability, and specifically Network Dissection.
Our goal is to present a framework that makes it easier to discover the individual functionality of each neuron in a network trained on a vision task.
arXiv Detail & Related papers (2023-10-07T21:57:23Z)
- Redundancy and Concept Analysis for Code-trained Language Models [5.726842555987591]
Code-trained language models have proven to be highly effective for various code intelligence tasks.
They can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints.
We perform the first neuron-level analysis for source code models to identify important neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z)
- Improved generalization with deep neural operators for engineering systems: Path towards digital twin [0.4551615447454769]
We evaluate the capabilities of Deep Operator Networks (DeepONets), an ONets implementation using a branch/trunk architecture.
DeepONets can accurately learn the solution operators, achieving prediction accuracy scores above 0.96 for the ODE and diffusion problems.
More importantly, when evaluated on unseen scenarios (zero shot feature), the trained models exhibit excellent generalization ability.
arXiv Detail & Related papers (2023-01-17T04:57:31Z)
- Seeking Interpretability and Explainability in Binary Activated Neural Networks [2.828173677501078]
We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks.
We present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights.
arXiv Detail & Related papers (2022-09-07T20:11:17Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called a structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units [86.9207811656179]
We show that memory-augmented neural networks (NNs) can achieve higher-order extrapolation, stable performance, and faster convergence.
Our models achieve a 1.53% average improvement over current state-of-the-art methods in equation verification and achieve a 2.22% Top-1 average accuracy and 2.96% Top-5 average accuracy for equation completion.
arXiv Detail & Related papers (2021-04-07T03:50:11Z)
- And/or trade-off in artificial neurons: impact on adversarial robustness [91.3755431537592]
The presence of a sufficient number of OR-like neurons in a network can lead to classification brittleness and increased vulnerability to adversarial attacks.
We define AND-like neurons and propose measures to increase their proportion in the network.
Experimental results on the MNIST dataset suggest that our approach holds promise as a direction for further exploration.
arXiv Detail & Related papers (2021-02-15T08:19:05Z)
- Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is connected with analytical expressions to achieve compositional generalization.
Experiments on the well-known SCAN benchmark demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z)