Distilling Symbolic Priors for Concept Learning into Neural Networks
- URL: http://arxiv.org/abs/2402.07035v1
- Date: Sat, 10 Feb 2024 20:06:26 GMT
- Title: Distilling Symbolic Priors for Concept Learning into Neural Networks
- Authors: Ioana Marinescu, R. Thomas McCoy, Thomas L. Griffiths
- Abstract summary: We show that inductive biases can be instantiated in artificial neural networks by distilling a prior distribution from a symbolic Bayesian model via meta-learning.
We use this approach to create a neural network with an inductive bias towards concepts expressed as short logical formulas.
- Score: 9.915299875869046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans can learn new concepts from a small number of examples by drawing on
their inductive biases. These inductive biases have previously been captured by
using Bayesian models defined over symbolic hypothesis spaces. Is it possible
to create a neural network that displays the same inductive biases? We show
that inductive biases that enable rapid concept learning can be instantiated in
artificial neural networks by distilling a prior distribution from a symbolic
Bayesian model via meta-learning, an approach for extracting the common
structure from a set of tasks. By generating the set of tasks used in
meta-learning from the prior distribution of a Bayesian model, we are able to
transfer that prior into a neural network. We use this approach to create a
neural network with an inductive bias towards concepts expressed as short
logical formulas. Analyzing results from previous behavioral experiments in
which people learned logical concepts from a few examples, we find that our
meta-trained models are highly aligned with human performance.
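The core recipe the abstract describes — sampling concepts from a symbolic prior that favors short logical formulas, then turning each sampled concept into a meta-learning episode — can be sketched as follows. This is a minimal illustration under simplified assumptions, not the authors' implementation: the feature space, the geometric description-length prior, and the episode format are all stand-ins.

```python
import itertools
import random

# Objects are binary feature vectors; a concept is a Boolean formula over
# the features. Stopping early with probability p_stop yields a prior that
# makes shorter formulas exponentially more probable (description-length bias).
FEATURES = 4
OBJECTS = list(itertools.product([0, 1], repeat=FEATURES))

def sample_formula(rng, p_stop=0.6):
    """Sample (predicate, readable string) from a prior over Boolean formulas."""
    if rng.random() < p_stop:
        # Leaf: a single feature or its negation.
        i = rng.randrange(FEATURES)
        if rng.random() < 0.5:
            return (lambda x, i=i: bool(x[i])), f"f{i}"
        return (lambda x, i=i: not x[i]), f"!f{i}"
    # Internal node: conjunction or disjunction of two sub-formulas.
    (l, ls) = sample_formula(rng, p_stop)
    (r, rs) = sample_formula(rng, p_stop)
    if rng.random() < 0.5:
        return (lambda x: l(x) and r(x)), f"({ls} & {rs})"
    return (lambda x: l(x) or r(x)), f"({ls} | {rs})"

def sample_episode(rng, n_support=4):
    """One meta-learning task: a few labeled examples of a concept drawn
    from the prior (support), plus held-out labeled queries."""
    concept, desc = sample_formula(rng)
    objs = rng.sample(OBJECTS, len(OBJECTS))  # random order over all objects
    support = [(x, int(concept(x))) for x in objs[:n_support]]
    query = [(x, int(concept(x))) for x in objs[n_support:]]
    return {"concept": desc, "support": support, "query": query}

rng = random.Random(0)
episodes = [sample_episode(rng) for _ in range(3)]
```

Meta-training a network on a large stream of such episodes is what transfers the Bayesian model's prior into the network's weights: the network never sees the formulas themselves, only labeled examples whose statistics reflect the prior.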
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- This Probably Looks Exactly Like That: An Invertible Prototypical Network [8.957872207471311]
Prototypical neural networks represent an exciting way forward in realizing human-comprehensible machine learning without concept annotations.
We find that reliance on indirect interpretation functions for prototypical explanations imposes a severe limit on prototypes' informative power.
We propose one such model, called ProtoFlow, by composing a normalizing flow with Gaussian mixture models.
arXiv Detail & Related papers (2024-07-16T21:51:02Z)
- Understanding Activation Patterns in Artificial Neural Networks by Exploring Stochastic Processes [0.0]
We propose utilizing the framework of stochastic processes, which has been underutilized thus far.
We focus solely on activation frequency, leveraging neuroscience techniques used for real neuron spike trains.
We derive parameters describing activation patterns in each network, revealing consistent differences across architectures and training sets.
arXiv Detail & Related papers (2023-08-01T22:12:30Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Utility-Probability Duality of Neural Networks [4.871730595406078]
We propose an alternative utility-based explanation to the standard supervised learning procedure in deep learning.
The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function.
We show that for all neural networks with softmax outputs, the SGD learning dynamic of maximum likelihood estimation can be seen as an iteration process.
arXiv Detail & Related papers (2023-05-24T08:09:07Z)
- Modeling rapid language learning by distilling Bayesian priors into artificial neural networks [18.752638142258668]
We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network.
The resulting system can learn formal linguistic patterns from a small number of examples.
It can also learn aspects of English syntax from a corpus of natural language.
arXiv Detail & Related papers (2023-05-24T04:11:59Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics, and exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Learning Evolved Combinatorial Symbols with a Neuro-symbolic Generative Model [35.341634678764066]
Humans have the ability to rapidly understand rich concepts from limited data.
We propose a neuro-symbolic generative model which combines the strengths of previous approaches to concept learning.
arXiv Detail & Related papers (2021-04-16T17:57:51Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.