Related papers: Meta-Learning Neural Mechanisms rather than Bayesian Priors

Meta-Learning Neural Mechanisms rather than Bayesian Priors

URL: http://arxiv.org/abs/2503.16048v1
Date: Thu, 20 Mar 2025 11:33:59 GMT
Title: Meta-Learning Neural Mechanisms rather than Bayesian Priors
Authors: Michael Goodale, Salvador Mascarenhas, Yair Lakretz,
Abstract summary: We investigate the meta-learning of formal languages and find that, contrary to previous claims, meta-trained models are not learning simplicity-based priors.<n>We find evidence that meta-training imprints neural mechanisms into the model, which function like cognitive primitives for the network on downstream tasks.
Score: 4.451173777061901
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Children acquire language despite being exposed to several orders of magnitude less data than large language models require. Meta-learning has been proposed as a way to integrate human-like learning biases into neural-network architectures, combining both the structured generalizations of symbolic models with the scalability of neural-network models. But what does meta-learning exactly imbue the model with? We investigate the meta-learning of formal languages and find that, contrary to previous claims, meta-trained models are not learning simplicity-based priors when meta-trained on datasets organised around simplicity. Rather, we find evidence that meta-training imprints neural mechanisms (such as counters) into the model, which function like cognitive primitives for the network on downstream tasks. Most surprisingly, we find that meta-training on a single formal language can provide as much improvement to a model as meta-training on 5000 different formal languages, provided that the formal language incentivizes the learning of useful neural mechanisms. Taken together, our findings provide practical implications for efficient meta-learning paradigms and new theoretical insights into linking symbolic theories and neural mechanisms.

Related papers

On the Geometry of Semantics in Next-token Prediction [27.33243506775655]
Modern language models capture linguistic meaning despite being trained solely through next-token prediction.<n>We investigate how this conceptually simple training objective leads models to extract and encode latent semantic and grammatical concepts.<n>Our work bridges distributional semantics, neural collapse geometry, and neural network training dynamics, providing insights into how NTP's implicit biases shape the emergence of meaning representations in language models.
arXiv Detail & Related papers (2025-05-13T08:46:04Z)
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets.<n>We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance.<n>This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z)
Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism.<n>Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z)
Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
We train and evaluate neural networks directly as binary classifiers of strings.<n>We provide results on a variety of languages across the Chomsky hierarchy for three neural architectures.<n>Our contributions will facilitate theoretically sound empirical testing of language recognition claims in future work.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network [16.317199232071232]
Large Language Models (LLMs) have been shown to be effective models of the human language system. In this work, we investigate the key architectural components driving the surprising alignment of untrained models.
arXiv Detail & Related papers (2024-06-21T12:54:03Z)
Language Evolution with Deep Learning [49.879239655532324]
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language. This chapter explores another class of computational models that have recently revolutionized the field of machine learning: deep learning models.
arXiv Detail & Related papers (2024-03-18T16:52:54Z)
Context-Aware Meta-Learning [52.09326317432577]
We propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach exceeds or matches the state-of-the-art algorithm, P>M>F, on 8 out of 11 meta-learning benchmarks.
arXiv Detail & Related papers (2023-10-17T03:35:27Z)
Meta predictive learning model of languages in neural circuits [2.5690340428649328]
We propose a mean-field learning model within the predictive coding framework. Our model reveals that most of the connections become deterministic after learning. Our model provides a starting point to investigate the connection among brain computation, next-token prediction and general intelligence.
arXiv Detail & Related papers (2023-09-08T03:58:05Z)
Meta-Learning in Spiking Neural Networks with Reward-Modulated STDP [2.179313476241343]
We propose a bio-plausible meta-learning model inspired by the hippocampus and the prefrontal cortex. Our new model can easily be applied to spike-based neuromorphic devices and enables fast learning in neuromorphic hardware.
arXiv Detail & Related papers (2023-06-07T13:08:46Z)
Meta-Learning via Classifier(-free) Guidance [5.812784742024491]
State-of-the-art meta-learning techniques do not optimize for zero-shot adaptation to unseen tasks. We propose meta-learning techniques that use natural language guidance to achieve higher zero-shot performance.
arXiv Detail & Related papers (2022-10-17T11:09:35Z)
Understanding Benign Overfitting in Nested Meta Learning [44.08080315581763]
We focus on the meta learning settings with a challenging nested structure that we term the nested meta learning. Our theory contributes to understanding the delicate interplay among data heterogeneity, model adaptation and benign overfitting in nested meta learning tasks.
arXiv Detail & Related papers (2022-06-27T17:46:57Z)
Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models. In detail, we first train neural language models with a novel dependency modeling objective. We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control. We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements. Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)
Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks [55.66438591090072]
We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models trained classically. We develop a regularizer which boosts the performance of standard training routines for few-shot classification.
arXiv Detail & Related papers (2020-02-17T03:18:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.