Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition
- URL: http://arxiv.org/abs/2502.04795v2
- Date: Mon, 17 Feb 2025 01:55:26 GMT
- Title: Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition
- Authors: Masato Mita, Ryo Yoshida, Yohei Oseki
- Abstract summary: Large language models possess general linguistic abilities but acquire language less efficiently than humans. This study proposes a method for integrating the developmental characteristics of working memory during the critical period.
- Score: 8.43537886261228
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models possess general linguistic abilities but acquire language less efficiently than humans. This study proposes a method for integrating the developmental characteristics of working memory during the critical period, a stage when human language acquisition is particularly efficient, into the training process of language models. The proposed method introduces a mechanism that initially constrains working memory during the early stages of training and gradually relaxes this constraint in an exponential manner as learning progresses. Targeted syntactic evaluation shows that the proposed method outperforms conventional methods without memory constraints or with static memory constraints. These findings not only provide new directions for designing data-efficient language models but also offer indirect evidence supporting the role of the developmental characteristics of working memory as the underlying mechanism of the critical period in language acquisition.
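The schedule described in the abstract (a tight memory constraint early in training, relaxed exponentially as learning progresses) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how working memory is constrained, so the sketch assumes the constraint is a causal attention window whose size grows geometrically with training progress. The names `memory_window` and `causal_window_mask` and the parameters `w_min` and `w_max` are hypothetical.

```python
def memory_window(step, total_steps, w_min=4, w_max=512):
    """Hypothetical schedule: window size grows geometrically from w_min to w_max.

    Assumption (not from the paper): "working memory" is modelled as the
    number of most recent tokens a position is allowed to attend to.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 < w_min <= w_max:
        raise ValueError("require 0 < w_min <= w_max")
    # Training progress clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Geometric interpolation: w_min at frac = 0, w_max at frac = 1.
    return int(round(w_min * (w_max / w_min) ** frac))


def causal_window_mask(seq_len, window):
    """Boolean mask: entry [i][j] is True if position i may attend to position j.

    A position sees itself and at most window - 1 preceding tokens.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    total = 1000
    for step in (0, 250, 500, 750, 1000):
        print(step, memory_window(step, total))
```

In a training loop, the mask for the current step would be rebuilt from `memory_window(step, total_steps)` and applied to the attention scores. A static-constraint baseline corresponds to setting `w_min == w_max`, and the unconstrained baseline to a window at least as long as the sequence.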
Related papers
- Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism. Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z) - Detecting Memorization in Large Language Models [0.0]
Large language models (LLMs) have achieved impressive results in natural language processing but are prone to memorizing portions of their training data.
Traditional methods for detecting memorization rely on output probabilities or loss functions.
We introduce an analytical method that precisely detects memorization by examining neuron activations within the LLM.
arXiv Detail & Related papers (2024-12-02T00:17:43Z) - Assessing Code Generation with Intermediate Languages [6.999311675957218]
This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code.
Our findings reveal that intermediate languages generally exhibit greater efficacy in larger models that have not yet achieved state-of-the-art performance.
arXiv Detail & Related papers (2024-07-07T15:35:41Z) - Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z) - In-Memory Learning: A Declarative Learning Framework for Large Language Models [56.62616975119192]
We propose a novel learning framework that allows agents to align with their environment without relying on human-labeled data.
This entire process transpires within the memory components and is implemented through natural language.
We demonstrate the effectiveness of our framework and provide insights into this problem.
arXiv Detail & Related papers (2024-03-05T08:25:11Z) - Zero-Resource Hallucination Prevention for Large Language Models [45.4155729393135]
"Hallucination" refers to instances where large language models (LLMs) generate factually inaccurate or ungrounded information.
We introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction.
We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-09-06T01:57:36Z) - Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models [49.39276272693035]
Large-scale pre-trained language models have shown remarkable memorizing ability.
Vanilla neural networks without pre-training have long been observed to suffer from the catastrophic forgetting problem.
We find that 1) vanilla language models are forgetful; 2) pre-training leads to retentive language models; 3) knowledge relevance and diversification significantly influence memory formation.
arXiv Detail & Related papers (2023-05-16T03:50:38Z) - On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of a large prompt-based language model of code, Codex.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z) - Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z) - Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z) - Training Language Models with Memory Augmentation [28.4608705738799]
We present a novel training approach designed for training language models with memory augmentation.
Our approach uses a training objective that directly takes in-batch examples as accessible memory.
We demonstrate significant gains over previous memory-augmented approaches.
arXiv Detail & Related papers (2022-05-25T11:37:29Z) - Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis [87.75833205560406]
This work presents a lifelong learning approach to train a multilingual Text-To-Speech (TTS) system.
It does not require pooled data from all languages altogether, and thus alleviates the storage and computation burden.
arXiv Detail & Related papers (2021-10-09T07:00:38Z) - Pre-trained Language Model Based Active Learning for Sentence Matching [18.48335957524662]
We propose a pre-trained language model based active learning approach for sentence matching.
Our approach can achieve greater accuracy with fewer labeled training instances.
arXiv Detail & Related papers (2020-10-12T08:24:36Z) - Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z) - Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z) - Data Annealing for Informal Language Understanding Tasks [66.2988222278475]
We propose a data annealing transfer learning procedure to bridge the performance gap on informal language tasks.
It successfully utilizes a pre-trained model such as BERT in informal language.
arXiv Detail & Related papers (2020-04-24T09:27:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.