Investigating Critical Period Effects in Language Acquisition through Neural Language Models
- URL: http://arxiv.org/abs/2407.19325v2
- Date: Sun, 6 Oct 2024 20:23:15 GMT
- Title: Investigating Critical Period Effects in Language Acquisition through Neural Language Models
- Authors: Ionut Constantinescu, Tiago Pimentel, Ryan Cotterell, Alex Warstadt
- Abstract summary: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to substantial loss of L1 proficiency.
It is unknown whether these CP effects result from innately determined brain maturation or from a stabilization of neural connections naturally induced by experience.
- Score: 70.6367059367609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans appear to have a critical period (CP) for language acquisition: Second language (L2) acquisition becomes harder after early childhood, and ceasing exposure to a first language (L1) after this period (but not before) typically does not lead to substantial loss of L1 proficiency. It is unknown whether these CP effects result from innately determined brain maturation or from a stabilization of neural connections naturally induced by experience. In this study, we use language models (LMs) to test the extent to which these phenomena are peculiar to humans, or shared by a broader class of language learners. We vary the age of exposure by training LMs on language pairs in various experimental conditions, and find that LMs, which lack any direct analog to innate maturational stages, do not show CP effects when the age of exposure of L2 is delayed. Our results contradict the claim that CP effects are an inevitable result of statistical learning, and they are consistent with an innate mechanism for CP effects. We show that we can reverse-engineer the CP by introducing a regularizer partway through training to simulate a maturational decrease in plasticity. All in all, our results suggest that L1 learning on its own may not be enough to induce a CP, and additional engineering is necessary to make language models more cognitively plausible.
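The regularizer experiment lends itself to a concrete illustration. Below is a minimal PyTorch sketch of one way a maturational decrease in plasticity could be simulated: an EWC-style quadratic penalty anchored to the weights at the end of the L1-only phase. The specific penalty form, function names, and hyperparameters are assumptions for illustration, not the paper's exact recipe.
```python
import torch

def estimate_fisher(model, data_loader, loss_fn):
    """Diagonal Fisher estimate after L1 training: average squared gradients.
    Large values mark parameters important for L1, which the penalty below
    will hold most firmly in place."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in data_loader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def plasticity_penalty(model, anchors, fisher, lam=1.0):
    """EWC-style quadratic penalty pulling parameters back toward the
    values saved when the simulated critical period closed."""
    return lam * sum(
        (fisher[n] * (p - anchors[n]) ** 2).sum()
        for n, p in model.named_parameters()
    )
```
During L2 training after the simulated cutoff, the objective would become `task_loss + plasticity_penalty(model, anchors, fisher)`, where `anchors` holds the detached parameter values saved at the cutoff.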
Related papers
- Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming [6.408190458163885]
Large language models (LLMs) have shown the emergent capability of in-context learning (ICL).
We introduce a new way of diagnosing whether ICL is functionally equivalent to gradient-based learning.
arXiv Detail & Related papers (2024-06-26T17:06:41Z)
- InstructionCP: A fast approach to transfer Large Language Models into target language [55.2480439325792]
InsCP integrates instruction tags into the continual pretraining (CP) process to prevent loss of conversational proficiency while acquiring new languages.
Our experiments demonstrate that InsCP retains conversational and Reinforcement Learning from Human Feedback abilities.
This approach requires only 0.1 billion tokens of high-quality instruction-following data, thereby reducing resource consumption.
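As a rough illustration of the idea, one might wrap continual-pretraining text in chat-style instruction tags; the tag strings below are hypothetical, since the summary does not show the paper's actual template.
```python
def wrap_for_continual_pretraining(document: str) -> str:
    """Hypothetical formatting step: present raw target-language text inside
    chat-style instruction tags, so that acquiring the new language also
    rehearses the instruction-following format. Tag strings are invented."""
    return (
        "<|user|>\nPlease continue the following text.\n"
        f"<|assistant|>\n{document}"
    )
```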
arXiv Detail & Related papers (2024-05-30T15:45:13Z)
- Revealing the Parallel Multilingual Learning within Large Language Models [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
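A minimal sketch of the PiM idea, assuming a placeholder `translate` function standing in for any MT system or API; the language set here is arbitrary.
```python
def build_pim_input(text: str, translate, languages=("de", "fr", "es")) -> str:
    """Parallel Input in Multiple Languages: feed the model the same content
    in several languages at once. `translate` is a placeholder for any MT
    system; the original text is kept alongside its translations."""
    parallel = [text] + [translate(text, target=lang) for lang in languages]
    return "\n\n".join(parallel)
```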
arXiv Detail & Related papers (2024-03-14T03:33:46Z)
- Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
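A hedged sketch of the induce-then-contrast idea at the logits level; the combination rule below is an assumed contrastive-decoding form, not necessarily the paper's exact formula.
```python
import torch

def icd_next_token_logits(base_logits: torch.Tensor,
                          induced_logits: torch.Tensor,
                          alpha: float = 1.0) -> torch.Tensor:
    """Induce-then-contrast at decoding time: downweight tokens favored by a
    deliberately hallucination-prone copy of the model, so that factual
    continuations from the base model are relatively amplified."""
    return (1 + alpha) * base_logits - alpha * induced_logits
```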
arXiv Detail & Related papers (2023-12-25T12:32:49Z)
- The Hydra Effect: Emergent Self-repair in Language Model Computations [8.323441767835257]
We investigate the internal structure of language model computations using causal analysis.
We show two motifs: (1) a form of adaptive computation, where ablations of one attention layer cause another layer to compensate, and (2) a counterbalancing function of late MLP layers that downregulate the maximum-likelihood token.
We analyse these effects in the context of factual recall and consider their implications for circuit-level attribution in language models.
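A minimal sketch of layer ablation via a PyTorch forward hook, one common way such causal analyses are implemented; the hook mechanics and module paths are assumptions, not the paper's code.
```python
import torch

def ablate(module):
    """Register a forward hook that zeroes a module's output, simulating
    ablation of that layer. Many attention blocks return tuples, in which
    case only the hidden states (first element) are zeroed."""
    def hook(_module, _inputs, output):
        if isinstance(output, tuple):
            return (torch.zeros_like(output[0]),) + output[1:]
        return torch.zeros_like(output)
    return module.register_forward_hook(hook)

# Hypothetical usage on a GPT-2-style model:
#   handle = ablate(model.transformer.h[5].attn)
#   ...run the model and measure how other layers' contributions change...
#   handle.remove()
```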
arXiv Detail & Related papers (2023-07-28T19:13:26Z)
- Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps [75.84770193489639]
We examine the impact of test loss, training corpus and model architecture on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook.
We find that untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words.
We suggest good practices for future studies aiming at explaining the human language system using neural language models.
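Encoding analyses like this are commonly implemented as regularized linear regression from model features to voxel timecourses; below is a hedged sketch under assumed array shapes, not the paper's pipeline.
```python
import numpy as np
from sklearn.linear_model import RidgeCV

def fit_encoding_model(features: np.ndarray, bold: np.ndarray):
    """Ridge regression from LM features (n_samples x n_dims) to voxel
    timecourses (n_samples x n_voxels); held-out R^2 is the usual score.
    A time-ordered split avoids leakage across the audiobook timecourse."""
    split = int(0.8 * len(features))
    model = RidgeCV(alphas=(0.1, 1.0, 10.0, 100.0))
    model.fit(features[:split], bold[:split])
    return model, model.score(features[split:], bold[split:])
```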
arXiv Detail & Related papers (2022-07-07T15:37:17Z)
- Is the Language Familiarity Effect gradual? A computational modelling approach [14.83230292969134]
We show that a model of the Language Familiarity Effect can be used to obtain a gradual measure of the effect.
We show that the effect is replicated across a wide array of languages, providing further evidence of its universality.
Building on the gradual measure of LFE, we also show that languages belonging to the same family yield similar scores, supporting the idea of an effect of language distance on LFE.
arXiv Detail & Related papers (2022-06-27T16:08:42Z)
- Is the Computation of Abstract Sameness Relations Human-Like in Neural Language Models? [4.0810783261728565]
This work explores whether state-of-the-art NLP models exhibit elementary mechanisms known from human cognition.
The computation of "abstract sameness relations" is assumed to play an important role in human language acquisition and processing.
arXiv Detail & Related papers (2022-05-12T15:19:54Z)
- A bifurcation threshold for contact-induced language change [0.0]
This paper proposes a mathematical model of such situations based on reinforcement learning and nonlinear dynamics.
The model is evaluated with the help of two case studies, morphological levelling in Afrikaans and the erosion of null subjects in Afro-Peruvian Spanish.
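A toy mean-field illustration of a bifurcation threshold (not the paper's actual model): trajectories starting above the threshold go to completion, those below die out.
```python
def simulate(x0: float, theta: float = 0.3, gamma: float = 0.1,
             steps: int = 500) -> float:
    """Toy dynamics for the fraction x of speakers using an incoming variant:
    x' = x + gamma * x * (1 - x) * (x - theta). The interior fixed point at
    theta is unstable, so it acts as a threshold between loss and completion."""
    x = x0
    for _ in range(steps):
        x += gamma * x * (1 - x) * (x - theta)
        x = min(max(x, 0.0), 1.0)
    return x

for x0 in (0.1, 0.25, 0.35, 0.6):
    print(f"x0={x0} -> {simulate(x0):.3f}")  # below 0.3 -> 0, above -> 1
```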
arXiv Detail & Related papers (2021-11-23T18:21:12Z)
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)