Using Known Words to Learn More Words: A Distributional Analysis of
Child Vocabulary Development
- URL: http://arxiv.org/abs/2009.06810v2
- Date: Sun, 21 Nov 2021 20:11:44 GMT
- Title: Using Known Words to Learn More Words: A Distributional Analysis of
Child Vocabulary Development
- Authors: Andrew Z. Flores, Jessica Montag, Jon Willits
- Abstract summary: We investigated item-based variability in vocabulary development using lexical properties of distributional statistics.
We predicted word trajectories cross-sectionally, shedding light on trends in vocabulary development that may not have been evident at a single time point.
We also show that whether one looks at a single age group or across ages as a whole, the best distributional predictor of whether a child knows a word is the number of other known words with which that word tends to co-occur.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Why do children learn some words before others? Understanding
individual variability across children, as well as variability across words,
may be informative about the learning processes that underlie language learning. We
investigated item-based variability in vocabulary development using lexical
properties of distributional statistics derived from a large corpus of
child-directed speech. Unlike previous analyses, we predicted word trajectories
cross-sectionally, shedding light on trends in vocabulary development that may
not have been evident at a single time point. We also show that whether one
looks at a single age group or across ages as a whole, the best distributional
predictor of whether a child knows a word is the number of other known words
with which that word tends to co-occur. Keywords: age of acquisition;
vocabulary development; lexical diversity; child-directed speech.
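The abstract's headline predictor, the number of other known words a target word tends to co-occur with, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's actual pipeline; the toy utterances, known-word set, and window size are hypothetical stand-ins:

```python
from collections import defaultdict

def known_cooccurrence_counts(utterances, known_words, window=5):
    """For each word, count the distinct known words appearing within
    +/- `window` tokens of it, aggregated over all utterances."""
    partners = defaultdict(set)
    for utt in utterances:
        tokens = utt.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in known_words:
                    partners[word].add(tokens[j])
    return {word: len(s) for word, s in partners.items()}

# Toy child-directed utterances and a toy known-word set (hypothetical data).
utts = ["look at the big dog", "the dog wants the ball", "where is the ball"]
known = {"dog", "ball", "the"}
counts = known_cooccurrence_counts(utts, known)
# "big" co-occurs with two known words here ("the" and "dog").
```

A real analysis would compute these counts over a large child-directed-speech corpus and use each child's reported vocabulary (e.g., from CDI checklists) as the known-word set.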
Related papers
- A model of early word acquisition based on realistic-scale audiovisual naming events [10.047470656294333]
We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input.
We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that learns from statistical regularities in raw speech and pixel-level visual input.
Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants.
arXiv Detail & Related papers (2024-06-07T21:05:59Z) - Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z) - Towards Open Vocabulary Learning: A Survey [146.90188069113213]
Deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection.
Recently, open-vocabulary settings have been proposed, driven by the rapid progress of vision-language pre-training.
This paper provides a thorough review of open vocabulary learning, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2023-06-28T02:33:06Z) - MEWL: Few-shot multimodal word learning with referential uncertainty [24.94171567232573]
We introduce the MachinE Word Learning benchmark to assess how machines learn word meaning in grounded visual scenes.
MEWL covers humans' core cognitive toolkits in word learning: cross-situational reasoning, bootstrapping, and pragmatic learning.
By evaluating multimodal and unimodal agents against a comparative analysis of human performance, we observe a sharp divergence between human and machine word learning.
arXiv Detail & Related papers (2023-06-01T09:54:31Z) - Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how marking a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z) - Feature-rich multiplex lexical networks reveal mental strategies of
early language learning [0.7111443975103329]
We introduce FEature-Rich MUltiplex LEXical (FERMULEX) networks.
Similarities model heterogeneous word associations across semantic/syntactic/phonological aspects of knowledge.
Words are enriched with multi-dimensional feature embeddings including frequency, age of acquisition, length and polysemy.
arXiv Detail & Related papers (2022-01-13T16:44:51Z) - Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class differ markedly between children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z) - Using Diachronic Distributed Word Representations as Models of Lexical
Development in Children [0.0]
We use diachronic distributed word representations to perform temporal modeling and analysis of lexical development in children.
We demonstrate the dynamics of growing lexical knowledge in children over time, as compared against a saturated level of lexical knowledge in child-directed adult speech.
arXiv Detail & Related papers (2021-05-11T14:44:05Z) - Disambiguatory Signals are Stronger in Word-initial Positions [48.18148856974974]
We point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word.
We find evidence across hundreds of languages that indeed there is a cross-linguistic tendency to front-load information in words.
arXiv Detail & Related papers (2021-02-03T18:19:16Z) - Analyzing the Surprising Variability in Word Embedding Stability Across
Languages [46.84861591608146]
We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features.
This has implications for embedding use, particularly in research that uses them to study language trends.
arXiv Detail & Related papers (2020-04-30T15:24:43Z) - Where New Words Are Born: Distributional Semantic Analysis of Neologisms
and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
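The semantic-sparsity factor in the neologism entry above is commonly operationalized as neighborhood density in an embedding space. The sketch below is one such hedged formalization, not the paper's exact measure; the embedding matrix and `k` are hypothetical:

```python
import numpy as np

def neighborhood_density(vec, space, k=3):
    """Mean cosine similarity between `vec` and its k nearest neighbors
    in `space` (one word vector per row). Lower values indicate a
    sparser semantic neighborhood, where new words may be more likely
    to emerge."""
    sims = space @ vec / (np.linalg.norm(space, axis=1) * np.linalg.norm(vec))
    return float(np.sort(sims)[-k:].mean())

# Toy 2-D "embedding space" (hypothetical vectors, not trained embeddings).
space = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
density = neighborhood_density(np.array([1.0, 0.0]), space, k=2)
```

In practice one would exclude the query word itself from `space` and use trained distributional vectors; tracking this density over time pairs naturally with the frequency-growth-rate factor the entry mentions.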
This list is automatically generated from the titles and abstracts of the papers in this site.