A model of early word acquisition based on realistic-scale audiovisual naming events
- URL: http://arxiv.org/abs/2406.05259v1
- Date: Fri, 7 Jun 2024 21:05:59 GMT
- Title: A model of early word acquisition based on realistic-scale audiovisual naming events
- Authors: Khazar Khorrami, Okko Räsänen,
- Abstract summary: We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input.
We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that learns from statistical regularities in raw speech and pixel-level visual input.
Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants.
- Score: 10.047470656294333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Infants gradually learn to parse continuous speech into words and connect names with objects, yet the mechanisms behind development of early word perception skills remain unknown. We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input. We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that solely learns from statistical regularities in unannotated raw speech and pixel-level visual input. Crucially, the quantity of object naming events was carefully designed to match that accessible to infants of comparable ages. Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants. The findings support the viability of general statistical learning for early word perception, demonstrating how learning can operate without assuming any prior linguistic capabilities.
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - Caregiver Talk Shapes Toddler Vision: A Computational Study of Dyadic
Play [8.164232628099619]
We propose a computational model of visual representation learning during dyadic play.
We show that utterances with statistics matching those of real caregivers give rise to representations supporting improved category recognition.
arXiv Detail & Related papers (2023-12-07T08:18:40Z) - Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised
spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects on deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z) - Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z) - Using Diachronic Distributed Word Representations as Models of Lexical
Development in Children [0.0]
We use diachronic distributed word representations to perform temporal modeling and analysis of lexical development in children.
We demonstrate the dynamics of growing lexical knowledge in children over time, as compared against a saturated level of lexical knowledge in child-directed adult speech.
arXiv Detail & Related papers (2021-05-11T14:44:05Z) - Using Known Words to Learn More Words: A Distributional Analysis of
Child Vocabulary Development [0.0]
We investigated item-based variability in vocabulary development using lexical properties of distributional statistics.
We predicted word trajectories cross-sectionally, shedding light on trends in vocabulary development that may not have been evident at a single time point.
We also show that whether one looks at a single age group or across ages as a whole, the best distributional predictor of whether a child knows a word is the number of other known words with which that word tends to co-occur.
arXiv Detail & Related papers (2020-09-15T01:18:21Z) - A Computational Model of Early Word Learning from the Infant's Point of
View [15.443815646555125]
The present study uses egocentric video and gaze data collected from infant learners during natural toy play with their parents.
We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch.
As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved.
arXiv Detail & Related papers (2020-06-04T12:08:44Z) - On Vocabulary Reliance in Scene Text Recognition [79.21737876442253]
Methods perform well on images with words within vocabulary but generalize poorly to images with words outside vocabulary.
We call this phenomenon "vocabulary reliance"
We propose a simple yet effective mutual learning strategy to allow models of two families to learn collaboratively.
arXiv Detail & Related papers (2020-05-08T11:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.