On the robustness of modeling grounded word learning through a child's egocentric input
- URL: http://arxiv.org/abs/2507.14749v1
- Date: Sat, 19 Jul 2025 20:55:37 GMT
- Title: On the robustness of modeling grounded word learning through a child's egocentric input
- Authors: Wai Keen Vong, Brenden M. Lake
- Abstract summary: We show that multimodal neural networks trained on automatically transcribed data from each child can acquire and generalize word-referent mappings across multiple network architectures. Results validate the robustness of multimodal neural networks for grounded word learning.
- Score: 9.62675241698235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: What insights can machine learning bring to understanding human language acquisition? Large language and multimodal models have achieved remarkable capabilities, but their reliance on massive training datasets creates a fundamental mismatch with children, who succeed in acquiring language from comparatively limited input. To help bridge this gap, researchers have increasingly trained neural networks using data similar in quantity and quality to children's input. Taking this approach to the limit, Vong et al. (2024) showed that a multimodal neural network trained on 61 hours of visual and linguistic input extracted from just one child's developmental experience could acquire word-referent mappings. However, whether this approach's success reflects the idiosyncrasies of a single child's experience, or whether it would show consistent and robust learning patterns across multiple children's experiences was not explored. In this article, we applied automated speech transcription methods to the entirety of the SAYCam dataset, consisting of over 500 hours of video data spread across all three children. Using these automated transcriptions, we generated multi-modal vision-and-language datasets for both training and evaluation, and explored a range of neural network configurations to examine the robustness of simulated word learning. Our findings demonstrate that networks trained on automatically transcribed data from each child can acquire and generalize word-referent mappings across multiple network architectures. These results validate the robustness of multimodal neural networks for grounded word learning, while highlighting the individual differences that emerge in how models learn when trained on each child's developmental experiences.
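The abstract does not include implementation details, but a minimal sketch of the kind of approach it describes (contrastive alignment of egocentric video frames with co-occurring transcribed utterances, in the spirit of the model from Vong et al., 2024) could look like the following. The toy encoders, dimensions, and function names are illustrative assumptions, not the authors' code.
```python
# Minimal sketch (assumptions: PyTorch, toy encoders, CLIP-style contrastive loss).
# Not the authors' implementation; illustrates contrastive alignment of
# egocentric video frames with their co-occurring transcribed utterances.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVisionEncoder(nn.Module):
    """Stand-in for a pretrained image backbone (e.g., a ResNet variant)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, frames):          # frames: (B, 3, H, W)
        return F.normalize(self.net(frames), dim=-1)

class ToyUtteranceEncoder(nn.Module):
    """Stand-in for a text encoder over transcribed utterances (bag of embeddings)."""
    def __init__(self, vocab_size=5000, embed_dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")

    def forward(self, token_ids, offsets):
        return F.normalize(self.embed(token_ids, offsets), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: matching frame-utterance pairs are positives,
    all other pairings in the batch serve as negatives."""
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    vision, text = ToyVisionEncoder(), ToyUtteranceEncoder()
    params = list(vision.parameters()) + list(text.parameters())
    opt = torch.optim.Adam(params, lr=1e-4)

    # Fake batch: 8 frames paired with 8 utterances of 5 tokens each.
    frames = torch.randn(8, 3, 64, 64)
    token_ids = torch.randint(0, 5000, (40,))
    offsets = torch.arange(0, 40, 5)

    loss = contrastive_loss(vision(frames), text(token_ids, offsets))
    loss.backward()
    opt.step()
    print(f"toy contrastive loss: {loss.item():.3f}")
```
In evaluations of this kind, a word's embedding is typically compared against embeddings of several candidate referent images, and the model is credited when the matching referent scores highest.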
Related papers
- Spatio-Temporal Graph Neural Networks for Infant Language Acquisition Prediction [0.0]
A model of language acquisition for infants and young children can be constructed and adapted for use in a Spatio-Temporal Graph Convolutional Network (STGCN). We introduce a novel approach for predicting child vocabulary acquisition, and evaluate the efficacy of such a model with respect to the different types of linguistic relationships that occur during language acquisition.
arXiv Detail & Related papers (2025-03-18T15:21:27Z) - Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism. Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z) - An iterated learning model of language change that mixes supervised and unsupervised learning [0.0]
The iterated learning model is an agent model which simulates the transmission of language from generation to generation. In each iteration, a language tutor exposes a naive pupil to a limited training set of utterances, each pairing a random meaning with the signal that conveys it. The transmission bottleneck ensures that tutors must generalize beyond the training set that they experienced (a minimal sketch of this loop appears after this list).
arXiv Detail & Related papers (2024-05-31T14:14:01Z) - A systematic investigation of learnability from single child linguistic input [12.279543223376935]
Language models (LMs) have demonstrated remarkable proficiency in generating linguistically coherent text.
However, a significant gap exists between the training data for these models and the linguistic input a child receives.
Our research focuses on training LMs on subsets of a single child's linguistic input.
arXiv Detail & Related papers (2024-02-12T18:58:58Z) - Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z) - SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z) - What Artificial Neural Networks Can Tell Us About Human Language Acquisition [47.761188531404066]
Rapid progress in machine learning for natural language processing has the potential to transform debates about how humans learn language.
To increase the relevance of learnability results from computational models, we need to train model learners without significant advantages over humans.
arXiv Detail & Related papers (2022-08-17T00:12:37Z) - Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
In detail, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
arXiv Detail & Related papers (2022-03-19T06:28:30Z) - Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z) - What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
arXiv Detail & Related papers (2021-08-06T20:48:30Z)
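For the iterated learning entry above, the generational loop it describes can be illustrated with a minimal sketch; the meaning space, signal alphabet, and naive pupil below are assumptions for illustration only, not the original model (which mixes supervised and unsupervised learning).
```python
# Minimal sketch of an iterated learning loop (illustrative assumptions:
# meanings are integers, signals are short strings, the pupil is a simple
# lookup learner that reuses an observed signal for unseen meanings).
import random

MEANINGS = list(range(20))          # the whole meaning space
ALPHABET = "abcd"
BOTTLENECK = 8                      # the pupil sees only 8 meaning-signal pairs

def random_language():
    """Initial tutor: arbitrary meaning -> signal mapping."""
    return {m: "".join(random.choices(ALPHABET, k=3)) for m in MEANINGS}

def learn(training_pairs):
    """The pupil memorizes the observed pairs and must generalize to the rest.
    Here generalization is naive reuse of an observed signal; richer models
    induce structure instead."""
    lexicon = dict(training_pairs)
    observed_signals = list(lexicon.values())
    return {m: lexicon.get(m, random.choice(observed_signals)) for m in MEANINGS}

def iterate(generations=10):
    tutor = random_language()
    for g in range(generations):
        # Transmission bottleneck: the pupil sees only a sample of utterances.
        sample = random.sample(MEANINGS, BOTTLENECK)
        training_pairs = [(m, tutor[m]) for m in sample]
        pupil = learn(training_pairs)
        print(f"generation {g}: {len(set(pupil.values()))} distinct signals")
        tutor = pupil               # the pupil becomes the next generation's tutor

if __name__ == "__main__":
    random.seed(0)
    iterate()
```
With such a naive learner the signal inventory tends to collapse across generations; in the iterated learning literature, learners that generalize more systematically under the same bottleneck instead tend to produce increasingly structured languages.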