A Computational Model of Early Word Learning from the Infant's Point of View
- URL: http://arxiv.org/abs/2006.02802v1
- Date: Thu, 4 Jun 2020 12:08:44 GMT
- Title: A Computational Model of Early Word Learning from the Infant's Point of View
- Authors: Satoshi Tsutsui, Arjun Chandrasekaran, Md Alimoor Reza, David Crandall, Chen Yu
- Abstract summary: The present study uses egocentric video and gaze data collected from infant learners during natural toy play with their parents.
We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch.
As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved.
- Score: 15.443815646555125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human infants have the remarkable ability to learn the associations between
object names and visual objects from inherently ambiguous experiences.
Researchers in cognitive science and developmental psychology have built formal
models that implement in-principle learning algorithms, and then used
pre-selected and pre-cleaned datasets to test the abilities of the models to
find statistical regularities in the input data. In contrast to previous
modeling approaches, the present study used egocentric video and gaze data
collected from infant learners during natural toy play with their parents. This
allowed us to capture the learning environment from the perspective of the
learner's own point of view. We then used a Convolutional Neural Network (CNN)
model to process sensory data from the infant's point of view and learn
name-object associations from scratch. As the first model that takes raw
egocentric video to simulate infant word learning, the present study provides a
proof of principle that the problem of early word learning can be solved, using
actual visual data perceived by infant learners. Moreover, we conducted
simulation experiments to systematically determine how visual, perceptual, and
attentional properties of infants' sensory experiences may affect word
learning.
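
To make the setup concrete, below is a minimal, hypothetical sketch of the kind of name-object association learning the abstract describes: a small CNN trained from scratch to map gaze-centered egocentric frames to the object name heard at that moment. The architecture, crop size, vocabulary size, and every identifier (NameObjectCNN, train_step, NUM_OBJECT_NAMES) are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (not the authors' code): a small CNN that learns
    # name-object associations from gaze-centered egocentric frames.
    # All names, sizes, and the label scheme here are hypothetical.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_OBJECT_NAMES = 24  # hypothetical vocabulary of toy names

    class NameObjectCNN(nn.Module):
        """Maps a gaze-centered RGB crop to scores over object names."""
        def __init__(self, num_names: int = NUM_OBJECT_NAMES):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(128, num_names)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).flatten(1))

    def train_step(model, optimizer, frames, name_ids):
        """One update: frames are gaze-centered crops that co-occur with a
        parent's naming utterance; name_ids index the spoken object name."""
        logits = model(frames)
        loss = F.cross_entropy(logits, name_ids)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    if __name__ == "__main__":
        model = NameObjectCNN()
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        # Stand-in batch: 8 random 64x64 crops with random name labels.
        frames = torch.randn(8, 3, 64, 64)
        names = torch.randint(0, NUM_OBJECT_NAMES, (8,))
        print(train_step(model, opt, frames, names))

In this sketch the gaze-centered crop stands in for the attentional filtering the abstract highlights; the paper's simulation experiments on visual, perceptual, and attentional properties would correspond to varying how such crops are produced from the raw egocentric video.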
Related papers
- Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning [18.43931715859825]
As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights.
We introduce a training-free framework that can discover visual concept neurons hidden in the model's internal representations.
Our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant's visual and linguistic inputs.
arXiv Detail & Related papers (2025-01-09T12:55:55Z)
- Developmental Predictive Coding Model for Early Infancy Mono and Bilingual Vocal Continual Learning [69.8008228833895]
We propose a small-sized generative neural network equipped with a continual learning mechanism.
Our model prioritizes interpretability and demonstrates the advantages of online learning.
arXiv Detail & Related papers (2024-12-23T10:23:47Z)
- Neural Lineage [56.34149480207817]
We introduce a novel task known as neural lineage detection, aiming at discovering lineage relationships between parent and child models.
For practical convenience, we introduce a learning-free approach, which integrates an approximation of the finetuning process into the neural network representation similarity metrics.
For the pursuit of accuracy, we introduce a learning-based lineage detector comprising encoders and a transformer detector.
arXiv Detail & Related papers (2024-06-17T01:11:53Z)
- A model of early word acquisition based on realistic-scale audiovisual naming events [10.047470656294333]
We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input.
We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that learns from statistical regularities in raw speech and pixel-level visual input.
Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants.
arXiv Detail & Related papers (2024-06-07T21:05:59Z)
- Self-supervised learning of video representations from a child's perspective [27.439294457852423]
Children learn powerful internal models of the world around them from a few years of egocentric visual experience.
Can such internal models be learned from a child's visual experience with highly generic learning algorithms or do they require strong inductive biases?
arXiv Detail & Related papers (2024-02-01T03:27:26Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks [64.44336003123102]
We propose a framework to enable in-context learning in large language models.
A meta-model can learn on self-supervised prompts consisting of tailored demonstrations.
Experiments show that SINC outperforms gradient-based methods in various vision-language tasks.
arXiv Detail & Related papers (2023-07-15T08:33:08Z)
- A Visuospatial Dataset for Naturalistic Verb Learning [18.654373173232205]
We introduce a new dataset for training and evaluating grounded language models.
Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access.
We use the collected data to compare several distributional semantics models for verb learning.
arXiv Detail & Related papers (2020-10-28T20:47:13Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z) - Evaluating computational models of infant phonetic learning across
languages [31.587496924289972]
In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.
Many accounts of this early phonetic learning exist, but computational models predicting the patterns observed in infants from the speech input they hear have been lacking.
Here we study five such algorithms, selected for their potential cognitive relevance.
We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns.
arXiv Detail & Related papers (2020-08-06T22:07:45Z) - A Developmental Neuro-Robotics Approach for Boosting the Recognition of
Handwritten Digits [91.3755431537592]
Recent evidence shows that simulating children's embodied strategies can also improve machine intelligence.
This article explores the application of embodied strategies to convolutional neural network models in the context of developmental neuro-robotics.
arXiv Detail & Related papers (2020-03-23T14:55:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.