Evaluating computational models of infant phonetic learning across
languages
- URL: http://arxiv.org/abs/2008.02888v1
- Date: Thu, 6 Aug 2020 22:07:45 GMT
- Title: Evaluating computational models of infant phonetic learning across
languages
- Authors: Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman,
Sharon Goldwater
- Abstract summary: In the first year of life, infants' speech perception becomes attuned to the sounds of their native language.
Many accounts of this early phonetic learning exist, but computational models predicting the patterns observed in infants from the speech input they hear have been lacking.
Here we study five such algorithms, selected for their potential cognitive relevance. We simulate phonetic learning with each algorithm and perform tests on three phone contrasts from different languages, comparing the results to infants' discrimination patterns.
- Score: 31.587496924289972
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the first year of life, infants' speech perception becomes attuned to the
sounds of their native language. Many accounts of this early phonetic learning
exist, but computational models predicting the attunement patterns observed in
infants from the speech input they hear have been lacking. A recent study
presented the first such model, drawing on algorithms proposed for unsupervised
learning from naturalistic speech, and tested it on a single phone contrast.
Here we study five such algorithms, selected for their potential cognitive
relevance. We simulate phonetic learning with each algorithm and perform tests
on three phone contrasts from different languages, comparing the results to
infants' discrimination patterns. The five models display varying degrees of
agreement with empirical observations, showing that our approach can help
decide between candidate mechanisms for early phonetic learning, and providing
insight into which aspects of the models are critical for capturing infants'
perceptual development.
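The abstract describes simulating phonetic learning and then testing discrimination of phone contrasts. Work in this line typically quantifies discrimination with a machine ABX test: given tokens A and B from two phone categories and a held-out token X, the model discriminates the contrast if X is closer to the token from its own category. The sketch below is a minimal, illustrative version of that idea with hypothetical fixed-length embeddings and Euclidean distance; the actual study uses learned speech representations and frame-level distances, so treat this only as a conceptual outline.

```python
import math

def euclidean(a, b):
    """Distance between two fixed-length phone embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abx_score(cat_a, cat_b, dist=euclidean):
    """Fraction of (A, B, X) triples in which X, drawn from A's category,
    lies closer to A than to B. 1.0 = perfect discrimination, 0.5 = chance."""
    correct, total = 0, 0
    for a in cat_a:
        for b in cat_b:
            for x in cat_a:
                if x is a:
                    continue  # X must be a different token than A
                correct += dist(x, a) < dist(x, b)
                total += 1
    return correct / total

# Toy embeddings for two well-separated phone categories (hypothetical 2-D points).
cat_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
cat_b = [(1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
print(abx_score(cat_a, cat_b))  # well-separated categories score 1.0
```

A model attuned to a native contrast should score high on it and lower on a non-native one, which is the kind of pattern compared against infant discrimination data.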
Related papers
- The formation of perceptual space in early phonetic acquisition: a cross-linguistic modeling approach [0.0]
This study investigates how learners organize perceptual space in early phonetic acquisition.
It examines the shape of the learned hidden representation as well as its ability to categorize phonetic categories.
arXiv Detail & Related papers (2024-07-26T04:18:36Z) - A model of early word acquisition based on realistic-scale audiovisual naming events [10.047470656294333]
We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input.
We simulated word learning in infants up to 12 months of age in a realistic setting, using a model that learns from statistical regularities in raw speech and pixel-level visual input.
Results show that the model effectively learns to recognize words and associate them with corresponding visual objects, with a vocabulary growth rate comparable to that observed in infants.
arXiv Detail & Related papers (2024-06-07T21:05:59Z) - Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised
spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Toward a realistic model of speech processing in the brain with
self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z) - Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z) - Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between theories in cognitive psychology and our modeling results.
arXiv Detail & Related papers (2021-10-13T05:30:50Z) - Word Acquisition in Neural Language Models [0.38073142980733]
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words.
We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models.
arXiv Detail & Related papers (2021-10-05T23:26:16Z) - "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z) - A Computational Model of Early Word Learning from the Infant's Point of
View [15.443815646555125]
The present study uses egocentric video and gaze data collected from infant learners during natural toy play with their parents.
We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch.
As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved.
arXiv Detail & Related papers (2020-06-04T12:08:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.