Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks
- URL: http://arxiv.org/abs/2110.06507v1
- Date: Wed, 13 Oct 2021 05:30:50 GMT
- Title: Perception Point: Identifying Critical Learning Periods in Speech for
Bilingual Networks
- Authors: Anuj Saraswat, Mehar Bhatia, Yaman Kumar Singla, Changyou Chen, Rajiv
Ratn Shah
- Abstract summary: We compare and identify cognitive aspects in deep neural-network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
- Score: 58.24134321728942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies in speech perception have been closely linked to the
fields of cognitive psychology, phonology, and phonetics in linguistics. During
perceptual attunement, bilingual and monolingual infants pass through a
critical and sensitive developmental window in which they can best discriminate
common phonemes. In this paper, we compare and identify these cognitive aspects
in deep neural-network-based visual lip-reading models. We conduct experiments
on the two most extensive public visual speech recognition datasets for English
and Mandarin. Through our experimental results, we observe a strong correlation
between these theories in cognitive psychology and our unique modeling. We
inspect how these computational models develop similar phases in speech
perception and acquisition.
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- The formation of perceptual space in early phonetic acquisition: a cross-linguistic modeling approach [0.0]
This study investigates how learners organize perceptual space in early phonetic acquisition.
It examines the shape of the learned hidden representation as well as its ability to categorize phonetic categories.
arXiv Detail & Related papers (2024-07-26T04:18:36Z)
- Perception of Phonological Assimilation by Neural Speech Recognition Models [3.4173734484549625]
This article explores how the neural speech recognition model Wav2Vec2 perceives assimilated sounds.
Using psycholinguistic stimuli, we analyze how various linguistic context cues influence compensation patterns in the model's output.
arXiv Detail & Related papers (2024-06-21T15:58:22Z)
- Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is being increasingly used in societal applications.
We show that self-supervised representations from different families are significantly better for speaker identification than acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
arXiv Detail & Related papers (2024-06-14T20:07:21Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
- Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation [2.28438857884398]
We study the so-called latent language hypothesis (LLH), which connects linguistic representation learning to general predictive processing within and across sensory modalities.
We explore LLH further in extensive learning simulations with different neural network models for audiovisual cross-situational learning.
arXiv Detail & Related papers (2021-09-29T05:49:46Z)
- Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification [16.369477141866405]
We present a neural model for Slavic language identification in speech signals.
We analyze its emergent representations to investigate whether they reflect objective measures of language relatedness.
arXiv Detail & Related papers (2020-10-22T18:18:19Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate.
We show that there is a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.