A phonetic model of non-native spoken word processing
- URL: http://arxiv.org/abs/2101.11332v1
- Date: Wed, 27 Jan 2021 11:46:21 GMT
- Title: A phonetic model of non-native spoken word processing
- Authors: Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman,
Sharon Goldwater
- Abstract summary: We train a computational model of phonetic learning, which has no access to phonology, on either one or two languages.
We first show that the model exhibits predictable behaviors on phone-level and word-level discrimination tasks.
We then test the model on a spoken word processing task, showing that phonology may not be necessary to explain some of the word processing effects observed in non-native speakers.
- Score: 40.018538874161756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-native speakers show difficulties with spoken word processing. Many
studies attribute these difficulties to imprecise phonological encoding of
words in lexical memory. We test an alternative hypothesis: that some of
these difficulties can arise from the non-native speakers' phonetic perception.
We train a computational model of phonetic learning, which has no access to
phonology, on either one or two languages. We first show that the model
exhibits predictable behaviors on phone-level and word-level discrimination
tasks. We then test the model on a spoken word processing task, showing that
phonology may not be necessary to explain some of the word processing effects
observed in non-native speakers. We run an additional analysis of the model's
lexical representation space, showing that the two training languages are not
fully separated in that space, much like the languages of a bilingual human
speaker.
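
The phone- and word-level discrimination tasks mentioned in the abstract are typically implemented as machine ABX tests over a model's learned representations: given tokens A and B from two different categories and a probe X from A's category, the model is scored correct when X lies closer to A than to B. The sketch below is a minimal illustration under assumed choices (cosine distance, random toy embeddings, hypothetical function names), not the paper's exact setup.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_accuracy(cat_a, cat_b):
    """Fraction of (A, B, X) triples in which the probe X (same
    category as A) is closer to A than to B. Each argument is a list
    of embeddings for one category (e.g., a phone or a word type)."""
    correct, total = 0, 0
    for i, a in enumerate(cat_a):
        for x in cat_a[i + 1:]:          # X: another token of A's category
            for b in cat_b:
                correct += cosine_distance(x, a) < cosine_distance(x, b)
                total += 1
    return correct / total

# Toy usage: random vectors standing in for the model's embeddings.
rng = np.random.default_rng(0)
phone_r = [rng.normal(0.0, 1.0, 16) for _ in range(5)]
phone_l = [rng.normal(0.5, 1.0, 16) for _ in range(5)]
print(f"ABX accuracy: {abx_accuracy(phone_r, phone_l):.2f}")
```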
Related papers
- Encoding of lexical tone in self-supervised models of spoken language [3.7270979204213446]
This paper aims to analyze the tone encoding capabilities of Spoken Language Models (SLMs).
We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages.
We find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies.
arXiv Detail & Related papers (2024-03-25T15:28:38Z)
- Improve Bilingual TTS Using Dynamic Language and Phonology Embedding [10.244215079409797]
This paper builds a Mandarin-English TTS system to acquire more standard spoken English from a monolingual Chinese speaker.
We specially design an embedding strength modulator to capture the dynamic strength of language and phonology.
arXiv Detail & Related papers (2022-12-07T03:46:18Z)
- Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological Features [2.8080708404213373]
We train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages.
With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature.
arXiv Detail & Related papers (2021-11-17T12:33:42Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM built on linguistic units, including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than what contemporary generative models require, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation [2.28438857884398]
We study the so-called latent language hypothesis (LLH).
LLH connects linguistic representation learning to general predictive processing within and across sensory modalities.
We explore LLH further in extensive learning simulations with different neural network models for audiovisual cross-situational learning.
arXiv Detail & Related papers (2021-09-29T05:49:46Z)
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that the models that use abstract representations of the preceding linguistic context best predict the changes people make in the course of serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze an input acoustic signal, understand its linguistic content, and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves a 7.7% better phoneme error rate on average than a standard multilingual model; a sketch of how this metric is computed follows this list.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
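
The phoneme error rate mentioned in the last entry is standardly computed as the Levenshtein (edit) distance between the hypothesized and reference phoneme sequences, normalized by the reference length. Below is a minimal self-contained sketch; the function name and toy phoneme strings are illustrative assumptions, not taken from any of the papers.

```python
def phoneme_error_rate(reference, hypothesis):
    """PER = (substitutions + insertions + deletions) / len(reference),
    i.e., Levenshtein distance over phoneme sequences, normalized."""
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                     # delete all i reference phonemes
    for j in range(n + 1):
        dp[0][j] = j                     # insert all j hypothesis phonemes
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[m][n] / m

# Toy usage: one substitution in a three-phoneme reference -> PER = 1/3.
print(phoneme_error_rate(["k", "ae", "t"], ["k", "eh", "t"]))
```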
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.