Lexical Access Model for Italian -- Modeling human speech processing:
identification of words in running speech toward lexical access based on the
detection of landmarks and other acoustic cues to features
- URL: http://arxiv.org/abs/2107.02720v1
- Date: Thu, 24 Jun 2021 10:54:56 GMT
- Authors: Maria-Gabriella Di Benedetto, Stefanie Shattuck-Hufnagel, Jeung-Yoon
Choi, Luca De Nardis, Javier Arango, Ian Chan, Alec DeCaprio
- Abstract summary: This work aims at developing a system that imitates humans when identifying words in running speech.
We build a speech recognizer for Italian based on the principles of Stevens' model of Lexical Access.
- Score: 2.033475676482581
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modelling the process by which a listener derives the words
intended by a speaker requires a hypothesis about how lexical items are stored
in memory. This work aims to develop a system that imitates humans when
identifying words in running speech and thereby provides a framework for
better understanding human speech processing. We build a speech recognizer for
Italian based on the principles of Stevens' model of Lexical Access in which
words are stored as hierarchical arrangements of distinctive features (Stevens,
K. N. (2002). "Toward a model for lexical access based on acoustic landmarks
and distinctive features," J. Acoust. Soc. Am., 111(4):1872-1891). Over the
past few decades, the Speech Communication Group at the Massachusetts Institute
of Technology (MIT) developed a speech recognition system for English based on
this approach. Italian will be the first language beyond English to be
explored; the extension to another language provides the opportunity to test
the hypothesis that words are represented in memory as a set of
hierarchically arranged distinctive features, and to reveal which of the
underlying mechanisms may be language-independent. This paper also
introduces a new Lexical Access corpus, the LaMIT database, created and labeled
specifically for this work, that will be provided freely to the speech research
community. Future developments will test the hypothesis that the specific
acoustic discontinuities, called landmarks, which serve as cues to features,
are language-independent, while other cues may be language-dependent, with
powerful implications for understanding how the human brain recognizes speech.
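To make the core idea concrete, the sketch below illustrates lexical access as
matching feature cues, detected at landmarks, against words stored as bundles
of binary distinctive features. It is a minimal, hypothetical illustration
only: the toy lexicon, feature names, and scoring rule are assumptions for
this example and are not taken from the paper or the LaMIT database.

```python
# Minimal sketch of landmark/feature-based lexical access. Everything here is
# illustrative: the toy lexicon, feature names, and scoring rule are
# hypothetical, not taken from the paper or the LaMIT database.

# Each lexical entry is a sequence of segments, each segment a bundle of
# binary distinctive features (+1/-1), loosely following Stevens (2002).
LEXICON = {
    "pane": [                                                # Italian "bread"
        {"sonorant": -1, "continuant": -1, "labial": +1},    # /p/
        {"sonorant": +1, "low": +1},                         # /a/
        {"sonorant": +1, "nasal": +1, "coronal": +1},        # /n/
        {"sonorant": +1, "front": +1},                       # /e/
    ],
    "cane": [                                                # Italian "dog"
        {"sonorant": -1, "continuant": -1, "dorsal": +1},    # /k/
        {"sonorant": +1, "low": +1},                         # /a/
        {"sonorant": +1, "nasal": +1, "coronal": +1},        # /n/
        {"sonorant": +1, "front": +1},                       # /e/
    ],
}

def segment_score(detected, stored):
    """Feature agreements minus disagreements for one segment."""
    return sum(
        1.0 if stored[f] == v else -1.0
        for f, v in detected.items() if f in stored
    )

def lexical_access(detected_segments):
    """Return the word whose stored feature bundles best match the feature
    values cued by the detected landmarks (toy model: segment counts must match)."""
    def word_score(stored_segments):
        if len(stored_segments) != len(detected_segments):
            return float("-inf")
        return sum(segment_score(d, s)
                   for d, s in zip(detected_segments, stored_segments))
    return max(LEXICON, key=lambda w: word_score(LEXICON[w]))

# Example: suppose the landmark detectors cued a labial stop closure/release,
# then a low vowel, a coronal nasal, and a front vowel.
detected = [
    {"sonorant": -1, "continuant": -1, "labial": +1},
    {"sonorant": +1, "low": +1},
    {"sonorant": +1, "nasal": +1, "coronal": +1},
    {"sonorant": +1, "front": +1},
]
print(lexical_access(detected))  # -> "pane"
```

In the actual model, cues are first detected at acoustic landmarks and feature
values are estimated from them; the matching step above stands in only for the
final lexical-matching stage.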
Related papers
- Label Aware Speech Representation Learning For Language Identification
  [49.197215416945596] (arXiv, 2023-06-07)
  We propose a novel framework that combines self-supervised representation
  learning with language label information for the pre-training task. This
  framework, termed Label Aware Speech Representation (LASR) learning, uses a
  triplet-based objective function to incorporate language labels along with
  the self-supervised loss function (a hedged sketch of such an objective
  appears after this list).
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken
  language models [56.93604813379634] (arXiv, 2023-06-02)
  Self-supervised techniques for learning speech representations have been
  shown to develop linguistic competence from exposure to speech without the
  need for human labels. We propose a language-acquisition-friendly benchmark
  to probe spoken language models at the lexical and syntactic levels. We
  highlight two challenges that need to be addressed for further progress:
  bridging the gap between text and speech, and between clean speech and
  in-the-wild speech.
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
  (arXiv, 2022-05-21)
  Self-supervised representation learning methods promise a single universal
  model that would benefit a wide variety of tasks and domains. Speech
  representation learning is experiencing similar progress in three main
  categories: generative, contrastive, and predictive methods. This review
  presents approaches for self-supervised speech representation learning and
  their connection to other research areas.
- Mandarin-English Code-switching Speech Recognition with Self-supervised
  Speech Representation Models [55.82292352607321] (arXiv, 2021-10-07)
  Code-switching (CS) is common in daily conversations where more than one
  language is used within a sentence. This paper uses recently successful
  self-supervised learning (SSL) methods to leverage large amounts of
  unlabeled speech data without CS.
- Can phones, syllables, and words emerge as side-products of
  cross-situational audiovisual learning? -- A computational investigation
  [2.28438857884398] (arXiv, 2021-09-29)
  We study the so-called latent language hypothesis (LLH), which connects
  linguistic representation learning to general predictive processing within
  and across sensory modalities. We explore LLH further in extensive learning
  simulations with different neural network models for audiovisual
  cross-situational learning.
- Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for
  Low-Resource Speech Recognition [159.9312272042253] (arXiv, 2021-09-19)
  Wav-BERT is a cooperative acoustic and linguistic representation learning
  method that unifies a pre-trained acoustic model (wav2vec 2.0) and a
  language model (BERT) into an end-to-end trainable framework.
- Leveraging Acoustic and Linguistic Embeddings from Pretrained Speech and
  Language Models for Intent Classification [81.80311855996584]
  (arXiv, 2021-02-15)
  We propose a novel intent classification framework that employs acoustic
  features extracted from a pretrained speech recognition system and
  linguistic features learned from a pretrained language model. We achieve
  90.86% and 99.07% accuracy on the ATIS and Fluent Speech corpora,
  respectively.
- STEPs-RL: Speech-Text Entanglement for Phonetically Sound Representation
  Learning [2.28438857884398] (arXiv, 2020-11-23)
  We present a novel multi-modal deep neural network architecture that uses
  speech-text entanglement for learning spoken-word representations. STEPs-RL
  is trained in a supervised manner to predict the phonetic sequence of a
  target spoken word. Latent representations produced by our model predicted
  the target phonetic sequences with an accuracy of 89.47%.
- Reinforcement learning of minimalist grammars [0.5862282909017474]
  (arXiv, 2020-04-30)
  State-of-the-art language technology scans the acoustically analyzed speech
  signal for relevant keywords, which are then inserted into semantic slots to
  interpret the user's intent. A mental lexicon must be acquired by a
  cognitive agent during interaction with its users.
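As referenced in the LASR entry above, a triplet objective over
language-labeled speech embeddings can be sketched as follows. This is a
hypothetical PyTorch fragment under stated assumptions: the function name,
margin value, and cosine-distance choice are ours and may differ from the
actual LASR formulation.

```python
# Illustrative triplet objective over language-labeled speech embeddings
# (hypothetical sketch; not the published LASR implementation).

import torch
import torch.nn.functional as F

def language_triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull embeddings of utterances sharing a language label together and
    push embeddings from other languages apart by at least `margin`."""
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)  # same-language pair
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)  # cross-language pair
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Toy usage with random "utterance embeddings" (batch of 8, 256-dim).
anchor, positive, negative = (torch.randn(8, 256) for _ in range(3))
print(language_triplet_loss(anchor, positive, negative).item())
```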