Hearings and mishearings: decrypting the spoken word
- URL: http://arxiv.org/abs/2009.00429v1
- Date: Tue, 1 Sep 2020 13:58:51 GMT
- Title: Hearings and mishearings: decrypting the spoken word
- Authors: Anita Mehta, Jean-Marc Luck
- Abstract summary: We propose a model of the speech perception of individual words in the presence of mishearings.
We show for instance that speech perception is easy when the word length is less than a threshold, to be identified with a static transition.
We extend this to the dynamics of word recognition, proposing an intuitive approach highlighting the distinction between individual, isolated mishearings and clusters of contiguous mishearings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a model of the speech perception of individual words in the
presence of mishearings. This phenomenological approach is based on concepts
used in linguistics, and provides a formalism that is universal across
languages. We put forward an efficient two-parameter form for the word length
distribution, and introduce a simple representation of mishearings, which we
use in our subsequent modelling of word recognition. In a context-free
scenario, word recognition often occurs via anticipation when, part-way into a
word, we can correctly guess its full form. We give a quantitative estimate of
this anticipation threshold when no mishearings occur, in terms of model
parameters. As might be expected, the whole anticipation effect disappears when
there are sufficiently many mishearings. Our global approach to the problem of
speech perception is in the spirit of an optimisation problem. We show for
instance that speech perception is easy when the word length is less than a
threshold, to be identified with a static transition, and hard otherwise. We
extend this to the dynamics of word recognition, proposing an intuitive
approach highlighting the distinction between individual, isolated mishearings
and clusters of contiguous mishearings. At least in some parameter range, a
dynamical transition is manifest well before the static transition is reached,
as is the case for many other examples of complex systems.
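To make the anticipation mechanism concrete, here is a minimal simulation sketch, not the authors' formalism: it assumes a toy phoneme inventory, a random lexicon, a prefix-uniqueness rule standing in for anticipation, and independent phoneme substitutions standing in for mishearings. All names and parameters (ALPHABET, mishear, p) are illustrative assumptions.

```python
import random
import string

random.seed(0)

# Toy 10-symbol "phoneme" inventory; purely illustrative.
ALPHABET = list(string.ascii_lowercase[:10])

def random_lexicon(n_words, min_len=2, max_len=12):
    """Toy lexicon: random phoneme strings of varying length."""
    return {"".join(random.choices(ALPHABET, k=random.randint(min_len, max_len)))
            for _ in range(n_words)}

def mishear(word, p):
    """Replace each phoneme independently with a wrong one with probability p."""
    return "".join(
        random.choice([a for a in ALPHABET if a != ph]) if random.random() < p else ph
        for ph in word
    )

def recognition_point(heard, target, lexicon):
    """1-based position at which the heard prefix matches the target uniquely,
    or None if the word is never uniquely identified."""
    for k in range(1, len(heard) + 1):
        matches = [w for w in lexicon if w.startswith(heard[:k])]
        if matches == [target]:
            return k
    return None

lexicon = random_lexicon(500)
for p in (0.0, 0.05, 0.2):
    fractions = []
    for target in lexicon:
        rp = recognition_point(mishear(target, p), target, lexicon)
        if rp is not None:
            fractions.append(rp / len(target))  # fraction of the word heard
    rate = len(fractions) / len(lexicon)
    mean_frac = sum(fractions) / len(fractions) if fractions else float("nan")
    print(f"p={p:.2f}: recognised {rate:.1%} of words, "
          f"mean fraction heard before recognition {mean_frac:.2f}")
```

Sweeping the mishearing probability p in this toy setting shows the anticipation effect eroding as mishearings accumulate, qualitatively in line with the abstract's claim that sufficiently many mishearings destroy anticipation.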
Related papers
- Speech perception: a model of word recognition [0.0]
We present a model of speech perception which takes into account effects of correlations between sounds.
Words in this model correspond to the attractors of a suitably chosen descent dynamics.
We examine the decryption of short and long words in the presence of mishearings.
arXiv Detail & Related papers (2024-10-24T09:41:47Z)
- Identifying and interpreting non-aligned human conceptual representations using language modeling [0.0]
We show that congenital blindness induces conceptual reorganization in both a-modal and sensory-related verbal domains.
We find that blind individuals more strongly associate social and cognitive meanings to verbs related to motion.
For some verbs, representations of blind and sighted are highly similar.
arXiv Detail & Related papers (2024-03-10T13:02:27Z)
- Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study [68.88536866933038]
Speech signals, typically sampled at tens of thousands of samples per second, contain considerable redundancy.
Recent investigations proposed the use of discrete speech units derived from self-supervised learning representations.
Applying various methods, such as de-duplication and subword modeling, can further compress the speech sequence length (a minimal de-duplication sketch follows this list).
arXiv Detail & Related papers (2023-09-27T17:21:13Z)
- The neural dynamics of auditory word recognition and integration [21.582292050622456]
We present a computational model of word recognition which formalizes this perceptual process in Bayesian decision theory.
We fit this model to explain scalp EEG signals recorded as subjects passively listened to a fictional story.
The model reveals distinct neural processing of words depending on whether or not they can be quickly recognized.
arXiv Detail & Related papers (2023-05-22T18:06:32Z)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z)
- Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased: words appearing early in different contexts tend to have more similar embeddings.
arXiv Detail & Related papers (2022-08-20T12:27:25Z)
- A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition [80.87085897419982]
We propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM.
The proposed AM is dynamically adapted using both dialect information and the model's internal representation, making a single AM highly adaptive to multiple dialects simultaneously.
The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
arXiv Detail & Related papers (2022-05-06T06:07:09Z)
- Modelling word learning and recognition using visually grounded speech [18.136170489933082]
Computational models of speech recognition often assume that the set of target words is already given.
This implies that these models do not learn to recognise speech from scratch without prior knowledge and explicit supervision.
Visually grounded speech models learn to recognise speech without prior knowledge by exploiting statistical dependencies between spoken and visual input.
arXiv Detail & Related papers (2022-03-14T08:59:37Z)
- Disambiguatory Signals are Stronger in Word-initial Positions [48.18148856974974]
We point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word.
We find evidence across hundreds of languages that indeed there is a cross-linguistic tendency to front-load information in words.
arXiv Detail & Related papers (2021-02-03T18:19:16Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
- "Notic My Speech" -- Blending Speech Patterns With Multimedia [65.91370924641862]
We propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding.
Our proposed method outperformed existing work by 4.99% in viseme error rate.
We show a strong correlation between our model's understanding of multi-view speech and human perception.
arXiv Detail & Related papers (2020-06-12T06:51:55Z)
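As a concrete footnote to the discrete-speech-units entry above: the de-duplication step it mentions is, in its simplest generic form, run-length collapsing of consecutive identical unit indices. The sketch below illustrates that generic step under stated assumptions, not the cited paper's actual pipeline; the unit values are made up.

```python
from itertools import groupby

def deduplicate(units):
    """Collapse runs of identical discrete units (run-length de-duplication).

    Self-supervised unit sequences are highly repetitive because adjacent
    frames often map to the same cluster index, so collapsing runs shortens
    the sequence while preserving the order of distinct units.
    """
    return [u for u, _ in groupby(units)]

# Hypothetical unit sequence (integer cluster indices from, e.g., k-means
# over self-supervised speech representations).
units = [5, 5, 5, 2, 2, 7, 7, 7, 7, 3]
print(deduplicate(units))                   # [5, 2, 7, 3]
print(f"{len(units)} -> {len(deduplicate(units))} tokens")
```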
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information presented) and is not responsible for any consequences arising from its use.