A new kid on the block: Distributional semantics predicts the word-specific tone signatures of monosyllabic words in conversational Taiwan Mandarin
- URL: http://arxiv.org/abs/2511.17337v1
- Date: Fri, 21 Nov 2025 15:56:58 GMT
- Title: A new kid on the block: Distributional semantics predicts the word-specific tone signatures of monosyllabic words in conversational Taiwan Mandarin
- Authors: Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen
- Abstract summary: We investigate how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin. We find that the effect of word remains a strong predictor of tonal realization. For phonetics, distributional semantics is a new kid on the block.
- Score: 0.4078247440919472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of words' meanings. We used the generalized additive model to decompose a given observed pitch contour into a set of component pitch contours that are tied to different control variables and semantic predictors. Even when variables such as word duration, gender, speaker identity, tonal context, vowel height, and utterance position are controlled for, the effect of word remains a strong predictor of tonal realization. We present evidence that this effect of word is a semantic effect: word sense is shown to be a better predictor than word, and heterographic homophones are shown to have different pitch contours. The strongest evidence for the importance of semantics is that the pitch contours of individual word tokens can be predicted from their contextualized embeddings with an accuracy that substantially exceeds a permutation baseline. For phonetics, distributional semantics is a new kid on the block. Although our findings challenge standard theories of Mandarin tone, they fit well within the theoretical framework of the Discriminative Lexicon Model.
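The abstract's comparison against a permutation baseline can be illustrated with a small sketch. This is not the authors' pipeline: the ridge regression, the mean squared error metric, the synthetic data, and all function names below are assumptions chosen for illustration. The idea is to fit a mapping from embeddings to pitch contours, score it on held-out tokens, and then repeat the fit after shuffling which contour belongs to which embedding.

```python
import numpy as np


def ridge_predict(X_train, Y_train, X_test, alpha=1.0):
    """Closed-form ridge regression from embeddings X to contours Y (illustrative choice)."""
    d = X_train.shape[1]
    W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(d), X_train.T @ Y_train)
    return X_test @ W


def heldout_mse(X_train, Y_train, X_test, Y_test):
    """Mean squared error of predicted contours on held-out tokens."""
    return float(np.mean((ridge_predict(X_train, Y_train, X_test) - Y_test) ** 2))


def permutation_baseline(X, Y, n_perm=100, test_frac=0.2, seed=0):
    """Return the real held-out error and the errors from label-permuted fits."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    n_test = int(X.shape[0] * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    real = heldout_mse(X[train], Y[train], X[test], Y[test])
    # Shuffle the training contours so they no longer match their embeddings.
    perm = np.array([
        heldout_mse(X[train], Y[train][rng.permutation(len(train))], X[test], Y[test])
        for _ in range(n_perm)
    ])
    return real, perm


# Synthetic stand-ins: 200 tokens, 10-dim "embeddings", 5-point "contours".
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
Y = X @ rng.normal(size=(10, 5)) + 0.1 * rng.normal(size=(200, 5))

real_mse, perm_mses = permutation_baseline(X, Y)
# Share of permuted fits that do at least as well as the real fit.
p_value = (1 + np.sum(perm_mses <= real_mse)) / (1 + len(perm_mses))
print(real_mse, perm_mses.mean(), p_value)
```

If the real error is clearly lower than every permuted error, the embeddings carry information about the contours beyond what chance pairings would give. With real data, the split would typically also need to respect speakers and word types.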
Related papers
- Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs. It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z) - The realization of tones in spontaneous spoken Taiwan Mandarin: a corpus-based survey and theory-driven computational modeling [1.7723990552388866]
This study investigates the tonal realization of Mandarin disyllabic words with all 20 possible combinations of two tones. The results show that meaning in context and phonetic realization are far more entangled than standard linguistic theory predicts.
arXiv Detail & Related papers (2025-03-29T17:39:55Z) - Word-specific tonal realizations in Mandarin [0.9249657468385781]
This study shows that tonal realization is also partially determined by words' meanings. We first show, on the basis of a corpus of Taiwan Mandarin spontaneous conversations, that word type is a stronger predictor of tonal realization than all the previously established word-form related predictors combined. We then proceed to show, using computational modeling with context-specific word embeddings, that token-specific pitch contours predict word type with 50% accuracy on held-out data.
arXiv Detail & Related papers (2024-05-11T13:00:35Z) - Identifying and interpreting non-aligned human conceptual representations using language modeling [0.0]
We show that congenital blindness induces conceptual reorganization in both a-modal and sensory-related verbal domains.
We find that blind individuals more strongly associate social and cognitive meanings to verbs related to motion.
For some verbs, representations of blind and sighted are highly similar.
arXiv Detail & Related papers (2024-03-10T13:02:27Z) - Unsupervised Mapping of Arguments of Deverbal Nouns to Their Corresponding Verbal Labels [52.940886615390106]
Deverbal nouns are nominal forms of verbs commonly used in written English texts to describe events or actions, as well as their arguments.
The solutions that do exist for handling arguments of nominalized constructions are based on semantic annotation.
We propose to adopt a more syntactic approach, which maps the arguments of deverbal nouns to the corresponding verbal construction.
arXiv Detail & Related papers (2023-06-24T10:07:01Z) - Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z) - Lost in Context? On the Sense-wise Variance of Contextualized Word Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased, where the first words in different contexts tend to be more similar.
arXiv Detail & Related papers (2022-08-20T12:27:25Z) - Topology of Word Embeddings: Singularities Reflect Polysemy [68.8204255655161]
We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word.
We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation.
arXiv Detail & Related papers (2020-11-18T17:21:51Z) - Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z) - Hearings and mishearings: decrypting the spoken word [0.0]
We propose a model of the speech perception of individual words in the presence of mishearings.
We show for instance that speech perception is easy when the word length is less than a threshold, to be identified with a static transition.
We extend this to the dynamics of word recognition, proposing an intuitive approach highlighting the distinction between individual, isolated mishearings and clusters of contiguous mishearings.
arXiv Detail & Related papers (2020-09-01T13:58:51Z) - Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)