Word stress in self-supervised speech models: A cross-linguistic comparison
- URL: http://arxiv.org/abs/2507.04738v1
- Date: Mon, 07 Jul 2025 08:10:26 GMT
- Title: Word stress in self-supervised speech models: A cross-linguistic comparison
- Authors: Martijn Bentum, Louis ten Bosch, Tomas O. Lentz
- Abstract summary: We study word stress representations learned by self-supervised speech models (S3M). We investigate the S3M representations of word stress for five different languages.
- Score: 6.552278017383513
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we study word stress representations learned by self-supervised speech models (S3M), specifically the Wav2vec 2.0 model. We investigate the S3M representations of word stress for five different languages: three languages with variable or lexical stress (Dutch, English, and German) and two languages with fixed or demarcative stress (Hungarian and Polish). We train diagnostic stress classifiers on S3M embeddings and show that they can distinguish between stressed and unstressed syllables in short read-aloud sentences with high accuracy. We also test for language-specificity effects in the S3M word stress representations. The results indicate that the word stress representations are language-specific, with a greater difference between the set of variable-stress languages and the set of fixed-stress languages.
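The diagnostic-classifier (probing) setup described in the abstract can be pictured with a minimal sketch: pool wav2vec 2.0 frame embeddings over each syllable interval and fit a linear classifier on the pooled vectors. The checkpoint name, the 50 frames-per-second rate, mean pooling, and the toy random data below are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal probing sketch: syllable-level stress classification on
# wav2vec 2.0 embeddings. Assumes syllable boundaries (in seconds)
# come from a forced alignment of the read-aloud sentences.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL = "facebook/wav2vec2-base"  # illustrative checkpoint choice
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
model = Wav2Vec2Model.from_pretrained(MODEL).eval()

def syllable_embedding(waveform, start_s, end_s, sr=16000):
    """Mean-pool wav2vec 2.0 frame embeddings over one syllable interval."""
    inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        frames = model(inputs.input_values).last_hidden_state[0]  # (T, 768)
    lo = int(start_s * 50)              # ~50 frames per second (20 ms stride)
    hi = max(int(end_s * 50), lo + 1)
    return frames[lo:hi].mean(dim=0).numpy()

# Toy data: random waveforms stand in for the aligned corpus; in the
# study, syllable spans and stress labels come from the annotations.
rng = np.random.default_rng(0)
X = np.stack([syllable_embedding(rng.standard_normal(16000).astype(np.float32),
                                 0.1, 0.4) for _ in range(20)])
y = rng.integers(0, 2, size=20)         # 1 = stressed, 0 = unstressed
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```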
Related papers
- StressTest: Can YOUR Speech LM Handle the Stress? [20.802090523583196]
Sentence stress refers to emphasis placed on specific words within a spoken utterance to highlight or contrast an idea, or to introduce new information.
Recent advances in speech-aware language models (SLMs) have enabled direct processing of audio.
Despite the crucial role of sentence stress in shaping meaning and speaker intent, it remains largely overlooked in the evaluation and development of such models.
arXiv Detail & Related papers (2025-05-28T18:32:56Z)
- WHISTRESS: Enriching Transcriptions with Sentence Stress Detection [20.802090523583196]
Sentence stress is crucial for conveying speaker intent in spoken language.
We introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection.
We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines.
arXiv Detail & Related papers (2025-05-25T11:45:08Z)
- Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages [47.45957604683302]
We study whether pre-trained language models (PLMs) are agnostic to linguistically grounded attacks.
Our findings reveal that PLMs are susceptible to linguistic perturbations, though slightly less so than to non-linguistic attacks.
arXiv Detail & Related papers (2024-12-14T12:10:38Z)
- Self-Supervised Speech Representations are More Phonetic than Semantic [52.02626675137819]
Self-supervised speech models (S3Ms) have become an effective backbone for speech applications.
We seek a more fine-grained analysis of the word-level linguistic properties encoded in S3Ms.
Our study reveals that S3M representations consistently and significantly exhibit more phonetic than semantic similarity.
arXiv Detail & Related papers (2024-06-12T20:04:44Z)
- Detecting Syllable-Level Pronunciation Stress with A Self-Attention Model [0.0]
Knowing the stress level for each syllable of spoken English is important for English speakers and learners.
This paper presents a self-attention model to identify the stress level for each syllable of spoken English.
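A minimal sketch of this kind of model is shown below, assuming per-syllable acoustic feature vectors (e.g. pitch, energy, duration) as input; the feature dimension, layer sizes, and four-level stress scale are illustrative assumptions, not the paper's architecture.

```python
# Sketch: self-attention tagger assigning a stress level to each syllable.
import torch
import torch.nn as nn

class SyllableStressTagger(nn.Module):
    def __init__(self, feat_dim=16, d_model=64, n_heads=4, n_levels=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_levels)  # one stress level per syllable

    def forward(self, x):            # x: (batch, n_syllables, feat_dim)
        h = self.encoder(self.proj(x))
        return self.head(h)          # (batch, n_syllables, n_levels)

tagger = SyllableStressTagger()
dummy = torch.randn(2, 7, 16)        # 2 utterances, 7 syllables each
print(tagger(dummy).shape)           # torch.Size([2, 7, 4])
```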
arXiv Detail & Related papers (2023-11-01T05:05:49Z)
- Speaker Embeddings as Individuality Proxy for Voice Stress Detection [14.332772222772668]
Since the mental states of the speaker modulate speech, stress introduced by cognitive or physical loads could be detected in the voice.
The existing voice stress detection benchmark has shown that the audio embeddings extracted from the Hybrid BYOL-S self-supervised model perform well.
This paper presents the design and development of a voice stress detection system trained on more than 100 speakers from nine language groups and five different types of stress.
arXiv Detail & Related papers (2023-06-09T14:11:07Z)
- A Cross-Linguistic Pressure for Uniform Information Density in Word Order [79.54362557462359]
We use computational models to test whether real orders lead to greater information uniformity than counterfactual orders.
Among SVO languages, real word orders consistently have greater uniformity than reverse word orders.
Only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders.
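A common way to operationalize information uniformity is the variance of per-word surprisal under a language model: the lower the variance, the more uniform the order. The sketch below uses placeholder surprisal values and this single measure as an assumption; the paper's exact operationalizations may differ.

```python
# Sketch: compare information uniformity of a real vs. counterfactual order.
import numpy as np

def uid_variance(surprisals):
    """Mean squared deviation of per-word surprisal (bits) from the mean."""
    s = np.asarray(surprisals, dtype=float)
    return float(np.mean((s - s.mean()) ** 2))

real_order = [4.1, 5.0, 4.6, 5.3]      # LM surprisal per word, real order
reversed_order = [9.2, 2.1, 7.8, 1.5]  # same words, counterfactual order
print(uid_variance(real_order), uid_variance(reversed_order))
# a smaller value for the real order indicates greater uniformity
```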
arXiv Detail & Related papers (2023-06-06T14:52:15Z)
- Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics [22.476706522778994]
We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups.
We use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups.
We train language models using self-reported stressors to predict depressive symptoms, finding a relationship between stressors and depression.
arXiv Detail & Related papers (2022-11-15T06:34:58Z)
- M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval [56.49878599920353]
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
For non-English image-speech retrieval, we outperform the current state of the art by a wide margin, both when training separate models for each language and with a single model that processes speech in all three languages.
arXiv Detail & Related papers (2022-11-02T14:54:45Z)
- Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve [78.3500985535601]
We find a surprising connection between multitask learning and robustness to neuron failures.
Our experiments show that bilingual language models retain higher performance under various neuron perturbations.
We provide a theoretical justification for this robustness by mathematically analyzing linear representation learning.
arXiv Detail & Related papers (2022-10-20T22:23:27Z)
- Pragmatic information in translation: a corpus-based study of tense and mood in English and German [70.3497683558609]
Grammatical tense and mood are important linguistic phenomena to consider in natural language processing (NLP) research.
We consider the correspondence between English and German tense and mood in translation.
Of particular importance is the challenge of modeling tense and mood in rule-based, phrase-based statistical and neural machine translation.
arXiv Detail & Related papers (2020-07-10T08:15:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.