WHISTRESS: Enriching Transcriptions with Sentence Stress Detection
- URL: http://arxiv.org/abs/2505.19103v1
- Date: Sun, 25 May 2025 11:45:08 GMT
- Title: WHISTRESS: Enriching Transcriptions with Sentence Stress Detection
- Authors: Iddo Yosha, Dorin Shteyman, Yossi Adi
- Abstract summary: Sentence stress is crucial for conveying speaker intent in spoken language. We introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection. We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines.
- Score: 20.802090523583196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spoken language conveys meaning not only through words but also through intonation, emotion, and emphasis. Sentence stress, the emphasis placed on specific words within a sentence, is crucial for conveying speaker intent and has been extensively studied in linguistics. In this work, we introduce WHISTRESS, an alignment-free approach for enhancing transcription systems with sentence stress detection. To support this task, we propose TINYSTRESS-15K, a scalable, synthetic training dataset for sentence stress detection, produced by a fully automated dataset creation process. We train WHISTRESS on TINYSTRESS-15K and evaluate it against several competitive baselines. Our results show that WHISTRESS outperforms existing methods while requiring no additional input priors during training or inference. Notably, despite being trained on synthetic data, WHISTRESS demonstrates strong zero-shot generalization across diverse benchmarks. Project page: https://pages.cs.huji.ac.il/adiyoss-lab/whistress.
Related papers
- StressTest: Can YOUR Speech LM Handle the Stress? [20.802090523583196]
Sentence stress refers to emphasis placed on specific words within a spoken utterance to highlight or contrast an idea, or to introduce new information. Recent advances in speech-aware language models (SLMs) have enabled direct processing of audio. Despite the crucial role of sentence stress in shaping meaning and speaker intent, it remains largely overlooked in evaluation and development of such models.
arXiv Detail & Related papers (2025-05-28T18:32:56Z)
- Stress Detection on Code-Mixed Texts in Dravidian Languages using Machine Learning [0.0]
Stress is a common feeling in daily life, but it can affect mental well-being in some situations.
This study introduces a methodical approach to stress identification in code-mixed texts for Dravidian languages.
arXiv Detail & Related papers (2024-10-08T23:49:31Z)
- Towards Event Extraction from Speech with Contextual Clues [61.164413398231254]
We introduce the Speech Event Extraction (SpeechEE) task and construct three synthetic training sets and one human-spoken test set.
Compared to event extraction from text, SpeechEE poses greater challenges mainly due to complex speech signals that are continuous and have no word boundaries.
Our method brings significant improvements on all datasets, achieving a maximum F1 gain of 10.7%.
arXiv Detail & Related papers (2024-01-27T11:07:19Z)
- DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning [59.4644086610381]
We propose a novel denoising objective that inherits from another perspective, i.e., the intra-sentence perspective.
By introducing both discrete and continuous noise, we generate noisy sentences and then train our model to restore them to their original form.
Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks.
arXiv Detail & Related papers (2024-01-24T17:48:45Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
arXiv Detail & Related papers (2022-10-18T12:25:29Z)
- Sentence Representation Learning with Generative Objective rather than Contrastive Objective [86.01683892956144]
We propose a novel generative self-supervised learning objective based on phrase reconstruction.
Our generative learning yields a substantial performance improvement and outperforms the current state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-10-16T07:47:46Z)
- Frequency-Aware Contrastive Learning for Neural Machine Translation [24.336356651877388]
Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems.
Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective.
We propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words.
arXiv Detail & Related papers (2021-12-29T10:10:10Z)
- A study on the efficacy of model pre-training in developing neural text-to-speech system [55.947807261757056]
This study aims to understand better why and how model pre-training can positively contribute to TTS system performance.
It is found that the TTS system could achieve comparable performance when the pre-training data is reduced to 1/8 of its original size.
arXiv Detail & Related papers (2021-10-08T02:09:28Z)
- Measuring Memorization Effect in Word-Level Neural Networks Probing [0.9156064716689833]
We propose a simple general method for measuring the memorization effect, based on a symmetric selection of test words seen versus unseen in training.
Our method can be used to explicitly quantify the amount of memorization happening in a probing setup, so that an adequate setup can be chosen and the results of the probing can be interpreted with a reliability estimate.
arXiv Detail & Related papers (2020-06-29T14:35:42Z)