Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
- URL: http://arxiv.org/abs/2505.22029v1
- Date: Wed, 28 May 2025 06:52:10 GMT
- Title: Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
- Authors: Jinming Zhang, Xuanru Zhou, Jiachen Lian, Shuhe Li, William Li, Zoe Ezzes, Rian Bogley, Lisa Wauters, Zachary Miller, Jet Vonk, Brittany Morin, Maria Gorno-Tempini, Gopala Anumanchipalli
- Abstract summary: Speech dysfluency detection is crucial for clinical diagnosis and language assessment. The proposed LLM-Dys dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework.
- Score: 5.95376852691752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech dysfluency detection is crucial for clinical diagnosis and language assessment, but existing methods are limited by the scarcity of high-quality annotated data. Although recent advances in TTS models have enabled synthetic dysfluency generation, existing synthetic datasets suffer from unnatural prosody and limited contextual diversity. To address these limitations, we propose LLM-Dys -- the most comprehensive dysfluent speech corpus with LLM-enhanced dysfluency simulation. This dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework. Experimental validation demonstrates state-of-the-art performance. All data, models, and code are open-sourced at https://github.com/Berkeley-Speech-Group/LLM-Dys.
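As a rough illustration of what word-level dysfluency simulation involves, the sketch below injects a few rule-based events (repetition, a prolongation marker, fillers) into fluent text. The category names and rules are illustrative stand-ins, not LLM-Dys's 11-category, LLM-enhanced pipeline.

```python
import random

# Illustrative word-level dysfluency injectors; the categories are
# placeholders, not the paper's taxonomy.
def inject_repetition(words, i):
    # "the the cat sat" -- repeat the word at position i
    return words[:i] + [words[i]] + words[i:]

def inject_prolongation_marker(words, i):
    # mark a prolonged / blocked onset, e.g. "c-c-cat"
    w = words[i]
    return words[:i] + [f"{w[0]}-{w[0]}-{w}"] + words[i + 1:]

def inject_filler(words, i):
    return words[:i] + [random.choice(["uh", "um"])] + words[i:]

def simulate_dysfluent(sentence, n_events=2, seed=0):
    """Return (dysfluent_text, labels) where labels record injected events."""
    rng = random.Random(seed)
    words, labels = sentence.split(), []
    for _ in range(n_events):
        kind = rng.choice(["repetition", "prolongation", "filler"])
        i = rng.randrange(len(words))
        fn = {"repetition": inject_repetition,
              "prolongation": inject_prolongation_marker,
              "filler": inject_filler}[kind]
        words = fn(words, i)
        labels.append((kind, i))
    return " ".join(words), labels

print(simulate_dysfluent("the quick brown fox jumps over the lazy dog"))
```

The dysfluent text would then be fed to a TTS system to produce labeled synthetic audio.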
Related papers
- Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis [8.5693791544413]
We propose Neural LCS, a novel approach for dysfluent text-text and speech-text alignment. We evaluate our method on a large-scale simulated dataset. Our results demonstrate the potential of Neural LCS to enhance automated systems for diagnosing and analyzing speech disorders.
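The classical longest-common-subsequence alignment that Neural LCS generalizes can be sketched directly: tokens on the dysfluent side that fall outside the LCS are candidate dysfluencies. This is the non-neural baseline, not the paper's model.

```python
def lcs_align(dysfluent, reference):
    """Dynamic-programming LCS between a dysfluent transcript and its
    fluent reference; unmatched dysfluent-side tokens are returned as
    candidate dysfluency insertions."""
    a, b = dysfluent.split(), reference.split()
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    # backtrace: collect dysfluent-side tokens not matched to the reference
    i, j, extras = m, n, []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            extras.append((i - 1, a[i - 1]))
            i -= 1
        else:
            j -= 1
    extras.extend((k, a[k]) for k in range(i - 1, -1, -1))
    return sorted(extras)

print(lcs_align("the the c-cat sat sat down", "the cat sat down"))
```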
arXiv Detail & Related papers (2025-06-05T03:06:37Z)
- Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection [5.512072120303165]
Dysfluent-WFST is a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data.
arXiv Detail & Related papers (2025-05-22T08:02:50Z)
- Few-shot LLM Synthetic Data with Distribution Matching [37.55363714371521]
Large language models (LLMs) produce high-quality synthetic data to enhance the performance of smaller models. LLM-generated synthetic data often differs from the real data in key language attributes. We introduce SynAlign: a synthetic data generation and filtering framework based on key attribute distribution matching.
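A minimal sketch of attribute distribution matching, using sentence length as the matched attribute; SynAlign's actual attributes and matching procedure may differ.

```python
from collections import Counter

def match_length_distribution(real_texts, synthetic_texts, k):
    """Greedily pick k synthetic samples whose length histogram tracks
    the real data's -- a simplified stand-in for attribute matching."""
    def bucket(t):
        return min(len(t.split()) // 5, 10)  # coarse length buckets
    target = Counter(bucket(t) for t in real_texts)
    total = sum(target.values())
    quota = {b: round(k * c / total) for b, c in target.items()}
    picked = []
    for t in synthetic_texts:
        b = bucket(t)
        if quota.get(b, 0) > 0:
            picked.append(t)
            quota[b] -= 1
        if len(picked) == k:
            break
    return picked
```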
arXiv Detail & Related papers (2025-02-09T16:43:32Z)
- How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics [49.9329723199239]
We propose a method for the automated creation of a challenging test set without relying on the manual construction of artificial and unrealistic examples.
We categorize the test set of popular NLI datasets into three difficulty levels by leveraging methods that exploit training dynamics.
When our characterization method is applied to the training set, models trained with only a fraction of the data achieve comparable performance to those trained on the full dataset.
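One way to exploit training dynamics, in the spirit of data maps: track the probability the model assigns to the gold label across epochs and bucket examples by mean confidence and variability. The thresholds below are illustrative, not the paper's.

```python
import numpy as np

def categorize_by_dynamics(prob_true_class):
    """prob_true_class: array of shape (epochs, examples) with the
    model's probability of the gold label after each epoch.
    Returns 'easy' / 'ambiguous' / 'hard' per example."""
    conf = prob_true_class.mean(axis=0)   # mean confidence over training
    var = prob_true_class.std(axis=0)     # variability over training
    labels = np.where(conf > 0.75, "easy",
             np.where(conf < 0.35, "hard", "ambiguous"))
    # highly variable examples are ambiguous regardless of confidence
    labels[var > 0.25] = "ambiguous"
    return labels
```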
arXiv Detail & Related papers (2024-10-04T13:39:21Z)
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning [37.54523122932728]
We propose a pipeline-based data augmentation method via large language models (LLMs). We introduce the Gaussian-decayed gradient-assisted Contrastive Sentence Embedding (GCSE) model to enhance unsupervised sentence embeddings. Experimental results show that our approach achieves state-of-the-art performance in semantic textual similarity tasks.
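The base contrastive objective such a model builds on is in-batch InfoNCE; the sketch below shows that plain objective and omits the paper's Gaussian-decayed gradient weighting.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """Plain in-batch InfoNCE for sentence embeddings: each anchor's
    positive is the matching row; all other rows act as negatives."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)
```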
arXiv Detail & Related papers (2024-09-19T16:29:58Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
We propose the Multimodal Pretraining DEL-Fusion model (MPDF).
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
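A generic unlearning recipe consistent with this idea: keep descending on trusted Q-A pairs while ascending on flawed synthetic ones. The sketch assumes a HuggingFace-style causal LM whose forward pass returns `.loss`; the weighting and exact objective are illustrative, not the paper's method.

```python
import torch

def unlearning_step(model, good_batch, flawed_batch, optimizer, alpha=0.2):
    """One step that fits good Q-A pairs (gradient descent) while
    pushing the model away from flawed synthetic ones (gradient ascent
    via the negated term)."""
    optimizer.zero_grad()
    loss_good = model(**good_batch).loss      # standard LM loss
    loss_flawed = model(**flawed_batch).loss  # loss on flawed pairs
    (loss_good - alpha * loss_flawed).backward()
    optimizer.step()
    return loss_good.item(), loss_flawed.item()
```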
arXiv Detail & Related papers (2024-06-18T08:38:59Z)
- AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation [57.8363998797433]
We propose AMRFact, a framework that generates perturbed summaries using Abstract Meaning Representations (AMRs).
Our approach parses factually consistent summaries into AMR graphs and injects controlled factual inconsistencies to create negative examples, allowing for coherent factually inconsistent summaries to be generated with high error-type coverage.
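A surface-level analogue of this perturbation idea: swap one entity in a faithful summary to produce a coherent but factually inconsistent negative. AMRFact performs this on AMR graphs before regenerating text; the helper below is a hypothetical simplification.

```python
import random

def make_negative_summary(summary, doc_entities, seed=0):
    """Swap one entity mention in a faithful summary for another entity
    from the source document, yielding a labeled negative example."""
    rng = random.Random(seed)
    present = [e for e in doc_entities if e in summary]
    if not present:
        return None
    old = rng.choice(present)
    candidates = [e for e in doc_entities if e != old]
    if not candidates:
        return None
    new = rng.choice(candidates)
    return summary.replace(old, new, 1), ("entity_swap", old, new)

print(make_negative_summary(
    "Apple acquired the startup in 2019.", ["Apple", "Google", "2019"]))
```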
arXiv Detail & Related papers (2023-11-16T02:56:29Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
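A hedged sketch of how an N-best list might be packaged for an LLM corrector; the instruction wording is illustrative, not the benchmark's official prompt, and `complete` stands in for any LLM completion call.

```python
def build_correction_prompt(nbest):
    """Format ASR N-best hypotheses into a correction prompt."""
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return ("The following are ASR N-best hypotheses for one utterance. "
            "Output the most likely correct transcription, fixing errors "
            "even if the right word appears in no hypothesis.\n"
            f"{hyps}\nTranscription:")

prompt = build_correction_prompt([
    "i scream you scream we all scream for i cream",
    "ice cream you scream we all scream for i cream",
])
# `complete` stands in for any LLM text-completion call (hypothetical):
# corrected = complete(prompt)
```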
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Artificial Disfluency Detection, Uh No, Disfluency Generation for the Masses [0.0]
This work proposes LARD, a method for automatically generating artificial disfluencies from fluent text.
LARD can simulate all the different types of disfluencies (repetitions, replacements and restarts) based on the reparandum/interregnum annotation scheme.
Since the proposed method requires only fluent text, it can be used directly for training, bypassing the requirement of annotated disfluent data.
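A minimal sketch of the three disfluency types under the reparandum/interregnum scheme; the filler inventory and insertion rules here are illustrative, not LARD's exact procedure.

```python
import random

FILLERS = ["uh", "um", "I mean"]  # interregnum candidates (illustrative)

def repetition(words, i, rng):
    # reparandum = words[i], optional interregnum, repair = words[i]
    return words[:i + 1] + [rng.choice(FILLERS)] + words[i:]

def replacement(words, i, abandoned_word, rng):
    # reparandum = the abandoned word, repair = words[i]
    return words[:i] + [abandoned_word, rng.choice(FILLERS)] + words[i:]

def restart(words, abandoned_prefix, rng):
    # the speaker abandons a prefix, then restarts the fluent sentence
    return abandoned_prefix + [rng.choice(FILLERS)] + words

rng = random.Random(0)
fluent = "she went to the store".split()
print(" ".join(repetition(fluent, 1, rng)))
print(" ".join(replacement(fluent, 4, "market", rng)))
print(" ".join(restart(fluent, ["he", "walked"], rng)))
```

Because the inputs are plain fluent sentences, any text corpus can be converted into labeled disfluent training data this way.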
arXiv Detail & Related papers (2022-11-16T22:00:02Z)
- LARD: Large-scale Artificial Disfluency Generation [0.0]
We propose LARD, a method for generating complex and realistic artificial disfluencies with little effort.
The proposed method can handle three of the most common types of disfluencies: repetitions, replacements and restarts.
We release a new large-scale dataset with disfluencies that can be used on four different tasks.
arXiv Detail & Related papers (2022-01-13T16:02:36Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Adversarial Feature Hallucination Networks for Few-Shot Learning [84.31660118264514]
Adversarial Feature Hallucination Networks (AFHN) is based on conditional Wasserstein Generative Adversarial Networks (cWGANs).
Two novel regularizers are incorporated into AFHN to encourage discriminability and diversity of the synthesized features.
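A sketch of the conditional-generator half of such a setup in PyTorch, with illustrative dimensions; AFHN's exact architecture, regularizers, and critic are not reproduced here.

```python
import torch
import torch.nn as nn

class FeatureHallucinator(nn.Module):
    """Conditional generator in the spirit of AFHN: hallucinates class
    features from a support prototype plus noise."""
    def __init__(self, feat_dim=512, noise_dim=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim))

    def forward(self, prototype, n_samples):
        # condition every noise draw on the same class prototype
        z = torch.randn(n_samples, self.noise_dim, device=prototype.device)
        cond = prototype.expand(n_samples, -1)
        return self.net(torch.cat([cond, z], dim=1))

gen = FeatureHallucinator()
fake = gen(torch.randn(512), n_samples=8)   # 8 hallucinated features
# A Wasserstein critic would then score real vs. hallucinated features:
# critic_loss = critic(fake).mean() - critic(real).mean()
```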
arXiv Detail & Related papers (2020-03-30T02:43:16Z)