StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation
- URL: http://arxiv.org/abs/2508.02255v1
- Date: Mon, 04 Aug 2025 10:02:06 GMT
- Title: StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation
- Authors: Suhita Ghosh, Melanie Jouaiti, Jan-Ole Perschewski, Sebastian Stober,
- Abstract summary: We introduce StutterCut, a semi-supervised framework that formulates dysfluency segmentation as a graph partitioning problem. We refine the connections between nodes using a pseudo-oracle classifier trained on weak (utterance-level) labels. We extend the weakly labelled FluencyBank dataset by incorporating frame-level dysfluency boundaries for four dysfluency types.
- Score: 0.0874967598360817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting and segmenting dysfluencies is crucial for effective speech therapy and real-time feedback. However, most methods only classify dysfluencies at the utterance level. We introduce StutterCut, a semi-supervised framework that formulates dysfluency segmentation as a graph partitioning problem, where speech embeddings from overlapping windows are represented as graph nodes. We refine the connections between nodes using a pseudo-oracle classifier trained on weak (utterance-level) labels, with its influence controlled by an uncertainty measure from Monte Carlo dropout. Additionally, we extend the weakly labelled FluencyBank dataset by incorporating frame-level dysfluency boundaries for four dysfluency types. This provides a more realistic benchmark compared to synthetic datasets. Experiments on real and synthetic datasets show that StutterCut outperforms existing methods, achieving higher F1 scores and more precise stuttering onset detection.
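The abstract outlines the pipeline at a high level: window embeddings become graph nodes, a weakly supervised pseudo-oracle with Monte Carlo dropout adjusts the edge weights, and a normalised cut splits the graph. A minimal sketch of that idea follows; it is not the authors' implementation, and the blending rule, the uncertainty measure (standard deviation over dropout passes), and all names such as `mc_dropout_probs` are assumptions.

```python
# Minimal sketch of uncertainty-guided normalised cut for dysfluency
# segmentation, loosely following the abstract. Function and variable
# names are illustrative, not the authors' code.
import numpy as np

def mc_dropout_probs(classifier, X, T=20):
    """Run a dropout-enabled classifier T times; return the mean probability
    of 'dysfluent' per window and the std as an uncertainty estimate."""
    runs = np.stack([classifier(X, training=True) for _ in range(T)])  # (T, N)
    return runs.mean(axis=0), runs.std(axis=0)

def refined_affinity(E, p, u, sigma=1.0):
    """Blend embedding similarity with pseudo-oracle agreement; the agreement
    term is trusted only where both windows are confidently classified."""
    d2 = ((E[:, None, :] - E[None, :, :]) ** 2).sum(-1)
    W_emb = np.exp(-d2 / (2 * sigma ** 2))                 # window-embedding similarity
    agree = 1.0 - np.abs(p[:, None] - p[None, :])          # same-class evidence from the classifier
    conf = (1.0 - u[:, None]) * (1.0 - u[None, :])         # joint MC-dropout confidence
    return W_emb * (1.0 - conf) + agree * conf

def normalised_cut_bipartition(W):
    """Classic spectral relaxation of the 2-way normalised cut."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L_sym = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L_sym)
    fiedler = vecs[:, 1]                                    # second-smallest eigenvector
    return fiedler > 0                                      # boolean mask over windows
```

Mapping the resulting window-level mask back to frame-level onset/offset boundaries, and how exactly the classifier's confidence overrides embedding similarity, are design choices the paper itself resolves; the blend above is only one plausible reading.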
Related papers
- Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection [5.95376852691752]
Speech dysfluency detection is crucial for clinical diagnosis and language assessment. This dataset captures 11 dysfluency categories spanning both word and phoneme levels. Building upon this resource, we improve an end-to-end dysfluency detection framework.
arXiv Detail & Related papers (2025-05-28T06:52:10Z)
- Confidence HNC: A Network Flow Technique for Binary Classification with Noisy Labels [0.0]
We consider a classification method that balances two objectives: large similarity within the samples in the cluster, and large dissimilarity between the cluster and its complement. The method, referred to as HNC or SNC, requires seed nodes, or labeled samples, at least one of which is in the cluster and at least one in the complement. The contribution here is a new method, based on HNC, for the setting with noisy labels, called Confidence HNC.
arXiv Detail & Related papers (2025-03-04T07:21:40Z)
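As a rough illustration of the seeded-cut core that HNC builds on, the sketch below ties positive seeds to a source and negative seeds to a sink and computes a minimum cut over the similarity graph with networkx. The actual HNC/Confidence-HNC formulations additionally encode the intra-cluster-similarity trade-off and the confidence in noisy labels through extra source/sink arcs, which are omitted here.

```python
# Seeded minimum-cut sketch in the spirit of HNC; only the core idea.
import numpy as np
import networkx as nx

def seeded_cut(W, pos_seeds, neg_seeds):
    """W: symmetric similarity matrix; seeds: lists of node indices."""
    G = nx.DiGraph()
    n = len(W)
    for i in range(n):
        for j in range(i + 1, n):
            if W[i, j] > 0:                        # similarity edges, both directions
                G.add_edge(i, j, capacity=float(W[i, j]))
                G.add_edge(j, i, capacity=float(W[i, j]))
    for i in pos_seeds:
        G.add_edge("s", i, capacity=float("inf"))  # must stay with the cluster
    for i in neg_seeds:
        G.add_edge(i, "t", capacity=float("inf"))  # must stay with the complement
    cut_value, (cluster, complement) = nx.minimum_cut(G, "s", "t")
    return cluster - {"s"}, complement - {"t"}
```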
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method extracts a class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
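A simplified sketch of prototype-based pseudo-labelling with a class-balanced selection rule, in the spirit of the entry above: the paper frames the assignment as distribution matching against a chosen probability measure, which is approximated here by a plain nearest-prototype rule plus a per-class top-k cut-off (both the rule and the names are assumptions).

```python
import numpy as np

def prototypes(F, noisy_labels, n_classes):
    """Class prototype = mean feature of the samples currently assigned to it."""
    return np.stack([F[noisy_labels == c].mean(axis=0) for c in range(n_classes)])

def balanced_clean_subset(F, noisy_labels, n_classes, k_per_class=100):
    P = prototypes(F, noisy_labels, n_classes)
    # Cosine similarity of every sample to every prototype.
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    sim = Fn @ Pn.T                                        # (N, C)
    pseudo = sim.argmax(axis=1)                            # pseudo-label per sample
    keep = []
    for c in range(n_classes):
        idx = np.where(pseudo == c)[0]
        idx = idx[np.argsort(-sim[idx, c])][:k_per_class]  # most prototype-like first
        keep.extend(idx.tolist())
    return np.array(keep), pseudo                          # class-balanced "clean" subset
```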
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
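One plausible instantiation of an entropy-regularised alignment between pseudo-labels and model predictions, sketched in PyTorch; the weighting and the exact divergence used by the paper may differ.

```python
import torch
import torch.nn.functional as F

def erda_style_loss(pseudo_logits, pred_logits, lam=1.0):
    p = F.softmax(pseudo_logits, dim=-1)           # pseudo-label distribution per point
    log_p = F.log_softmax(pseudo_logits, dim=-1)
    log_q = F.log_softmax(pred_logits, dim=-1)     # current model prediction
    kl = (p * (log_p - log_q)).sum(dim=-1).mean()  # align pseudo-labels with predictions
    ent = -(p * log_p).sum(dim=-1).mean()          # discourage ambiguous pseudo-labels
    return kl + lam * ent
```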
- Constrained self-supervised method with temporal ensembling for fiber bundle detection on anatomic tracing data [0.08329098197319453]
In this work, we propose a deep learning method with a self-supervised loss function for accurate segmentation of fiber bundles on the tracer sections from macaque brains.
Evaluation of our method on unseen sections from a different macaque yields promising results with a true positive rate of 0.90.
arXiv Detail & Related papers (2022-08-06T19:17:02Z)
- Optimizing Diffusion Rate and Label Reliability in a Graph-Based Semi-supervised Classifier [2.4366811507669124]
The Local and Global Consistency (LGC) algorithm is one of the most well-known graph-based semi-supervised (GSSL) classifiers.
We discuss how removing the self-influence of a labeled instance may be beneficial, and how it relates to leave-one-out error.
Within this framework, we propose methods to estimate label reliability and diffusion rate.
arXiv Detail & Related papers (2022-01-10T16:58:52Z)
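For reference, the classic LGC propagation step that this paper builds on has the closed form F* = (I - alpha*S)^{-1} Y with S = D^{-1/2} W D^{-1/2}; a minimal version is sketched below. Estimating the diffusion rate alpha and the per-label reliability, and removing the self-influence of labeled points, are the paper's contributions and are not reproduced here.

```python
import numpy as np

def lgc_propagate(W, Y, alpha=0.9):
    """W: (n, n) affinity matrix; Y: (n, c) one-hot labels (zero rows for unlabeled)."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    S = D_inv_sqrt @ W @ D_inv_sqrt                      # symmetrically normalised affinity
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)   # closed-form fixed point
    return F.argmax(axis=1)                              # predicted class per node
```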
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10/CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
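A sketch of a neighbourhood-consistency selection rule in that spirit: keep a sample when its annotated label matches the majority label among its k nearest neighbours in feature space. The exact agreement measure used by S3 may differ, and the cosine-similarity neighbourhood here is an assumption.

```python
import numpy as np

def select_clean(F, labels, k=10):
    """F: (N, D) features; labels: (N,) integer annotations. Returns kept indices."""
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    sim = Fn @ Fn.T
    np.fill_diagonal(sim, -np.inf)                  # exclude the sample itself
    nn = np.argsort(-sim, axis=1)[:, :k]            # indices of k nearest neighbours
    keep = []
    for i in range(len(F)):
        counts = np.bincount(labels[nn[i]], minlength=labels.max() + 1)
        if counts.argmax() == labels[i]:            # annotation agrees with neighbourhood
            keep.append(i)
    return np.array(keep)
```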
- Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously.
We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
arXiv Detail & Related papers (2021-02-04T05:09:18Z)
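The "learn the transition matrix and the classifier together" idea can be sketched as forward correction: the network's clean posterior is pushed through a learnable row-stochastic matrix T to explain the noisy labels. The total-variation term below is only a rough stand-in for the paper's regulariser on the predicted clean posteriors, and the near-identity initialisation is an assumption.

```python
import torch
import torch.nn.functional as F

class ForwardCorrection(torch.nn.Module):
    def __init__(self, backbone, n_classes):
        super().__init__()
        self.backbone = backbone
        self.T_logits = torch.nn.Parameter(torch.eye(n_classes) * 4.0)  # near-identity init

    def forward(self, x, noisy_y, tv_weight=0.1):
        clean_post = F.softmax(self.backbone(x), dim=-1)        # p(y | x)
        T = F.softmax(self.T_logits, dim=-1)                    # rows sum to one
        noisy_post = clean_post @ T                             # p(noisy label | x)
        nll = F.nll_loss(torch.log(noisy_post + 1e-12), noisy_y)
        # Encourage distinguishable clean posteriors across the batch
        # (rough stand-in for the paper's total-variation regulariser).
        tv = 0.5 * (clean_post.unsqueeze(0) - clean_post.unsqueeze(1)).abs().sum(-1).mean()
        return nll - tv_weight * tv
```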
- Parzen Window Approximation on Riemannian Manifold [5.600982367387833]
In graph motivated learning, label propagation largely depends on data affinity represented as edges between connected data points.
An affinity metric that accounts for the irregular sampling effect is proposed to yield accurate label propagation.
arXiv Detail & Related papers (2020-12-29T08:52:31Z)
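As a generic stand-in for a sampling-aware affinity (not the paper's Parzen-window construction on the manifold), the self-tuning kernel below scales each pairwise weight by local neighbour distances, so sparsely and densely sampled regions contribute comparable edge weights.

```python
import numpy as np

def adaptive_affinity(X, k=7):
    """X: (N, D) samples. Gaussian affinity with a per-point local bandwidth."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sigma = np.sqrt(np.sort(d2, axis=1)[:, k])               # distance to k-th neighbour
    W = np.exp(-d2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(W, 0.0)
    return W
```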
- Generative Partial Visual-Tactile Fused Object Clustering [81.17645983141773]
We propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering.
A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality.
To this end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders.
arXiv Detail & Related papers (2020-12-28T02:37:03Z)
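A hedged sketch of one pseudo-label KL loss in the DEC style often used for deep clustering: soft cluster assignments are sharpened into a target distribution and the corresponding encoder is trained to match it. Whether GPVTF uses exactly this target construction is an assumption.

```python
import torch

def kl_cluster_loss(assign):
    """assign: (N, K) soft cluster assignments from one modality's encoder."""
    target = assign ** 2 / assign.sum(dim=0, keepdim=True)
    target = (target / target.sum(dim=1, keepdim=True)).detach()    # sharpened pseudo-labels
    log_ratio = target.clamp_min(1e-12).log() - assign.clamp_min(1e-12).log()
    return (target * log_ratio).sum(dim=1).mean()                   # KL(target || assignment)
```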
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds [71.36164750147827]
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
arXiv Detail & Related papers (2020-10-26T06:33:02Z)
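The clustering half of such a hybrid can be sketched as agglomerative clustering of per-segment speaker embeddings (e.g. x-vectors); an EEND-style model would then handle overlapped speech within the resulting groups. The threshold and metric choices below are placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_segments(xvectors, distance_threshold=0.5):
    """xvectors: (n_segments, dim) array; returns a speaker label per segment."""
    Z = linkage(xvectors, method="average", metric="cosine")
    return fcluster(Z, t=distance_threshold, criterion="distance")
```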