A Comparative Study of Controllability, Explainability, and Performance in Dysfluency Detection Models
- URL: http://arxiv.org/abs/2509.00058v1
- Date: Mon, 25 Aug 2025 14:23:09 GMT
- Title: A Comparative Study of Controllability, Explainability, and Performance in Dysfluency Detection Models
- Authors: Eric Zhang, Li Wei, Sarah Chen, Michael Wang
- Abstract summary: We compare four dysfluency modeling approaches: YOLO-Stutter, FluentNet, UDM, and SSDM. YOLO-Stutter and FluentNet provide efficiency and simplicity, but with limited transparency. UDM achieves the best balance of accuracy and clinical interpretability.
- Score: 6.837099592935974
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in dysfluency detection have introduced a variety of modeling paradigms, ranging from lightweight object-detection-inspired networks (YOLO-Stutter) to modular interpretable frameworks (UDM). While performance on benchmark datasets continues to improve, clinical adoption requires more than accuracy: models must be controllable and explainable. In this paper, we present a systematic comparative analysis of four representative approaches--YOLO-Stutter, FluentNet, UDM, and SSDM--along three dimensions: performance, controllability, and explainability. Through comprehensive evaluation on multiple datasets and expert clinician assessment, we find that YOLO-Stutter and FluentNet provide efficiency and simplicity, but with limited transparency; UDM achieves the best balance of accuracy and clinical interpretability; and SSDM, while promising, could not be fully reproduced in our experiments. Our analysis highlights the trade-offs among competing approaches and identifies future directions for clinically viable dysfluency modeling. We also provide detailed implementation insights and practical deployment considerations for each approach.
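The three-dimensional comparison described above can be sketched as a small scoring harness. The scores below are hypothetical placeholders on a 0-1 scale chosen only to illustrate the trade-off the paper reports (efficient but opaque detectors versus the more interpretable UDM); they are not the paper's measured numbers.

```python
from dataclasses import dataclass

@dataclass
class ModelProfile:
    """Illustrative profile of a dysfluency detection approach.

    All scores are hypothetical placeholders on a 0-1 scale,
    not the paper's reported results.
    """
    name: str
    performance: float      # e.g. benchmark F1
    controllability: float  # ease of steering model behaviour
    explainability: float   # clinician-rated interpretability

def balanced_score(p: ModelProfile, weights=(1/3, 1/3, 1/3)) -> float:
    """Weighted average across the three evaluation dimensions."""
    wp, wc, we = weights
    return wp * p.performance + wc * p.controllability + we * p.explainability

# Placeholder scores reflecting the qualitative finding: YOLO-Stutter and
# FluentNet are strong but opaque; UDM trades a little accuracy for
# much better controllability and explainability.
models = [
    ModelProfile("YOLO-Stutter", 0.85, 0.40, 0.35),
    ModelProfile("FluentNet",    0.83, 0.45, 0.40),
    ModelProfile("UDM",          0.82, 0.75, 0.80),
]

best = max(models, key=balanced_score)
print(best.name)  # UDM wins once all three dimensions are weighted equally
```

With equal weights, UDM comes out ahead despite slightly lower raw performance, mirroring the paper's conclusion that it offers the best overall balance.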
Related papers
- Concept-Enhanced Multimodal RAG: Towards Interpretable and Accurate Radiology Report Generation [12.226029763256962]
Radiology Report Generation through Vision-Language Models (VLMs) promises to reduce documentation burden, improve reporting consistency, and accelerate clinical adoption. Existing research treats interpretability and accuracy as separate objectives, with concept-based explainability techniques focusing primarily on transparency. We present Concept-Enhanced Multimodal RAG (CEMRAG), a unified framework that decomposes visual representations into interpretable clinical concepts.
arXiv Detail & Related papers (2026-02-17T15:18:07Z) - Intervention Efficiency and Perturbation Validation Framework: Capacity-Aware and Robust Clinical Model Selection under the Rashomon Effect [8.16102315566872]
The coexistence of multiple models with comparable performance poses fundamental challenges for trustworthy deployment and evaluation. We propose two complementary tools for robust model assessment and selection: Intervention Efficiency (IE) and the Perturbation Validation Framework (PVF). IE is a capacity-aware metric that quantifies how efficiently a model identifies actionable true positives when only limited interventions are feasible. PVF introduces a structured approach to assess the stability of models under data perturbations, identifying models whose performance remains most invariant across noisy or shifted validation sets.
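One plausible reading of a capacity-aware metric like IE is precision at a fixed intervention budget: of the cases a model flags most confidently, how many are true positives a clinician could actually act on? The sketch below illustrates that reading; the paper's exact formulation may differ.

```python
def intervention_efficiency(scores, labels, budget):
    """Precision-at-budget sketch of an Intervention-Efficiency-style
    metric: of the `budget` highest-scoring cases, what fraction are
    true positives?  This is an illustrative reading, not the paper's
    exact definition.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    top = ranked[:budget]
    return sum(label for _, label in top) / budget

# Toy data: five predicted risk scores with ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
labels = [1,   0,   1,   1,   0]
print(intervention_efficiency(scores, labels, budget=3))  # 2 of the top 3 are actionable
```

Two models with identical overall accuracy can differ sharply on this metric when the intervention budget is small, which is exactly the Rashomon-effect scenario the paper targets.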
arXiv Detail & Related papers (2025-11-18T10:21:07Z) - From Promise to Practical Reality: Transforming Diffusion MRI Analysis with Fast Deep Learning Enhancement [35.368152968098194]
FastFOD-Net is an end-to-end deep learning framework enhancing FODs with superior performance and delivering training/inference efficiency for clinical use. This work will facilitate the more widespread adoption of, and build clinical trust in, deep learning based methods for diffusion MRI enhancement.
arXiv Detail & Related papers (2025-08-13T17:56:29Z) - Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation [0.3362278589492841]
Existing model evaluation approaches often rely on real-world datasets, which are limited in availability, embed confounding biases, and lack the flexibility needed for systematic experimentation. We propose a novel structured synthetic data framework designed for controlled benchmarking of model robustness, fairness, and generalisability.
arXiv Detail & Related papers (2025-04-29T11:04:28Z) - Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
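Shapley values, one of the two ingredients the method combines, attribute a set function's total value fairly across players. The generic exact computation below illustrates the principle on a toy "performance drop when these features shift" function; it is a sketch of the attribution idea, not the paper's Optimal-Transport-based estimator.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a set function `value`.

    Generic textbook computation (exponential in the number of
    features), shown only to illustrate the attribution principle.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(subset)) * factorial(n - len(subset) - 1) / factorial(n)
                total += w * (value(frozenset(subset) | {f}) - value(frozenset(subset)))
        phi[f] = total
    return phi

# Hypothetical performance drop caused by shifting each feature subset.
drop = {frozenset(): 0.0, frozenset({"a"}): 0.1,
        frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.4}
phi = shapley_values(["a", "b"], lambda s: drop[frozenset(s)])
print(phi)  # attributions sum to the total drop of 0.4
```

By construction the attributions sum to the full performance change, so each feature's shift receives a principled share of the observed degradation.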
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z) - TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z) - Quantifying Explainability in NLP and Analyzing Algorithms for Performance-Explainability Tradeoff [0.0]
We explore the current art of explainability and interpretability within a case study in clinical text classification.
We demonstrate various visualization techniques for fully interpretable methods as well as model-agnostic post hoc attributions.
We introduce a framework through which practitioners and researchers can assess the frontier between a model's predictive performance and the quality of its available explanations.
arXiv Detail & Related papers (2021-07-12T19:07:24Z) - Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z) - Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks [103.14809802212535]
We build on the generative adversarial networks (GANs) framework to address the problem of estimating the effect of continuous-valued interventions.
Our model, SCIGAN, is flexible and capable of simultaneously estimating counterfactual outcomes for several different continuous interventions.
To address the challenges presented by shifting to continuous interventions, we propose a novel architecture for our discriminator.
arXiv Detail & Related papers (2020-02-27T18:46:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.