Voice Pathology Detection Using Phonation
- URL: http://arxiv.org/abs/2508.07587v1
- Date: Mon, 11 Aug 2025 03:33:18 GMT
- Title: Voice Pathology Detection Using Phonation
- Authors: Sri Raksha Siva, Nived Suthahar, Prakash Boominathan, Uma Ranjan,
- Abstract summary: This research proposes a machine learning-based framework for detecting voice pathologies. Phonation data from the Saarbrücken Voice Database are analyzed. Recurrent Neural Networks (RNNs) classify samples into normal and pathological categories.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Voice disorders significantly affect communication and quality of life, requiring early and accurate diagnosis. Traditional methods like laryngoscopy are invasive, subjective, and often inaccessible. This research proposes a noninvasive, machine learning-based framework for detecting voice pathologies using phonation data. Phonation data from the Saarbrücken Voice Database are analyzed using acoustic features such as Mel Frequency Cepstral Coefficients (MFCCs), chroma features, and Mel spectrograms. Recurrent Neural Networks (RNNs) with LSTM layers and attention mechanisms classify samples into normal and pathological categories. Data augmentation techniques, including pitch shifting and Gaussian noise addition, enhance model generalizability, while preprocessing ensures signal quality. Scale-based features, such as Hölder and Hurst exponents, further capture signal irregularities and long-term dependencies. The proposed framework offers a noninvasive, automated diagnostic tool for early detection of voice pathologies, supporting AI-driven healthcare and improving patient outcomes.
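The abstract lists Gaussian noise addition as one augmentation but gives no noise levels. A minimal sketch, assuming the noise is scaled to a chosen signal-to-noise ratio (the 20 dB below is an arbitrary, hypothetical value) and using NumPy; pitch shifting is not shown.

```python
import numpy as np


def add_gaussian_noise(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Return `signal` plus zero-mean Gaussian noise scaled to a target SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10**(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Usage: a synthetic 1 s, 150 Hz tone standing in for a sustained vowel (hypothetical input)
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 150 * t)
rng = np.random.default_rng(0)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=rng)
# The realized SNR should land close to the requested 20 dB
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Drawing a fresh noise realization (and possibly a random SNR) per training epoch is one way such augmentation is typically used to improve generalization.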
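Mel spectrograms and MFCCs, two of the features named in the abstract, follow a standard pipeline: framing, windowing, power spectrum, triangular mel filterbank, log, and DCT. The sketch below is a from-scratch NumPy version under assumed parameters (16 kHz audio, 512-point FFT, 10 ms hop, 40 mel bands, 13 coefficients); the paper's actual settings are not given here, and chroma features are omitted. Libraries such as librosa provide equivalents (`librosa.feature.melspectrogram`, `librosa.feature.mfcc`, `librosa.feature.chroma_stft`).

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb


def mfcc(signal, sr, n_fft=512, hop=160, n_mels=40, n_mfcc=13):
    """Return (log-mel spectrogram, MFCCs), each with one row per frame."""
    # 1) Split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2) Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Log-mel spectrogram
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # 4) Orthonormal DCT-II over the mel axis, keeping the first n_mfcc coefficients
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    return log_mel, log_mel @ dct.T


# Usage on a synthetic two-harmonic tone (hypothetical input)
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
log_mel, coeffs = mfcc(signal, sr)
```

The resulting (frames x coefficients) matrix is the kind of sequence an RNN classifier consumes frame by frame.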
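Hölder and Hurst exponents are described as scale-based features capturing signal irregularities and long-term dependencies. The abstract does not say how they were estimated, so the sketch below uses two common estimators as stand-ins: rescaled-range (R/S) analysis for the Hurst exponent and a local oscillation fit for the pointwise Hölder exponent. Function names, window sizes, and radii are hypothetical choices.

```python
import numpy as np


def hurst_rs(x, min_window=8, n_scales=10):
    """Estimate the Hurst exponent H by rescaled-range (R/S) analysis."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = np.unique(np.logspace(np.log10(min_window), np.log10(n // 2), n_scales).astype(int))
    rs_means = []
    for w in sizes:
        rs = []
        for start in range(0, n - w + 1, w):  # non-overlapping windows of length w
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())  # cumulative deviation from the window mean
            s = seg.std()
            if s > 0:
                rs.append((dev.max() - dev.min()) / s)  # range R divided by std S
        rs_means.append(np.mean(rs))
    # E[R/S] ~ c * w**H, so H is the slope of log(R/S) against log(w)
    h, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return h


def holder_exponent(x, t0, radii=(2, 4, 8, 16, 32)):
    """Estimate the pointwise Hölder exponent at index t0 with the oscillation method."""
    x = np.asarray(x, dtype=float)
    osc = []
    for r in radii:
        window = x[max(0, t0 - r): t0 + r + 1]
        osc.append(window.max() - window.min())  # oscillation of x within radius r
    # osc(r) ~ C * r**alpha near t0, so alpha is the slope in log-log coordinates
    alpha, _ = np.polyfit(np.log(radii), np.log(osc), 1)
    return alpha


rng = np.random.default_rng(0)
noise = rng.normal(size=4096)  # uncorrelated samples: H should be near 0.5
walk = np.cumsum(noise)        # random walk: R/S grows roughly linearly, H near 1
h_noise, h_walk = hurst_rs(noise), hurst_rs(walk)

t = np.arange(-100, 101)
cusp = np.sqrt(np.abs(t))      # |t|**0.5 has Hölder exponent 0.5 at t = 0 (index 100)
alpha_cusp = holder_exponent(cusp, t0=100)
```

R/S estimates are known to be biased upward for short windows, so values for uncorrelated noise often come out somewhat above 0.5.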
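The classifier is described as an RNN with LSTM layers and attention mechanisms, but the exact architecture is not specified. As an illustration only, here is a forward pass of a single-layer LSTM whose hidden states are pooled by additive (Bahdanau-style) attention and fed to a sigmoid output for the normal vs. pathological decision. The class name and all sizes are hypothetical, the weights are random and untrained, and a practical implementation would more likely use a framework layer such as PyTorch's `torch.nn.LSTM`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LSTMAttentionClassifier:
    """Forward pass only: LSTM over frames -> additive attention pooling -> sigmoid output.

    Weights are random; a real model would be trained, e.g. with binary cross-entropy.
    """

    def __init__(self, n_in, n_hidden, n_att, rng):
        s = 1.0 / np.sqrt(n_hidden)
        self.H = n_hidden
        # Stacked weights for the input (i), forget (f), candidate (g) and output (o) gates
        self.W = rng.uniform(-s, s, (4 * n_hidden, n_in))
        self.U = rng.uniform(-s, s, (4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        # Additive attention: score_t = v . tanh(W_a h_t + b_a)
        self.W_a = rng.uniform(-s, s, (n_att, n_hidden))
        self.b_a = np.zeros(n_att)
        self.v = rng.uniform(-s, s, n_att)
        # Binary output layer (normal vs pathological)
        self.w_out = rng.uniform(-s, s, n_hidden)
        self.b_out = 0.0

    def forward(self, x):
        """x: (T, n_in) frame features such as MFCCs. Returns (P(pathological), attention weights)."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        states = []
        for x_t in x:
            z = self.W @ x_t + self.U @ h + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[3 * H:])
            g = np.tanh(z[2 * H:3 * H])
            c = f * c + i * g   # cell state update
            h = o * np.tanh(c)  # hidden state
            states.append(h)
        states = np.stack(states)                                   # (T, H)
        scores = np.tanh(states @ self.W_a.T + self.b_a) @ self.v   # one score per frame, (T,)
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                                        # softmax over frames
        context = alpha @ states                                    # weighted summary, (H,)
        prob = sigmoid(context @ self.w_out + self.b_out)
        return prob, alpha


rng = np.random.default_rng(0)
model = LSTMAttentionClassifier(n_in=13, n_hidden=32, n_att=16, rng=rng)
features = rng.normal(size=(97, 13))  # stand-in for a (frames x MFCC) matrix
prob, alpha = model.forward(features)
```

The attention weights `alpha` indicate which frames contribute most to the decision, which is one reason attention pooling is often preferred over using only the last hidden state.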
Related papers
- Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis [14.922065513695294]
Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA).
To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention.
To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
arXiv Detail & Related papers (2026-02-16T14:48:24Z)
- Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals [0.49581497240446293]
This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals.
The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations.
The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.
arXiv Detail & Related papers (2025-11-29T17:15:58Z)
- WaveNet's Precision in EEG Classification [1.0885910878567457]
This study introduces a WaveNet-based deep learning model designed to automate the classification of EEG signals into physiological, pathological, artifact, and noise categories.
The model was trained, validated, and tested on 209,232 samples with a 70/20/10 percent split.
WaveNet's architecture, originally developed for raw audio synthesis, is well suited for EEG data due to its use of dilated causal convolutions and residual connections.
arXiv Detail & Related papers (2025-10-10T09:21:21Z)
- Advancing Hearing Assessment: An ASR-Based Frequency-Specific Speech Test for Diagnosing Presbycusis [0.0]
Traditional audiometry fails to fully characterize the functional impact of hearing loss on speech understanding.
This paper presents the development and simulated evaluation of a novel Automatic Speech Recognition (ASR)-based frequency-specific speech test.
arXiv Detail & Related papers (2025-05-28T11:06:22Z)
- Structure-Accurate Medical Image Translation via Dynamic Frequency Balance and Knowledge Guidance [60.33892654669606]
Diffusion models are a powerful strategy for synthesizing the required medical images.
Existing approaches still suffer from anatomical structure distortion caused by overfitting to high-frequency information.
We propose a novel method based on dynamic frequency balance and knowledge guidance.
arXiv Detail & Related papers (2025-04-13T05:48:13Z)
- Comparative Analysis of Mel-Frequency Cepstral Coefficients and Wavelet Based Audio Signal Processing for Emotion Detection and Mental Health Assessment in Spoken Speech [0.0]
This study explores the application of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models on wavelet extracted features and Mel-frequency Cepstral Coefficients (MFCCs) for emotion detection from spoken speech.
Data augmentation techniques, feature extraction, normalization, and model training were conducted to evaluate the models' performance in classifying emotional states.
arXiv Detail & Related papers (2024-12-12T22:55:11Z)
- Voice Disorder Analysis: a Transformer-based Approach [10.003909936239742]
This paper proposes a novel solution that applies transformers directly to raw voice signals.
We consider many recording types at the same time, such as sentence reading and sustained vowel emission.
The experimental results, obtained on both public and private datasets, show the effectiveness of our solution in the disorder detection and classification tasks.
arXiv Detail & Related papers (2024-06-20T19:29:04Z)
- Non-destructive Fault Diagnosis of Electronic Interconnects by Learning Signal Patterns of Reflection Coefficient in the Frequency Domain [1.8843687952462742]
We propose a novel, non-destructive approach for early fault detection and accurate diagnosis of interconnect defects.
Our approach utilizes the signal patterns of the reflection coefficient across a range of frequencies, enabling both root cause identification and severity assessment.
Experimental results demonstrate that the proposed method is effective for fault detection and diagnosis and has the potential to extend to real-world industrial applications.
arXiv Detail & Related papers (2023-04-20T10:51:21Z)
- The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images.
Unsupervised anomaly detection approaches have been proposed using only normal data for training.
We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
An efficient analysis of large amounts of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
- Deep Metric Learning with Locality Sensitive Angular Loss for Self-Correcting Source Separation of Neural Spiking Signals [77.34726150561087]
We propose a methodology based on deep metric learning to address the need for automated post-hoc cleaning and robust separation filters.
We validate this method with an artificially corrupted label set based on source-separated high-density surface electromyography recordings.
This approach enables a neural network to learn to accurately decode neurophysiological time series using any imperfect method of labelling the signal.
arXiv Detail & Related papers (2021-10-13T21:51:56Z)
- Heart Sound Classification Considering Additive Noise and Convolutional Distortion [2.63046959939306]
Automatic analysis of heart sounds for abnormality detection is faced with the challenges of additive noise and sensor-dependent degradation.
This paper aims to develop methods to address the cardiac abnormality detection problem when both types of distortions are present in the cardiac auscultation sound.
The proposed method paves the way towards developing computer-aided cardiac auscultation systems in noisy environments using low-cost stethoscopes.
arXiv Detail & Related papers (2021-06-03T14:09:04Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Bulbar ALS Detection Based on Analysis of Voice Perturbation and Vibrato [68.97335984455059]
The purpose of this work was to verify the suitability of the sustained vowel phonation test for automatic detection of patients with ALS.
We proposed an enhanced procedure for separating the voice signal into fundamental periods, which is required for calculating the measurements.
arXiv Detail & Related papers (2020-03-24T12:49:25Z)
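The Bulbar ALS entry above computes voice perturbation measures from fundamental periods. The entry does not list which measures were used; as an example, local jitter and local shimmer (the mean absolute difference between consecutive periods or peak amplitudes, divided by their mean, in percent, as commonly defined, e.g. in Praat) can be sketched as follows. The per-cycle values are hypothetical and assumed to come from an existing period-segmentation step.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical per-cycle measurements, as if already segmented into fundamental periods
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]        # period lengths T_i
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]   # peak amplitudes A_i
jitter_local = local_perturbation(periods_ms)   # about 2.49 %
shimmer_local = local_perturbation(amplitudes)  # about 3.97 %
```

Both measures depend directly on how accurately the individual periods are located, which is why the period-separation step matters for these features.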
This list is automatically generated from the titles and abstracts of the papers in this site.