Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis
- URL: http://arxiv.org/abs/2409.03597v1
- Date: Thu, 5 Sep 2024 14:56:38 GMT
- Title: Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis
- Authors: Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Faya Liang, Ming Li,
- Abstract summary: The Multimodal Analyzing System for Laryngoscope (MASL) combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment.
MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements.
- Score: 7.583632364503357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements. The system includes a strobing video extraction module that identifies frames by analyzing hue, saturation, and value fluctuations. MASL also provides effective metrics for vocal cord paralysis detection, employing a two-stage glottis segmentation process using U-Net followed by diffusion-based refinement to reduce false positives. Instead of glottal area waveforms, MASL estimates anterior glottic angle waveforms (AGAW) from glottis masks, evaluating both left and right vocal cords to detect unilateral vocal cord paralysis (UVFP). By comparing AGAW variances, MASL distinguishes between left and right paralysis. Ablation studies and experiments on public and real-world datasets validate MASL's segmentation module and demonstrate its ability to provide reliable metrics for UVFP diagnosis.
Related papers
- MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video [13.231546105751015]
We present the first automated multimodal generation, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis.
MMSummary is designed as a three-stage pipeline, progressing from anatomy detection to captioning and finally segmentation and measurement.
Based on reported experiments is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance workflow efficiency.
arXiv Detail & Related papers (2024-08-07T13:30:58Z) - Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z) - DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and
Authentication [50.017055360261665]
We introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks.
For better feature interaction between these two branches, we introduce two specialized modules.
In this way, our framework allows for a dynamic interplay between diffusion and segmentation embeddings.
arXiv Detail & Related papers (2024-02-03T06:49:42Z) - A Unified Approach for Comprehensive Analysis of Various Spectral and
Tissue Doppler Echocardiography [3.7775754350457746]
We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images.
The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules.
Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU)
arXiv Detail & Related papers (2023-11-14T15:10:05Z) - Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
arXiv Detail & Related papers (2023-10-25T08:55:48Z) - GEMTrans: A General, Echocardiography-based, Multi-Level Transformer
Framework for Cardiovascular Diagnosis [14.737295160286939]
Vision-based machine learning (ML) methods have gained popularity to act as secondary layers of verification.
We propose a General, Echo-based, Multi-Level Transformer (GEMTrans) framework that provides explainability.
We show the flexibility of our framework by considering two critical tasks including ejection fraction (EF) and aortic stenosis (AS) severity detection.
arXiv Detail & Related papers (2023-08-25T07:30:18Z) - Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS
Instance Segmentation [11.575821326313607]
We propose Video-TransUNet, a deep architecture for segmentation in medical CT videos constructed by integrating temporal feature blending into the TransUNet deep learning framework.
In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconal architecture with multiple heads.
arXiv Detail & Related papers (2022-08-17T14:28:58Z) - MS Lesion Segmentation: Revisiting Weighting Mechanisms for Federated
Learning [92.91544082745196]
Federated learning (FL) has been widely employed for medical image analysis.
FL's performance is limited for multiple sclerosis (MS) lesion segmentation tasks.
We propose the first FL MS lesion segmentation framework via two effective re-weighting mechanisms.
arXiv Detail & Related papers (2022-05-03T14:06:03Z) - Audio-visual multi-channel speech separation, dereverberation and
recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z) - Improving Classification Model Performance on Chest X-Rays through Lung
Segmentation [63.45024974079371]
We propose a deep learning approach to enhance abnormal chest x-ray (CXR) identification performance through segmentations.
Our approach is designed in a cascaded manner and incorporates two modules: a deep neural network with criss-cross attention modules (XLSor) for localizing lung region in CXR images and a CXR classification model with a backbone of a self-supervised momentum contrast (MoCo) model pre-trained on large-scale CXR data sets.
arXiv Detail & Related papers (2022-02-22T15:24:06Z) - Heart Sound Segmentation using Bidirectional LSTMs with Attention [37.62160903348547]
We propose a novel framework for the segmentation of phonocardiogram (PCG) signals into heart states.
We exploit recent advancements in attention based learning to segment the PCG signal.
The proposed method attains state-of-the-art performance on multiple benchmarks including both human and animal heart recordings.
arXiv Detail & Related papers (2020-04-02T02:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.