An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis
- URL: http://arxiv.org/abs/2412.18919v1
- Date: Wed, 25 Dec 2024 14:42:17 GMT
- Title: An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis
- Authors: Yingchen Wei, Xihe Qiu, Xiaoyu Tan, Jingjing Huang, Wei Chu, Yinghui Xu, Yuan Qi,
- Abstract summary: Obstructive sleep apnea-hypopnea syndrome (OSAHS) is a common sleep disorder caused by upper airway blockage, leading to oxygen deprivation and disrupted sleep.<n>Existing deep learning methods using facial image analysis lack accuracy due to poor facial feature capture and limited sample sizes.<n>We propose a multimodal dual encoder model that integrates visual and language inputs for automated OSAHS diagnosis.
- Score: 26.69518726864821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Obstructive sleep apnea-hypopnea syndrome (OSAHS) is a common sleep disorder caused by upper airway blockage, leading to oxygen deprivation and disrupted sleep. Traditional diagnosis using polysomnography (PSG) is expensive, time-consuming, and uncomfortable. Existing deep learning methods using facial image analysis lack accuracy due to poor facial feature capture and limited sample sizes. To address this, we propose a multimodal dual encoder model that integrates visual and language inputs for automated OSAHS diagnosis. The model balances data using randomOverSampler, extracts key facial features with attention grids, and converts physiological data into meaningful text. Cross-attention combines image and text data for better feature extraction, and ordered regression loss ensures stable learning. Our approach improves diagnostic efficiency and accuracy, achieving 91.3% top-1 accuracy in a four-class severity classification task, demonstrating state-of-the-art performance. Code will be released upon acceptance.
Related papers
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis.<n>CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy.<n>This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - Diffusion-Based Data Augmentation for Medical Image Segmentation [2.841725244360927]
DiffAug is a novel framework that combines textguided diffusion-based generation and automatic segmentation validation.<n>Our framework achieves state-of-the-art performance with 8-10% Dice improvements over baselines.
arXiv Detail & Related papers (2025-08-25T09:49:27Z) - Exploring the Efficacy of Convolutional Neural Networks in Sleep Apnea Detection from Single Channel EEG [0.0]
This paper presents a novel approach to the detection of sleep apnea using a Convolutional Neural Network (CNN) trained on single channel EEG data.<n>The proposed CNN achieved an accuracy of 85.1% and a Matthews Correlation Coefficient (MCC) of 0.22, demonstrating a significant potential for home based applications.
arXiv Detail & Related papers (2025-08-16T04:58:00Z) - SkinDualGen: Prompt-Driven Diffusion for Simultaneous Image-Mask Generation in Skin Lesions [0.0]
We propose a novel method that leverages the pretrained Stable Diffusion-2.0 model to generate high-quality synthetic skin lesion images.<n>A hybrid dataset combining real and synthetic data markedly enhances the performance of classification and segmentation models.
arXiv Detail & Related papers (2025-07-26T15:00:37Z) - Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detections [50.343419243749054]
Anomaly Detection (AD) involves identifying deviations from normal data distributions.
We propose a novel approach that conditions the prompts of the text encoder based on image context extracted from the vision encoder.
Our method achieves state-of-the-art performance, improving performance by 2% to 29% across different metrics on 14 datasets.
arXiv Detail & Related papers (2025-04-15T10:42:25Z) - CONSULT: Contrastive Self-Supervised Learning for Few-shot Tumor Detection [21.809270017579806]
We introduce a novel two-stage anomaly detection algorithm called CONSULT (CONtrastive Self-sUpervised Learning for few-shot Tumor detection)
CONSULT fine-tunes a pre-trained feature extractor specifically for MRI brain images, using a synthetic data generation pipeline to create tumor-like data.
The first stage is to overcome the shortcomings of current anomaly detection in extracting features in high-variation data by incorporating Context-Aware Contrastive Learning and Self-supervised Feature Adversarial Learning.
arXiv Detail & Related papers (2024-10-15T06:09:28Z) - Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection [9.784793380119806]
We introduce DIAG, a training-free Diffusion-based In-distribution Anomaly Generation pipeline for data augmentation.
Unlike conventional image generation techniques, we implement a human-in-the-loop pipeline, where domain experts provide multimodal guidance to the model.
We demonstrate the efficacy and versatility of DIAG with respect to state-of-the-art data augmentation approaches on the challenging KSDD2 dataset.
arXiv Detail & Related papers (2024-07-04T14:28:52Z) - Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection.
Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels.
Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z) - AIOSA: An approach to the automatic identification of obstructive sleep
apnea events based on deep learning [1.5381930379183162]
OSAS is associated with higher mortality, worse neurological deficits, worse functional outcome after rehabilitation, and a higher likelihood of uncontrolled hypertension.
The gold standard test for diagnosing OSAS is polysomnography (PSG)
We propose a convolutional deep learning architecture able to reduce the temporal resolution of raw waveform data.
arXiv Detail & Related papers (2023-02-10T11:21:47Z) - Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification
Using Model Ensembles [52.77024349608834]
We analyze the influence of replacing a DCNN with a state-of-the-art face recognition approach, iResNet with ArcFace.
Our proposed ensemble model achieves state-of-the-art performance on both seen and unseen disorders.
arXiv Detail & Related papers (2022-11-12T23:28:54Z) - Adversarial Distortion Learning for Medical Image Denoising [43.53912137735094]
We present a novel adversarial distortion learning (ADL) for denoising two- and three-dimensional (2D/3D) biomedical image data.
The proposed ADL consists of two auto-encoders: a denoiser and a discriminator.
Both the denoiser and the discriminator are built upon a proposed auto-encoder called Efficient-Unet.
arXiv Detail & Related papers (2022-04-29T13:47:39Z) - Margin-Aware Intra-Class Novelty Identification for Medical Images [2.647674705784439]
We propose a hybrid model - Transformation-based Embedding learning for Novelty Detection (TEND)
With a pre-trained autoencoder as image feature extractor, TEND learns to discriminate the feature embeddings of in-distribution data from the transformed counterparts as fake out-of-distribution inputs.
arXiv Detail & Related papers (2021-07-31T00:10:26Z) - Convolutional Neural Networks for Sleep Stage Scoring on a Two-Channel
EEG Signal [63.18666008322476]
Sleep problems are one of the major diseases all over the world.
Basic tool used by specialists is the Polysomnogram, which is a collection of different signals recorded during sleep.
Specialists have to score the different signals according to one of the standard guidelines.
arXiv Detail & Related papers (2021-03-30T09:59:56Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Improved Slice-wise Tumour Detection in Brain MRIs by Computing
Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods.
We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder.
We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z) - Automate Obstructive Sleep Apnea Diagnosis Using Convolutional Neural
Networks [4.882119124419393]
This paper presents a CNN architecture with 1D convolutional and FCN layers for classification.
The proposed 1D CNN model achieves excellent classification results without manually preprocesssing PSG signals.
arXiv Detail & Related papers (2020-06-13T15:35:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.