Ear-Keeper: Real-time Diagnosis of Ear Lesions Utilizing Ultralight-Ultrafast ConvNet and Large-scale Ear Endoscopic Dataset
- URL: http://arxiv.org/abs/2308.10610v4
- Date: Wed, 10 Apr 2024 08:16:18 GMT
- Title: Ear-Keeper: Real-time Diagnosis of Ear Lesions Utilizing Ultralight-Ultrafast ConvNet and Large-scale Ear Endoscopic Dataset
- Authors: Yubiao Yue, Xinyu Zeng, Xiaoqiang Shi, Meiping Zhang, Fan Zhang, Yunxin Liang, Yan Liu, Zhenzhang Li, Yang Li
- Abstract summary: We propose Best-EarNet, an ultrafast and ultralight network enabling real-time ear disease diagnosis.
With only 0.77M parameters, Best-EarNet achieves an accuracy of 95.23% on an internal test set (22,581 images) and 92.14% on an external test set (1,652 images).
Ear-Keeper, an intelligent diagnosis system based on Best-EarNet, was successfully developed and deployed on common electronic devices.
- Score: 7.5179664143779075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based ear disease diagnosis technology has proven effective and affordable. However, due to the lack of diverse ear endoscope datasets, the practical potential of deep learning models has not been thoroughly studied. Moreover, existing research has failed to achieve a good trade-off between model inference speed and parameter size, rendering models inapplicable in real-world settings. To address these challenges, we constructed the first large-scale ear endoscopic dataset, comprising eight types of ear diseases and disease-free samples from two institutions. Inspired by ShuffleNetV2, we proposed Best-EarNet, an ultrafast and ultralight network enabling real-time ear disease diagnosis. Best-EarNet incorporates a novel Local-Global Spatial Feature Fusion Module and a multi-scale supervision strategy, which help the model focus on global-local information within feature maps at various levels. Utilizing transfer learning, Best-EarNet with only 0.77M parameters achieves accuracies of 95.23% (internal, 22,581 images) and 92.14% (external, 1,652 images). In particular, it achieves an average of 80 frames per second on the CPU. From the perspective of practicality, the proposed Best-EarNet is superior to state-of-the-art backbone models in ear lesion detection tasks. Most importantly, Ear-Keeper, an intelligent diagnosis system based on Best-EarNet, was successfully developed and deployed on common electronic devices (smartphones, tablet computers, and personal computers). In the future, Ear-Keeper has the potential to assist the public and healthcare providers in performing comprehensive, real-time video scanning and diagnosis of the ear canal, thereby promptly detecting ear lesions.
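The abstract does not describe the internals of the Local-Global Spatial Feature Fusion Module. Purely to illustrate the general idea of fusing a pooled global-context signal with locally filtered features, a toy NumPy sketch might look like the following. All function names, the mean-filter local branch, and the sigmoid-gated residual fusion are assumptions for illustration, not the authors' actual architecture.

```python
import numpy as np

def global_branch(x):
    # Global average pooling over spatial dims, broadcast back: (C, H, W) -> (C, H, W)
    return x.mean(axis=(1, 2), keepdims=True) * np.ones_like(x)

def local_branch(x, k=3):
    # Simple k x k mean filter per channel, a crude stand-in for a depthwise conv
    c, h, w = x.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = xp[:, i:i + k, j:j + k].mean(axis=(1, 2))
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_global_fusion(x):
    # Gate the local features with a global context signal, then add a residual
    gate = sigmoid(global_branch(x))
    return x + gate * local_branch(x)

feat = np.random.default_rng(0).standard_normal((8, 16, 16))
fused = local_global_fusion(feat)
print(fused.shape)  # (8, 16, 16): fusion preserves the feature-map shape
```

Because the fused map has the same shape as its input, a module of this kind can be dropped between stages of a backbone at multiple feature-map levels, which is consistent with the multi-scale supervision the abstract describes.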
Related papers
- A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z)
- Unified Multi-task Learning for Voice-Based Detection of Diverse Clinical Conditions [14.745982411183766]
We present MARVEL, a privacy-conscious multitask learning framework that simultaneously detects nine distinct neurological, respiratory, and voice disorders. Our framework consistently outperforms single-modal baselines by 5-19% and surpasses state-of-the-art self-supervised models on 7 of 9 tasks.
arXiv Detail & Related papers (2025-08-28T12:37:25Z)
- UltraEar: a multicentric, large-scale database combining ultra-high-resolution computed tomography and clinical data for ear diseases [28.75872046719716]
UltraEar recruits patients from 11 tertiary hospitals between October 2020 and October 2035. A broad spectrum of otologic disorders is covered, such as otitis media, cholesteatoma, ossicular chain malformation, temporal bone fracture, inner ear malformation, cochlear aperture stenosis, enlarged vestibular aqueduct, and sigmoid sinus bony deficiency.
arXiv Detail & Related papers (2025-08-27T05:56:17Z)
- Deploying and Evaluating Multiple Deep Learning Models on Edge Devices for Diabetic Retinopathy Detection [0.0]
Diabetic Retinopathy (DR) affects approximately 34.6% of diabetes patients globally, with the number of cases projected to reach 242 million by 2045. Traditional DR diagnosis relies on the manual examination of retinal fundus images, which is both time-consuming and resource-intensive. This study presents a novel solution using Edge Impulse to deploy multiple deep learning models for real-time DR detection on edge devices.
arXiv Detail & Related papers (2025-06-14T13:53:45Z)
- Detection of Disease on Nasal Breath Sound by New Lightweight Architecture: Using COVID-19 as An Example [4.618578603062536]
Infectious diseases, particularly COVID-19, continue to be a significant global health issue.
This study aims to develop a novel, lightweight deep neural network for efficient, accurate, and cost-effective detection of COVID-19 using nasal breathing audio data collected via smartphones.
arXiv Detail & Related papers (2025-04-01T12:41:53Z)
- Autonomous AI for Multi-Pathology Detection in Chest X-Rays: A Multi-Site Study in the Indian Healthcare System [0.0]
The study outlines the development of an autonomous AI system for chest X-ray (CXR) interpretation. The system integrates advanced architectures including Vision Transformers, Faster R-CNN, and various U-Net models. It was deployed in 17 major healthcare systems in India, including diagnostic centers, large hospitals, and government hospitals.
arXiv Detail & Related papers (2025-03-28T09:07:17Z)
- Congenital Heart Disease Classification Using Phonocardiograms: A Scalable Screening Tool for Diverse Environments [34.10187730651477]
Congenital heart disease (CHD) is a critical condition that demands early detection. This study presents a deep learning model designed to detect CHD using phonocardiogram (PCG) signals. We evaluated our model on several datasets, including the primary dataset from Bangladesh.
arXiv Detail & Related papers (2025-03-28T05:47:44Z)
- AI-Driven MRI Spine Pathology Detection: A Comprehensive Deep Learning Approach for Automated Diagnosis in Diverse Clinical Settings [0.0]
This study presents the development of an autonomous AI system for MRI spine pathology detection, trained on a dataset of 2 million MRI spine scans. The dataset is balanced across age groups, genders, and scanner manufacturers to ensure robustness and adaptability. The system was deployed across 13 major healthcare enterprises in India, encompassing diagnostic centers, large hospitals, and government facilities.
arXiv Detail & Related papers (2025-03-26T08:33:03Z)
- GONet: A Generalizable Deep Learning Model for Glaucoma Detection [2.0521974107551535]
Glaucomatous optic neuropathy (GON) is a prevalent ocular disease that can lead to irreversible vision loss if not detected early and treated.
Recent deep learning models for automating GON detection from digital fundus images have shown promise but often suffer from limited generalizability.
We introduce GONet, a robust deep learning model developed using seven independent datasets.
arXiv Detail & Related papers (2025-02-26T19:28:09Z)
- Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence [83.02106623401885]
We present UltraFedFM, an innovative privacy-preserving ultrasound foundation model.
UltraFedFM is collaboratively pre-trained using federated learning across 16 distributed medical institutions in 9 countries.
It achieves an average area under the receiver operating characteristic curve of 0.927 for disease diagnosis and a dice similarity coefficient of 0.878 for lesion segmentation.
arXiv Detail & Related papers (2024-11-25T13:40:11Z)
- MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder [4.7377709803078325]
This paper bridges the gap between traditional, time-consuming diagnostic methods and potential automated solutions.
We propose a multi-atlas deep ensemble network, MADE-for-ASD, that integrates multiple atlases of the brain's functional magnetic resonance imaging (fMRI) data.
Our approach integrates demographic information into the prediction workflow, which enhances ASD diagnosis performance.
arXiv Detail & Related papers (2024-07-09T17:49:23Z)
- Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
We train open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
Inference with LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
- Detecting Speech Abnormalities with a Perceiver-based Sequence Classifier that Leverages a Universal Speech Model [4.503292461488901]
We propose a Perceiver-based sequence classifier to detect abnormalities in speech reflective of several neurological disorders.
We combine this sequence with a Universal Speech Model (USM) that is trained (unsupervised) on 12 million hours of diverse audio recordings.
Our model outperforms standard transformer (80.9%) and perceiver (81.8%) models and achieves an average accuracy of 83.1%.
arXiv Detail & Related papers (2023-10-16T21:07:12Z)
- UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training [66.16134293168535]
We propose a hierarchical knowledge-enhanced pre-training framework for universal brain MRI diagnosis, termed UniBrain.
Specifically, UniBrain leverages a large-scale dataset of 24,770 imaging-report pairs from routine diagnostics.
arXiv Detail & Related papers (2023-09-13T09:22:49Z)
- The role of noise in denoising models for anomaly detection in medical images [62.0532151156057]
Pathological brain lesions exhibit diverse appearance in brain images.
Unsupervised anomaly detection approaches have been proposed using only normal data for training.
We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes.
arXiv Detail & Related papers (2023-01-19T21:39:38Z)
- A Meta-GNN approach to personalized seizure detection and classification [53.906130332172324]
We propose a personalized seizure detection and classification framework that quickly adapts to a specific patient from limited seizure samples.
We train a Meta-GNN based classifier that learns a global model from a set of training patients.
We show that our method outperforms the baselines by reaching 82.7% on accuracy and 82.08% on F1 score after only 20 iterations on new unseen patients.
arXiv Detail & Related papers (2022-11-01T14:12:58Z)
- Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and RoBERTa pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Side-aware Meta-Learning for Cross-Dataset Listener Diagnosis with Subjective Tinnitus [38.66127142638335]
This paper proposes a side-aware meta-learning approach for cross-dataset tinnitus diagnosis.
Our method achieves a high accuracy of 73.8% in the cross-dataset classification.
arXiv Detail & Related papers (2022-05-03T03:17:44Z)
- Multiple Time Series Fusion Based on LSTM: An Application to CAP A Phase Classification Using EEG [56.155331323304]
Deep learning based electroencephalogram channels' feature level fusion is carried out in this work.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z)
- Novel EEG based Schizophrenia Detection with IoMT Framework for Smart Healthcare [0.0]
Schizophrenia(Sz) is a brain disorder that severely affects the thinking, behaviour, and feelings of people all around the world.
EEG is a non-linear time-series signal, and its non-linear structure makes it crucial yet challenging to analyze.
This paper aims to improve the performance of EEG based Sz detection using a deep learning approach.
arXiv Detail & Related papers (2021-11-19T18:21:20Z)
- Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify if a speaker is infected with COVID-19 or not.
Ultimately, it achieves an Unweighted Average Recall (UAR) of 74.9%, or an Area Under ROC Curve (AUC) of 80.7% by ensembling neural networks.
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
- UESegNet: Context Aware Unconstrained ROI Segmentation Networks for Ear Biometric [8.187718963808484]
Ear biometrics pose a great level of difficulty in unconstrained environments.
To address the problem of ear localization in the wild, we have proposed two high-performance region of interest (ROI) segmentation models UESegNet-1 and UESegNet-2.
To test the model's generalization, they are evaluated on six different benchmark datasets.
arXiv Detail & Related papers (2020-10-08T14:05:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.