EdgeEar: Efficient and Accurate Ear Recognition for Edge Devices
- URL: http://arxiv.org/abs/2502.07734v1
- Date: Tue, 11 Feb 2025 17:53:33 GMT
- Title: EdgeEar: Efficient and Accurate Ear Recognition for Edge Devices
- Authors: Camile Lendering, Bernardo Perrone Ribeiro, Žiga Emeršič, Peter Peer
- Abstract summary: Ear recognition is a contactless and unobtrusive biometric technique with applications across various domains. This paper introduces EdgeEar, a lightweight model based on a proposed hybrid CNN-transformer architecture to solve this problem. By incorporating low-rank approximations into specific linear layers, EdgeEar reduces its parameter count by a factor of 50 compared to the current state-of-the-art, bringing it below two million while maintaining competitive accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ear recognition is a contactless and unobtrusive biometric technique with applications across various domains. However, deploying high-performing ear recognition models on resource-constrained devices is challenging, limiting their applicability and widespread adoption. This paper introduces EdgeEar, a lightweight model based on a proposed hybrid CNN-transformer architecture to solve this problem. By incorporating low-rank approximations into specific linear layers, EdgeEar reduces its parameter count by a factor of 50 compared to the current state-of-the-art, bringing it below two million while maintaining competitive accuracy. Evaluation on the Unconstrained Ear Recognition Challenge (UERC2023) benchmark shows that EdgeEar achieves the lowest EER while significantly reducing computational costs. These findings demonstrate the feasibility of efficient and accurate ear recognition, which we believe will contribute to the wider adoption of ear biometrics.
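The low-rank idea described in the abstract, replacing a dense linear layer's weight matrix with a product of two thin matrices, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the dimensions and rank chosen here are made up for demonstration.

```python
import numpy as np

def low_rank_linear(in_dim, out_dim, rank, rng=None):
    """Factor a dense weight W (out_dim x in_dim) into B @ A of rank r,
    shrinking parameters from out_dim*in_dim to rank*(in_dim + out_dim)."""
    rng = rng or np.random.default_rng(0)
    A = rng.standard_normal((rank, in_dim)) * 0.02   # rank x in_dim
    B = rng.standard_normal((out_dim, rank)) * 0.02  # out_dim x rank
    return A, B

def forward(x, A, B):
    # x: (batch, in_dim) -> (batch, out_dim), without materializing the full W
    return (x @ A.T) @ B.T

in_dim, out_dim, rank = 512, 512, 32
A, B = low_rank_linear(in_dim, out_dim, rank)

dense_params = in_dim * out_dim              # 262,144 for the dense layer
low_rank_params = rank * (in_dim + out_dim)  # 32,768 for the factored layer
```

With these illustrative sizes the factored layer uses 8x fewer parameters than the dense one, which is the mechanism by which such approximations shrink a model's overall parameter count.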
Related papers
- Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers [0.0]
We introduce a framework that enhances the Audio Spectrogram Transformer (AST) using Sharpness-Aware Minimization (SAM). Our method achieves a state-of-the-art score of 68.10% on the ICBHI 2017 dataset, outperforming existing CNN and hybrid baselines. Further analysis using t-SNE and attention maps confirms that the model learns robust, discriminative features rather than memorizing background noise.
arXiv Detail & Related papers (2025-12-27T11:39:36Z) - Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition [42.23422932643755]
This work adapts the neural edge histogram descriptors (NEHD) method originally developed for image classification, to classify passive sonar signals.
We conduct a comprehensive evaluation of statistical and structural texture features, demonstrating that their combination achieves competitive performance with large pre-trained models.
The proposed NEHD-based approach offers a lightweight and efficient solution for underwater target recognition, significantly reducing computational costs while maintaining accuracy.
arXiv Detail & Related papers (2025-03-17T22:57:05Z) - KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation [5.499862297916013]
Kernel Audio Distance (KAD) is a distribution-free, unbiased, and computationally efficient metric based on Maximum Mean Discrepancy (MMD).
By leveraging advanced embeddings and characteristic kernels, KAD captures nuanced differences between real and generated audio.
Open-sourced in the kadtk toolkit, KAD provides an efficient, reliable, and perceptually aligned benchmark for evaluating generative audio models.
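KAD is described as an MMD-based metric; the unbiased squared-MMD estimator such metrics build on can be sketched generically as follows. This uses an RBF (characteristic) kernel on toy feature vectors; the actual kernel and embeddings used by the kadtk toolkit are not assumed here.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # pairwise squared Euclidean distances -> Gaussian (characteristic) kernel
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_unbiased(X, Y, gamma=1.0):
    """Unbiased estimate of squared MMD between sample sets X and Y;
    diagonal terms are excluded from the within-set sums."""
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

Applied to embeddings of real versus generated audio, the estimate is near zero when the two sets come from the same distribution and grows as the distributions diverge.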
arXiv Detail & Related papers (2025-02-21T17:19:15Z) - Robust Persian Digit Recognition in Noisy Environments Using Hybrid CNN-BiGRU Model [1.5566524830295307]
This study addresses isolated spoken Persian digit recognition (zero to nine) under noisy conditions. A hybrid model combining residual convolutional neural networks and bidirectional gated recurrent units (BiGRU) is proposed. Experimental results demonstrate the model's effectiveness, achieving 98.53%, 96.10%, and 95.92% accuracy on the training, validation, and test sets.
arXiv Detail & Related papers (2024-12-14T15:11:42Z) - Ear-Keeper: Real-time Diagnosis of Ear Lesions Utilizing Ultralight-Ultrafast ConvNet and Large-scale Ear Endoscopic Dataset [7.5179664143779075]
We propose Best-EarNet, an ultrafast and ultralight network enabling real-time ear disease diagnosis.
Best-EarNet, with only 0.77M parameters, achieves 95.23% accuracy on an internal set of 22,581 images and 92.14% on an external set of 1,652 images.
Ear-Keeper, an intelligent diagnosis system based on Best-EarNet, was developed and deployed on common electronic devices.
arXiv Detail & Related papers (2023-08-21T10:20:46Z) - Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as finetuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
arXiv Detail & Related papers (2023-06-08T06:06:42Z) - Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The inversion model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z) - A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
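The pruning and quantization steps mentioned in the blurb above can be illustrated in isolation; this is a hedged generic sketch (magnitude pruning plus symmetric uniform quantization), not the Acoustic Lottery framework's actual procedure.

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out (at least) the smallest-magnitude fraction of weights;
    ties at the threshold may zero a few extra entries."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > thresh
    return w * mask

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.round(w / scale).astype(np.int8)
    return q, scale  # dequantize with q * scale
```

In a compression pipeline the pruned-and-quantized weights replace the originals, trading a bounded accuracy loss for much smaller storage and cheaper inference.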
arXiv Detail & Related papers (2021-07-03T16:25:24Z) - WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition [59.975078145303605]
We propose a novel framework, namely WNARS, using hybrid CTC-attention AED models and weighted finite-state transducers.
On the AISHELL-1 task, WNARS achieves a character error rate of 5.22% with 640 ms latency, which is, to the best of our knowledge, state-of-the-art performance for online ASR.
arXiv Detail & Related papers (2021-04-08T07:56:03Z) - UESegNet: Context Aware Unconstrained ROI Segmentation Networks for Ear Biometric [8.187718963808484]
Ear biometrics poses a great level of difficulty in unconstrained environments.
To address the problem of ear localization in the wild, we propose two high-performance region-of-interest (ROI) segmentation models, UESegNet-1 and UESegNet-2.
To test the model's generalization, they are evaluated on six different benchmark datasets.
arXiv Detail & Related papers (2020-10-08T14:05:15Z) - ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: (a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and (b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.