Advanced Framework for Animal Sound Classification With Features Optimization
- URL: http://arxiv.org/abs/2407.03440v1
- Date: Wed, 3 Jul 2024 18:33:47 GMT
- Title: Advanced Framework for Animal Sound Classification With Features Optimization
- Authors: Qiang Yang, Xiuying Chen, Changsheng Ma, Carlos M. Duarte, Xiangliang Zhang,
- Abstract summary: We propose an automated classification framework applicable to general animal sound classification.
Our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy.
- Score: 35.2832738406242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The automatic classification of animal sounds presents an enduring challenge in bioacoustics, owing to the diverse statistical properties of sound signals, variations in recording equipment, and prevalent low Signal-to-Noise Ratio (SNR) conditions. Deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have excelled in human speech recognition but have not been effectively tailored to the intricate nature of animal sounds, which exhibit substantial diversity even within the same domain. We propose an automated classification framework applicable to general animal sound classification. Our approach first optimizes audio features from Mel-frequency cepstral coefficients (MFCC) including feature rearrangement and feature reduction. It then uses the optimized features for the deep learning model, i.e., an attention-based Bidirectional LSTM (Bi-LSTM), to extract deep semantic features for sound classification. We also contribute an animal sound benchmark dataset encompassing oceanic animals and birds1. Extensive experimentation with real-world datasets demonstrates that our approach consistently outperforms baseline methods by over 25% in precision, recall, and accuracy, promising advancements in animal sound classification.
Related papers
- On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis [19.205671029694074]
This study assesses feature representations derived from speech and general audio domains, across pre-training bandwidths of 4, 8, and 16 kHz for marmoset call-type and caller classification tasks.
Results show that models with higher bandwidth improve performance, and pre-training on speech or general audio yields comparable results, improving over a spectral baseline.
arXiv Detail & Related papers (2024-07-23T12:00:44Z) - WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce textbfWhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy by $8-10%$ over existing architectures, corresponding to a classification accuracy of $97.61%$.
arXiv Detail & Related papers (2024-02-20T11:36:23Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Improving Primate Sounds Classification using Binary Presorting for Deep
Learning [6.044912425856236]
In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations.
For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques.
We showcase the results of this approach on the challenging textitComparE 2021 dataset, with the task of classifying between different primate species sounds.
arXiv Detail & Related papers (2023-06-28T09:35:09Z) - Self-supervised models of audio effectively explain human cortical
responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z) - A Comparative Study on Approaches to Acoustic Scene Classification using
CNNs [0.0]
Different kinds of representations have dramatic effects on the accuracy of the classification.
We investigated the spectrograms, MFCCs, and embeddings representations using different CNN networks and autoencoders.
We found that the spectrogram representation has the highest classification accuracy while MFCC has the lowest classification accuracy.
arXiv Detail & Related papers (2022-04-26T09:23:29Z) - Training Classifiers that are Universally Robust to All Label Noise
Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z) - Discriminative Singular Spectrum Classifier with Applications on
Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z) - A Multi-view CNN-based Acoustic Classification System for Automatic
Animal Species Identification [42.119250432849505]
We propose a deep learning based acoustic classification framework for Wireless Acoustic Sensor Network (WASN)
The proposed framework is based on cloud architecture which relaxes the computational burden on the wireless sensor node.
To improve the recognition accuracy, we design a multi-view Convolution Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel.
arXiv Detail & Related papers (2020-02-23T03:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.