An Ensemble of Convolutional Neural Networks for Audio Classification
- URL: http://arxiv.org/abs/2007.07966v2
- Date: Tue, 27 Apr 2021 22:34:00 GMT
- Title: An Ensemble of Convolutional Neural Networks for Audio Classification
- Authors: Loris Nanni, Gianluca Maguolo, Sheryl Brahnam, Michelangelo Paci
- Abstract summary: Ensembles of CNNs for audio classification are presented and tested on three freely available audio classification datasets.
To the best of our knowledge, this is the most extensive study investigating ensembles of CNNs for audio classification.
- Score: 9.174145063580882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, ensembles of classifiers that exploit several data
augmentation techniques and four signal representations for training
Convolutional Neural Networks (CNNs) for audio classification are presented and
tested on three freely available audio classification datasets: i) bird calls,
ii) cat sounds, and iii) the Environmental Sound Classification dataset. The
best performing ensembles combining data augmentation techniques with different
signal representations are compared and shown to outperform the best methods
reported in the literature on these datasets. The approach proposed here
obtains state-of-the-art results on the widely used ESC-50 dataset. To the best
of our knowledge, this is the most extensive study investigating ensembles of
CNNs for audio classification. Results demonstrate not only that CNNs can be
trained for audio classification but also that their fusion using different
techniques works better than the stand-alone classifiers.
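A common way to realize the fusion described above is the sum (average) rule over the classifiers' softmax outputs. The sketch below is a minimal illustration of that idea, not the paper's exact pipeline; the function name and the toy probability matrices are assumptions.

```python
import numpy as np

def fuse_by_sum_rule(prob_matrices):
    """Fuse classifier outputs by averaging their softmax probabilities.

    prob_matrices: list of (n_samples, n_classes) arrays, one per classifier.
    Returns the fused class prediction for each sample.
    """
    stacked = np.stack(prob_matrices)  # (n_classifiers, n_samples, n_classes)
    fused = stacked.mean(axis=0)       # average the scores across the ensemble
    return fused.argmax(axis=1)        # predicted class per sample

# Toy example: two classifiers, three samples, two classes.
clf_a = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
clf_b = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
print(fuse_by_sum_rule([clf_a, clf_b]))  # -> [0 0 1]
```

Because averaging smooths out individual classifiers' errors, the fused prediction can beat each stand-alone model, which is the effect the abstract reports.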
Related papers
- Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks [0.0]
This paper derives several first-of-their-kind machine learning methodologies to analyze low feature audio spectrograms given data distributions.
In particular, this paper proposes several novel customized convolutional architectures to extract identifying features using binary, one-class, and siamese approaches.
arXiv Detail & Related papers (2024-10-28T21:48:57Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z) - Text-to-feature diffusion for audio-visual few-shot learning [59.45164042078649]
Few-shot learning from video data is a challenging and underexplored, yet much cheaper, setup.
We introduce a unified audio-visual few-shot video classification benchmark on three datasets.
We show that AV-DIFF obtains state-of-the-art performance on our proposed benchmark for audio-visual few-shot learning.
arXiv Detail & Related papers (2023-09-07T17:30:36Z) - Improving Primate Sounds Classification using Binary Presorting for Deep Learning [6.044912425856236]
In this work, we introduce a generalized approach that first relabels subsegments of MEL spectrogram representations.
For both the binary pre-sorting and the classification, we make use of convolutional neural networks (CNN) and various data-augmentation techniques.
We showcase the results of this approach on the challenging ComParE 2021 dataset, with the task of classifying between different primate species sounds.
arXiv Detail & Related papers (2023-06-28T09:35:09Z) - Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experiment results show the high generalization performance of our method on testing data that are composed of unseen contexts.
arXiv Detail & Related papers (2022-10-26T15:21:39Z) - A Comparative Study on Approaches to Acoustic Scene Classification using CNNs [0.0]
Different kinds of representations have dramatic effects on the accuracy of the classification.
We investigated the spectrograms, MFCCs, and embeddings representations using different CNN networks and autoencoders.
We found that the spectrogram representation has the highest classification accuracy while MFCC has the lowest classification accuracy.
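The spectrogram representation compared above can be illustrated with a minimal numpy-only short-time FFT; the frame length, hop size, and function name below are illustrative assumptions, and MFCCs would add a mel filterbank, log compression, and a DCT on top of this magnitude spectrogram.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a windowed short-time FFT (numpy only)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, n_frames)

# A 440 Hz tone sampled at 8 kHz: energy concentrates in one frequency bin.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)              # (129, 61)
print(spec.argmax(axis=0)[0])  # peak bin near 440 / (8000/256) ≈ 14
```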
arXiv Detail & Related papers (2022-04-26T09:23:29Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification [0.6767885381740952]
SoundCLR is a supervised contrastive learning method for effective environment sound classification with state-of-the-art performance.
Due to the comparatively small sizes of the available environmental sound datasets, we propose and exploit a transfer learning and strong data augmentation pipeline.
Our experiments show that our masking based augmentation technique on the log-mel spectrograms can significantly improve the recognition performance.
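The masking-based augmentation described above can be sketched as zeroing out a random frequency band and a random time band of a log-mel spectrogram (in the style of SpecAugment). The mask widths and the numpy-only setup below are illustrative assumptions, not SoundCLR's exact pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spectrogram(spec, max_freq_width=8, max_time_width=16):
    """Zero out one random frequency band and one random time band
    of a log-mel spectrogram (illustrative mask widths)."""
    out = spec.copy()
    n_mels, n_frames = out.shape
    f_w = rng.integers(0, max_freq_width + 1)   # frequency-mask width
    f_0 = rng.integers(0, n_mels - f_w + 1)     # frequency-mask start
    out[f_0:f_0 + f_w, :] = 0.0
    t_w = rng.integers(0, max_time_width + 1)   # time-mask width
    t_0 = rng.integers(0, n_frames - t_w + 1)   # time-mask start
    out[:, t_0:t_0 + t_w] = 0.0
    return out

log_mel = rng.standard_normal((64, 128))  # stand-in log-mel spectrogram
augmented = mask_spectrogram(log_mel)
print(augmented.shape)  # (64, 128)
```

Applying a fresh random mask each epoch forces the network not to rely on any single frequency band or time span, which is the regularizing effect the summary reports.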
arXiv Detail & Related papers (2021-03-02T18:42:45Z) - A Two-Stage Approach to Device-Robust Acoustic Scene Classification [63.98724740606457]
A two-stage system based on fully convolutional neural networks (CNNs) is proposed to improve device robustness.
Our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set.
Neural saliency analysis with class activation mapping gives new insights on the patterns learnt by our models.
arXiv Detail & Related papers (2020-11-03T03:27:18Z) - CURE Dataset: Ladder Networks for Audio Event Classification [15.850545634216484]
There are approximately 3M people with hearing loss who cannot perceive events happening around them.
This paper establishes the CURE dataset, which contains a curated set of specific audio events most relevant to people with hearing loss.
arXiv Detail & Related papers (2020-01-12T09:35:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.