Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
- URL: http://arxiv.org/abs/2408.13644v1
- Date: Sat, 24 Aug 2024 18:13:07 GMT
- Title: Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
- Authors: Aditya Dawn, Wazib Ansar,
- Abstract summary: We have used various CNN models to learn audio features from different audio features like log mel spectrograms, gammatone spectral coefficients, mel spectral coefficients, generated from the audio files, over the past years.
In this paper, we propose a new methodology : Two-Level Classification; the Level 1 will be responsible to classify the audio signal into a broader class and the Level 2s will be responsible to find the actual class to which the audio belongs.
We have also shown the effects of different audio filters, among which a new method of Audio Crop is introduced in this paper, which gave the highest accu
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Environmental Sound Classification is an important problem of sound recognition and is more complicated than speech recognition problems, as environmental sounds are not well structured with respect to time and frequency. Over the past years, researchers have used various CNN models to learn audio features from different representations, such as log mel spectrograms, gammatone spectral coefficients, and mel-frequency spectral coefficients, generated from the audio files. In this paper, we propose a new methodology, Two-Level Classification: the Level 1 Classifier is responsible for classifying the audio signal into a broader class, and the Level 2 Classifiers are responsible for finding the actual class to which the audio belongs, based on the output of the Level 1 Classifier. We have also shown the effects of different audio filters, among which is a new method, Audio Crop, introduced in this paper, which gave the highest accuracies in most cases. We have used the ESC-50 dataset for our experiment and obtained a maximum accuracy of 78.75% for Level 1 Classification and 98.04% for Level 2 Classifications.
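A minimal sketch of the two-level routing idea described in the abstract: a Level 1 model picks a broad category, then the matching Level 2 model picks the fine-grained class. The feature extraction, model objects, and class names below are illustrative placeholders, not the authors' implementation.

```python
# Sketch of Two-Level Classification routing (placeholder models, not the paper's code).
import numpy as np
import librosa


def log_mel_features(path, sr=22050, n_mels=128):
    """Load an audio file and return a log mel spectrogram,
    one of the input representations mentioned in the abstract."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)


def two_level_predict(features, level1_model, level2_models):
    """Level 1 chooses a broad class; the matching Level 2
    classifier then chooses the final fine-grained label."""
    broad = level1_model.predict(features)          # e.g. "animals"
    fine = level2_models[broad].predict(features)   # e.g. "dog_bark"
    return broad, fine
```

Here `level1_model` and `level2_models` stand in for whatever trained classifiers (e.g. CNNs over spectrograms) back each level.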
Related papers
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Transformer-based Sequence Labeling for Audio Classification based on MFCCs [0.0]
This paper proposes a Transformer-encoder-based model for audio classification using MFCCs.
The model was benchmarked against the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and has shown strong performance.
The model consists of only 127,544 parameters, making it lightweight yet highly effective at the audio classification task.
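A rough sketch of the idea this entry describes: classify audio from an MFCC frame sequence with a small Transformer encoder. The layer sizes and pooling choice below are illustrative assumptions, not the paper's exact architecture.

```python
# Toy MFCC-sequence classifier with a Transformer encoder (sizes are illustrative).
import torch
import torch.nn as nn


class MFCCTransformerClassifier(nn.Module):
    def __init__(self, n_mfcc=40, d_model=64, n_heads=4, n_layers=2, n_classes=50):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, d_model)        # project MFCC frames to model width
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, mfcc):                          # mfcc: (batch, time, n_mfcc)
        h = self.encoder(self.proj(mfcc))
        return self.head(h.mean(dim=1))               # average over time, then classify


logits = MFCCTransformerClassifier()(torch.randn(2, 100, 40))  # -> (2, 50)
```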
arXiv Detail & Related papers (2023-04-30T07:25:43Z)
- LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders [53.30016986953206]
We propose LA-VocE, a new two-stage approach that predicts mel-spectrograms from noisy audio-visual speech via a transformer-based architecture.
We train and evaluate our framework on thousands of speakers and 11+ different languages, and study our model's ability to adapt to different levels of background noise and speech interference.
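An interface-level sketch of the two-stage pipeline this summary describes: stage 1 predicts a clean mel-spectrogram from noisy audio-visual input, stage 2 converts it to a waveform with a neural vocoder. Both modules below are stand-in stubs with assumed dimensions, not the LA-VocE architecture.

```python
# Two-stage enhancement skeleton (stand-in modules, not LA-VocE itself).
import torch
import torch.nn as nn


class MelEnhancer(nn.Module):
    """Stand-in for the stage that maps noisy audio-visual features to mel frames."""
    def __init__(self, in_dim=512, n_mels=80):
        super().__init__()
        self.net = nn.Linear(in_dim, n_mels)

    def forward(self, av_feats):           # (batch, time, in_dim)
        return self.net(av_feats)          # (batch, time, n_mels)


class Vocoder(nn.Module):
    """Stand-in for a neural vocoder that upsamples mel frames to audio samples."""
    def __init__(self, n_mels=80, hop=256):
        super().__init__()
        self.up = nn.Linear(n_mels, hop)

    def forward(self, mel):                # (batch, time, n_mels)
        return self.up(mel).flatten(1)     # (batch, time * hop) waveform samples


wav = Vocoder()(MelEnhancer()(torch.randn(1, 50, 512)))  # -> (1, 12800)
```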
arXiv Detail & Related papers (2022-11-20T15:27:55Z)
- Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers [7.817685358710508]
Zero-Shot (ZS) learning overcomes the restriction to a fixed set of training classes by predicting classes based on adaptable class descriptions.
This study sets out to investigate the effectiveness of self-attention-based audio embedding architectures for ZS learning.
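A minimal sketch of the zero-shot setup this entry refers to: an audio clip is assigned to whichever class description it is closest to in a shared embedding space. The embedding vectors below are random placeholders for the outputs of real audio and text encoders.

```python
# Zero-shot classification by embedding similarity (placeholder embeddings).
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def zero_shot_classify(audio_embedding, class_embeddings):
    """class_embeddings: {class_name: embedding of its textual description}."""
    scores = {name: cosine(audio_embedding, emb) for name, emb in class_embeddings.items()}
    return max(scores, key=scores.get), scores


# Toy usage with random vectors standing in for real encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=128)
classes = {"dog_bark": rng.normal(size=128), "rain": rng.normal(size=128)}
print(zero_shot_classify(audio, classes)[0])
```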
arXiv Detail & Related papers (2022-08-24T09:48:22Z)
- Contrastive Environmental Sound Representation Learning [6.85316573653194]
We exploit the self-supervised contrastive technique and a shallow 1D CNN to extract the distinctive audio features (audio representations) without using any explicit annotations.
We generate representations of a given audio using both its raw audio waveform and spectrogram and evaluate if the proposed learner is agnostic to the type of audio input.
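A minimal sketch in the spirit of this entry: self-supervised contrastive learning on raw waveforms with a shallow 1D CNN, where two augmented views of the same clip should produce matching embeddings. The encoder, augmentations, and loss details are illustrative, not the paper's exact setup.

```python
# Shallow 1D-CNN encoder with a simplified contrastive (NT-Xent-style) loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Shallow1DCNN(nn.Module):
    def __init__(self, emb_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=11, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=11, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, wav):                      # wav: (batch, samples)
        h = self.conv(wav.unsqueeze(1)).squeeze(-1)
        return F.normalize(self.fc(h), dim=-1)   # unit-norm embeddings


def contrastive_loss(z1, z2, temperature=0.5):
    """Each clip's two augmented views should be each other's nearest neighbour."""
    logits = z1 @ z2.t() / temperature           # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)


enc = Shallow1DCNN()
view1, view2 = torch.randn(8, 16000), torch.randn(8, 16000)  # two augmentations per clip
loss = contrastive_loss(enc(view1), enc(view2))
```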
arXiv Detail & Related papers (2022-07-18T16:56:30Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities; a sketch of the fusion step follows below.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
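A minimal sketch of the "late fusion of predicted probabilities" step named above: per-model class probabilities for the same clip are combined, here by a simple weighted average, before taking the argmax. The weights and model outputs are illustrative.

```python
# Late fusion of per-model class probabilities (weights are an assumption).
import numpy as np


def late_fusion(prob_matrix, weights=None):
    """prob_matrix: (n_models, n_classes) per-model class probabilities."""
    probs = np.asarray(prob_matrix, dtype=float)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))  # equal weighting by default
    fused = weights @ probs                              # weighted average over models
    return fused / fused.sum()                           # renormalise to a distribution


fused = late_fusion([[0.7, 0.2, 0.1],   # model A
                     [0.5, 0.4, 0.1]])  # model B
print(fused.argmax(), fused)
```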
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- A Comparative Study on Approaches to Acoustic Scene Classification using CNNs [0.0]
Different kinds of representations have a dramatic effect on classification accuracy.
We investigated spectrogram, MFCC, and embedding representations using different CNN networks and autoencoders.
We found that the spectrogram representation gives the highest classification accuracy, while MFCCs give the lowest.
arXiv Detail & Related papers (2022-04-26T09:23:29Z)
- Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification [54.57150493905063]
Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded.
We propose a robust feature learning (RFL) framework to train the CNN.
arXiv Detail & Related papers (2021-08-11T03:33:05Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing approaches at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification [0.6767885381740952]
SoundCLR is a supervised contrastive learning method for effective environmental sound classification with state-of-the-art performance.
Due to the comparatively small sizes of the available environmental sound datasets, we propose and exploit a transfer learning and strong data augmentation pipeline.
Our experiments show that our masking-based augmentation technique on log-mel spectrograms can significantly improve recognition performance.
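A minimal sketch of masking-based augmentation on a log-mel spectrogram in the SpecAugment style this entry alludes to; the mask widths and the mean-fill value are illustrative choices, not the paper's settings.

```python
# Time/frequency masking of a log-mel spectrogram (widths and fill value are assumptions).
import numpy as np


def mask_spectrogram(log_mel, n_freq_masks=1, n_time_masks=1,
                     max_freq_width=8, max_time_width=20, rng=None):
    rng = rng or np.random.default_rng()
    out = log_mel.copy()
    n_mels, n_frames = out.shape
    fill = out.mean()                                # fill masked regions with the mean level
    for _ in range(n_freq_masks):                    # mask a band of mel bins
        w = rng.integers(1, max_freq_width + 1)
        f0 = rng.integers(0, max(1, n_mels - w))
        out[f0:f0 + w, :] = fill
    for _ in range(n_time_masks):                    # mask a span of time frames
        w = rng.integers(1, max_time_width + 1)
        t0 = rng.integers(0, max(1, n_frames - w))
        out[:, t0:t0 + w] = fill
    return out


augmented = mask_spectrogram(np.random.randn(128, 431))  # (n_mels, frames)
```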
arXiv Detail & Related papers (2021-03-02T18:42:45Z)
- Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
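A minimal sketch of the max feature map (MFM) activation mentioned above: instead of a conventional non-linearity such as ReLU, the channel dimension is split in half and the elementwise maximum of the two halves is kept, which also halves the channel count.

```python
# Max feature map (MFM) activation: elementwise max over two channel halves.
import torch
import torch.nn as nn


class MaxFeatureMap(nn.Module):
    def forward(self, x):                 # x: (batch, channels, freq, time)
        a, b = x.chunk(2, dim=1)          # split channels into two halves
        return torch.max(a, b)            # keep the stronger response per position


out = MaxFeatureMap()(torch.randn(4, 64, 40, 100))  # -> (4, 32, 40, 100)
```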
arXiv Detail & Related papers (2020-07-09T08:32:06Z)