Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset
- URL: http://arxiv.org/abs/2410.00980v1
- Date: Tue, 1 Oct 2024 18:09:02 GMT
- Title: Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset
- Authors: Panagiota Anastasopoulou, Jessica Torrey, Xavier Serra, Frederic Font
- Abstract summary: This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability.
We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios.
Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task.
- Score: 6.91815289914328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic sound classification has a wide range of applications in machine listening, enabling context-aware sound processing and understanding. This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability. Our study evaluates the classification task using the Broad Sound Taxonomy, a two-level taxonomy comprising 28 classes designed to cover a heterogeneous range of sounds with semantic distinctions tailored for practical user applications. We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios. We compare a variety of both traditional and modern machine learning approaches to establish a baseline for the task of heterogeneous sound classification. We investigate the role of input features, specifically examining how acoustically derived sound representations compare to embeddings extracted with pre-trained deep neural networks that capture both acoustic and semantic information about sounds. Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task. After careful analysis of classification errors, we identify some underlying reasons for failure and propose actions to mitigate them. The paper highlights the need for deeper exploration of all stages of classification, understanding the data and adopting methodologies capable of effectively handling data complexity and generalizing in real-world sound environments.
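The two-level structure of the taxonomy described in the abstract suggests scoring predictions at both levels. The following is a minimal sketch, assuming labels are written as `top/second` strings. The label names and the `top_level`, `accuracy` and `confusions` helpers are hypothetical illustrations; they are not taken from the paper or its dataset.

```python
from collections import Counter


def top_level(label):
    """Return the top-level part of a hypothetical 'top/second' label."""
    return label.split("/", 1)[0]


def accuracy(y_true, y_pred, key=lambda label: label):
    """Fraction of items whose true and predicted labels agree under `key`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(key(t) == key(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def confusions(y_true, y_pred, key=lambda label: label):
    """Count (true, predicted) pairs that disagree under `key`."""
    return Counter(
        (key(t), key(p)) for t, p in zip(y_true, y_pred) if key(t) != key(p)
    )


# Placeholder labels (assumption): not the actual taxonomy classes.
y_true = ["A/a1", "A/a2", "B/b1", "B/b2"]
y_pred = ["A/a1", "A/a1", "B/b1", "A/a2"]

print(accuracy(y_true, y_pred))                 # second-level accuracy
print(accuracy(y_true, y_pred, key=top_level))  # top-level accuracy
print(confusions(y_true, y_pred))               # which class pairs get confused
```

Comparing the two accuracies separates errors that stay within a top-level category from those that cross categories, which is one way to inspect classification errors of the kind the abstract mentions.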
Related papers
- Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems [7.207019635697126]
This article aims to determine what information is encoded in an automatic speech recognition acoustic model (AM) and where it is located.
Experiments are performed on speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification.
Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition.
arXiv Detail & Related papers (2024-02-29T18:43:53Z)
- WhaleNet: a Novel Deep Learning Architecture for Marine Mammals Vocalizations on Watkins Marine Mammal Sound Database [49.1574468325115]
We introduce WhaleNet (Wavelet Highly Adaptive Learning Ensemble Network), a sophisticated deep ensemble architecture for the classification of marine mammal vocalizations.
We achieve an improvement in classification accuracy by 8-10% over existing architectures, corresponding to a classification accuracy of 97.61%.
arXiv Detail & Related papers (2024-02-20T11:36:23Z)
- Improving the Intent Classification accuracy in Noisy Environment [9.447108578893639]
In this paper, we investigate how environmental noise and related noise reduction techniques affect the intent classification task with end-to-end neural models.
For this task, the use of speech enhancement greatly improves the classification accuracy in noisy conditions.
arXiv Detail & Related papers (2023-03-12T06:11:44Z)
- Anomaly Detection using Ensemble Classification and Evidence Theory [62.997667081978825]
We present a novel approach for novelty detection using ensemble classification and evidence theory.
A pool selection strategy is presented to build a solid ensemble classifier.
We use uncertainty for the anomaly detection approach.
arXiv Detail & Related papers (2022-12-23T00:50:41Z)
- Representation Learning for the Automatic Indexing of Sound Effects Libraries [79.68916470119743]
We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size.
Detailed experimental results show the impact of metric learning approaches and different cross-dataset training methods on representational effectiveness.
arXiv Detail & Related papers (2022-08-18T23:46:13Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noise level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- A Comparative Study on Approaches to Acoustic Scene Classification using CNNs [0.0]
Different kinds of representations have dramatic effects on the accuracy of the classification.
We investigated spectrogram, MFCC, and embedding representations using different CNN networks and autoencoders.
We found that the spectrogram representation has the highest classification accuracy while MFCC has the lowest classification accuracy.
arXiv Detail & Related papers (2022-04-26T09:23:29Z)
- Interpreting deep urban sound classification using Layer-wise Relevance Propagation [5.177947445379688]
This work focuses on the sensitive application of assisting drivers suffering from hearing loss by constructing a deep neural network for urban sound classification.
We use two different representations of audio signals, i.e. Mel and constant-Q spectrograms, while the decisions made by the deep neural network are explained via layer-wise relevance propagation.
Overall, we present an explainable AI framework for understanding deep urban sound classification.
arXiv Detail & Related papers (2021-11-19T14:15:45Z)
- Capturing scattered discriminative information using a deep architecture in acoustic scene classification [49.86640645460706]
In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
arXiv Detail & Related papers (2020-07-09T08:32:06Z)
- End-to-End Auditory Object Recognition via Inception Nucleus [7.22898229765707]
We propose a novel end-to-end deep neural network to map the raw waveform inputs to sound class labels.
Our network includes an "inception nucleus" that optimizes the size of convolutional filters on the fly.
arXiv Detail & Related papers (2020-05-25T16:08:41Z)
- Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification [139.44681304276]
Zero-shot learning aims to classify unseen categories for which no data is available during training.
Generative Adversarial Networks synthesize unseen class features by leveraging class-specific semantic embeddings.
We propose to enforce semantic consistency at all stages of zero-shot learning: training, feature synthesis and classification.
arXiv Detail & Related papers (2020-03-17T17:34:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.