Capturing scattered discriminative information using a deep architecture
in acoustic scene classification
- URL: http://arxiv.org/abs/2007.04631v1
- Date: Thu, 9 Jul 2020 08:32:06 GMT
- Title: Capturing scattered discriminative information using a deep architecture
in acoustic scene classification
- Authors: Hye-jin Shim, Jee-weon Jung, Ju-ho Kim, Ha-jin Yu
- Abstract summary: In this study, we investigate various methods to capture discriminative information and simultaneously mitigate the overfitting problem.
We adopt a max feature map method to replace conventional non-linear activations in a deep neural network.
Two data augmentation methods and two deep architecture modules are further explored to reduce overfitting and sustain the system's discriminative power.
- Score: 49.86640645460706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Frequently misclassified pairs of classes that share many common acoustic
properties exist in acoustic scene classification (ASC). To distinguish such
pairs of classes, trivial details scattered throughout the data could be vital
clues. However, these details are less noticeable and are easily removed using
conventional non-linear activations (e.g. ReLU). Furthermore, making design
choices to emphasize trivial details can easily lead to overfitting if the
system is not sufficiently generalized. In this study, based on the analysis of
the ASC task's characteristics, we investigate various methods to capture
discriminative information and simultaneously mitigate the overfitting problem.
We adopt the max feature map method to replace conventional non-linear
activations in a deep neural network, applying an element-wise comparison
between different filters of a convolution layer's output. Two data
augmentation methods and two deep architecture modules are further explored to
reduce overfitting and sustain the system's discriminative power. Various
experiments are conducted on the Detection and Classification of Acoustic
Scenes and Events (DCASE) 2020 Task 1-A dataset to validate the proposed methods. Our
results show that the proposed system consistently outperforms the baseline,
where the single best performing system has an accuracy of 70.4% compared to
65.1% of the baseline.
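The max feature map (MFM) described above replaces ReLU with a competition between paired feature maps: the convolution output is split into two halves along the channel axis and only the element-wise maximum of each pair is kept, so small but discriminative activations are not simply zeroed out. A minimal PyTorch sketch of this idea follows; the layer sizes and input shape are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class MaxFeatureMap(nn.Module):
    """Max feature map (MFM) activation: split the channel dimension in half
    and keep the element-wise maximum of the two halves, as a competitive
    alternative to ReLU."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time); channels must be even
        a, b = torch.chunk(x, 2, dim=1)
        return torch.max(a, b)

# Illustrative conv block whose non-linearity is MFM instead of ReLU
block = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, padding=1),  # 64 filters in
    MaxFeatureMap(),                              # 32 feature maps out
)
spectrogram = torch.randn(8, 1, 128, 250)  # (batch, channel, mel bins, frames)
out = block(spectrogram)
print(out.shape)  # torch.Size([8, 32, 128, 250])
```

Because the output keeps only half the channels, a network using MFM typically doubles the number of filters in each convolution to preserve capacity.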
Related papers
- Audio Classification of Low Feature Spectrograms Utilizing Convolutional Neural Networks [0.0]
This paper derives several first-of-its-kind machine learning methodologies to analyze low feature audio spectrograms given data distributions.
In particular, this paper proposes several novel customized convolutional architectures to extract identifying features using binary, one-class, and siamese approaches.
arXiv Detail & Related papers (2024-10-28T21:48:57Z)
- Few-Shot Specific Emitter Identification via Deep Metric Ensemble Learning [26.581059299453663]
We propose a novel few-shot specific emitter identification (FS-SEI) method for aircraft identification via automatic dependent surveillance-broadcast (ADS-B) signals.
Specifically, the proposed method consists of feature embedding and classification.
Simulation results show that if the number of samples per category is more than 5, the average accuracy of our proposed method is higher than 98%.
arXiv Detail & Related papers (2022-07-14T01:09:22Z)
- Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems [66.61691401921296]
This paper presents an investigation over several methods of score calibration for deep speaker embedding extractors.
An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system.
arXiv Detail & Related papers (2022-03-28T21:22:22Z)
- Deep Neural Decision Forest for Acoustic Scene Classification [45.886356124352226]
Acoustic scene classification (ASC) aims to classify an audio clip based on the characteristic of the recording environment.
We propose a novel approach for ASC using a deep neural decision forest (DNDF).
arXiv Detail & Related papers (2022-03-07T14:39:42Z)
- Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection [7.324459578044212]
Face presentation attack detection (PAD) is attracting a lot of attention and playing a key role in securing face recognition systems.
We propose a dual-stream convolutional neural network (CNN) framework to deal with unseen scenarios.
We validate the design of our proposed PAD solution in a step-wise ablation study.
arXiv Detail & Related papers (2021-09-16T13:06:43Z)
- Anomalous Sound Detection Using a Binary Classification Model and Class Centroids [47.856367556856554]
We propose a binary classification model that is developed by using not only normal data but also outlier data in the other domains as pseudo-anomalous sound data.
We also investigate the effectiveness of additionally using anomalous sound data for further improving the binary classification model.
arXiv Detail & Related papers (2021-06-11T03:35:06Z)
- Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs [59.04636530383049]
We propose Logsy, a classification-based method to learn log representations.
We show an average improvement of 0.25 in the F1 score, compared to the previous methods.
arXiv Detail & Related papers (2020-08-21T07:26:55Z)
- AP-Loss for Accurate One-Stage Object Detection [49.13608882885456]
One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously.
The former suffers from an extreme foreground-background imbalance due to the large number of anchors.
This paper proposes a novel framework to replace the classification task in one-stage detectors with a ranking task.
arXiv Detail & Related papers (2020-08-17T13:22:01Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
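The band-wise statistics matching summarized in the last entry aligns, for every frequency band, the mean (first-order) and standard deviation (second-order) of target-domain features to the source-domain training statistics. Below is a minimal NumPy sketch of that alignment; the function name, array shapes, and random data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def bandwise_statistics_matching(target_specs, source_mean, source_std, eps=1e-8):
    """Align the per-frequency-band mean and standard deviation of
    target-domain spectrograms to source-domain statistics.

    target_specs: array of shape (num_clips, num_bands, num_frames)
    source_mean, source_std: arrays of shape (num_bands,), computed
    from the source-domain training set.
    """
    # Per-band statistics of the target-domain data, pooled over clips and frames
    t_mean = target_specs.mean(axis=(0, 2), keepdims=True)      # (1, bands, 1)
    t_std = target_specs.std(axis=(0, 2), keepdims=True) + eps  # (1, bands, 1)
    # Standardize with target statistics, then re-scale to source statistics
    aligned = (target_specs - t_mean) / t_std
    return aligned * source_std[None, :, None] + source_mean[None, :, None]

# Hypothetical usage with random data standing in for log-mel spectrograms
source = np.random.randn(100, 40, 500)               # source-domain training clips
target = 0.5 * np.random.randn(20, 40, 500) + 2.0    # shifted/scaled target domain
aligned = bandwise_statistics_matching(
    target, source.mean(axis=(0, 2)), source.std(axis=(0, 2))
)
```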