End-to-End Auditory Object Recognition via Inception Nucleus
- URL: http://arxiv.org/abs/2005.12195v1
- Date: Mon, 25 May 2020 16:08:41 GMT
- Title: End-to-End Auditory Object Recognition via Inception Nucleus
- Authors: Mohammad K. Ebrahimpour, Timothy Shea, Andreea Danielescu, David C. Noelle, Christopher T. Kello
- Abstract summary: We propose a novel end-to-end deep neural network to map the raw waveform inputs to sound class labels.
Our network includes an "inception nucleus" that optimizes the size of convolutional filters on the fly.
- Score: 7.22898229765707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning approaches to auditory object recognition are traditionally
based on engineered features such as those derived from the spectrum or
cepstrum. More recently, end-to-end classification systems in image and
auditory recognition systems have been developed to learn features jointly with
classification and result in improved classification accuracy. In this paper,
we propose a novel end-to-end deep neural network to map the raw waveform
inputs to sound class labels. Our network includes an "inception nucleus" that
optimizes the size of convolutional filters on the fly, which dramatically reduces
engineering effort. Classification results compared favorably
against current state-of-the-art approaches, besting them by 10.4 percentage
points on the UrbanSound8K dataset. Analyses of learned representations
revealed that filters in the earlier hidden layers learned wavelet-like
transforms to extract features that were informative for classification.
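The abstract gives no architectural details, so the following is only a minimal sketch of the general inception idea it alludes to: several 1D convolution branches with different filter widths run in parallel over the raw waveform, and their outputs are stacked as channels so that later layers can weight whichever widths prove useful. The filter widths (3, 9, 27), the number of filters per width, the random weights, and all function names are hypothetical choices for illustration, not values from the paper.

```python
import math
import random


def conv1d_same(signal, kernel):
    """Cross-correlate signal with kernel, zero-padded so output length equals input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def random_kernels(sizes, filters_per_size, seed=0):
    """Hypothetical stand-in for learned filters: random weights for each width."""
    rng = random.Random(seed)
    return {
        size: [[rng.uniform(-1.0, 1.0) / size for _ in range(size)]
               for _ in range(filters_per_size)]
        for size in sizes
    }


def inception_block(signal, kernels_by_size):
    """Apply every filter of every width in parallel and stack the results as channels."""
    channels = []
    for size in sorted(kernels_by_size):
        for kernel in kernels_by_size[size]:
            channels.append(relu(conv1d_same(signal, kernel)))
    return channels


# Toy raw waveform: 64 samples of a sine wave (assumed input, not real audio).
waveform = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
kernels = random_kernels(sizes=(3, 9, 27), filters_per_size=2)
features = inception_block(waveform, kernels)
print(len(features), len(features[0]))  # 6 64
```

In a trained network the filter weights would be learned jointly with the classifier, which is how branches with unhelpful widths can end up contributing little without the width being hand-tuned.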
Related papers
- Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset [6.91815289914328]
This paper explores methodologies for automatically classifying heterogeneous sounds characterized by high intra-class variability.
We construct a dataset through manual annotation to ensure accuracy, diverse representation within each class and relevance in real-world scenarios.
Experimental results illustrate that audio embeddings encoding acoustic and semantic information achieve higher accuracy in the classification task.
arXiv Detail & Related papers (2024-10-01T18:09:02Z)
- Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
- Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z)
- Conditional Variational Capsule Network for Open Set Recognition [64.18600886936557]
In open set recognition, a classifier has to detect classes that are not known at training time.
Recently proposed Capsule Networks have been shown to outperform alternatives in many fields, particularly in image recognition.
In our proposal, during training, capsule features of the same known class are encouraged to match a pre-defined Gaussian, one for each class.
arXiv Detail & Related papers (2021-04-19T09:39:30Z)
- An evidential classifier based on Dempster-Shafer theory and deep learning [6.230751621285322]
We propose a new classification system based on Dempster-Shafer (DS) theory and a convolutional neural network (CNN) architecture for set-valued classification.
Experiments on image recognition, signal processing, and semantic-relationship classification tasks demonstrate that the proposed combination of deep CNN, DS layer, and expected utility layer makes it possible to improve classification accuracy.
arXiv Detail & Related papers (2021-03-25T01:29:05Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- A Deep Neural Network for Audio Classification with a Classifier Attention Mechanism [2.3204178451683264]
We introduce a new attention-based neural network architecture called Classifier-Attention-Based Convolutional Neural Network (CAB-CNN).
The algorithm uses a newly designed architecture consisting of a list of simple classifiers and an attention mechanism as a selector.
Compared to state-of-the-art algorithms, our algorithm achieves improvements of more than 10% on all selected test scores.
arXiv Detail & Related papers (2020-06-14T21:29:44Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
- Decoding Imagined Speech using Wavelet Features and Deep Neural Networks [2.4063592468412267]
This paper proposes a novel approach that uses deep neural networks for classifying imagined speech.
The proposed approach employs only the EEG channels over specific areas of the brain for classification, and derives distinct feature vectors from each of those channels.
The proposed architecture and the approach of treating the data have resulted in an average classification accuracy of 57.15%, which is an improvement of around 35% over the state-of-the-art results.
arXiv Detail & Related papers (2020-03-19T00:36:19Z)
- PointAugment: an Auto-Augmentation Framework for Point Cloud Classification [105.27565020399]
PointAugment is a new auto-augmentation framework that automatically optimizes and augments point cloud samples to enrich the data diversity when we train a classification network.
We formulate a learnable point augmentation function with a shape-wise transformation and a point-wise displacement, and carefully design loss functions to adopt the augmented samples.
arXiv Detail & Related papers (2020-02-25T14:25:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.