Rethinking CNN Models for Audio Classification
- URL: http://arxiv.org/abs/2007.11154v2
- Date: Fri, 13 Nov 2020 19:09:09 GMT
- Title: Rethinking CNN Models for Audio Classification
- Authors: Kamalesh Palanisamy, Dipika Singhania, Angela Yao
- Abstract summary: ImageNet-Pretrained standard deep CNN models can be used as strong baseline networks for audio classification.
We systematically study how much of pretrained weights is useful for learning spectrograms.
We show that for a given standard model using pretrained weights is better than using randomly Dense weights.
- Score: 20.182928938110923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that ImageNet-Pretrained standard deep CNN models can
be used as strong baseline networks for audio classification. Even though there
is a significant difference between audio Spectrogram and standard ImageNet
image samples, transfer learning assumptions still hold firmly. To understand
what enables the ImageNet pretrained models to learn useful audio
representations, we systematically study how much of pretrained weights is
useful for learning spectrograms. We show (1) that for a given standard model
using pretrained weights is better than using randomly initialized weights (2)
qualitative results of what the CNNs learn from the spectrograms by visualizing
the gradients. Besides, we show that even though we use the pretrained model
weights for initialization, there is variance in performance in various output
runs of the same model. This variance in performance is due to the random
initialization of linear classification layer and random mini-batch orderings
in multiple runs. This brings significant diversity to build stronger ensemble
models with an overall improvement in accuracy. An ensemble of ImageNet
pretrained DenseNet achieves 92.89% validation accuracy on the ESC-50 dataset
and 87.42% validation accuracy on the UrbanSound8K dataset which is the current
state-of-the-art on both of these datasets.
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative model as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - Co-training $2^L$ Submodels for Visual Recognition [67.02999567435626]
Submodel co-training is a regularization method related to co-training, self-distillation and depth.
We show that submodel co-training is effective to train backbones for recognition tasks such as image classification and semantic segmentation.
arXiv Detail & Related papers (2022-12-09T14:38:09Z) - Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experiment results show the high generalization performance of our method on testing data that are composed of unseen contexts.
arXiv Detail & Related papers (2022-10-26T15:21:39Z) - Corrupted Image Modeling for Self-Supervised Visual Pre-Training [103.99311611776697]
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.
CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial mask tokens.
After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks.
arXiv Detail & Related papers (2022-02-07T17:59:04Z) - Automated Cleanup of the ImageNet Dataset by Model Consensus,
Explainability and Confident Learning [0.0]
ImageNet was the backbone of various convolutional neural networks (CNNs) trained on ILSVRC12Net.
This paper describes automated applications based on model consensus, explainability and confident learning to correct labeling mistakes.
The ImageNet-Clean improves the model performance by 2-2.4 % for SqueezeNet and EfficientNet-B0 models.
arXiv Detail & Related papers (2021-03-30T13:16:35Z) - Shape-Texture Debiased Neural Network Training [50.6178024087048]
Convolutional Neural Networks are often biased towards either texture or shape, depending on the training dataset.
We develop an algorithm for shape-texture debiased learning.
Experiments show that our method successfully improves model performance on several image recognition benchmarks.
arXiv Detail & Related papers (2020-10-12T19:16:12Z) - Increasing the Robustness of Semantic Segmentation Models with
Painting-by-Numbers [39.95214171175713]
We build upon an insight from image classification that output can be improved by increasing the network-bias towards object shapes.
Our basic idea is to alpha-blend a portion of the RGB training images with faked images, where each class-label is given a fixed, randomly chosen color.
We demonstrate the effectiveness of our training schema for DeepLabv3+ with various network backbones, MobileNet-V2, ResNets, and Xception, and evaluate it on the Cityscapes dataset.
arXiv Detail & Related papers (2020-10-12T07:42:39Z) - FU-net: Multi-class Image Segmentation Using Feedback Weighted U-net [5.193724835939252]
We present a generic deep convolutional neural network (DCNN) for multi-class image segmentation.
It is based on a well-established supervised end-to-end DCNN model, known as U-net.
arXiv Detail & Related papers (2020-04-28T13:08:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.