Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/2410.06927v2
- Date: Fri, 18 Oct 2024 11:47:40 GMT
- Title: Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks
- Authors: Friedrich Wolf-Monheim
- Abstract summary: Convolutional neural networks (CNNs) are widely used in computer vision.
They can also classify sounds from digital imagery that represents spectral and rhythm features extracted from time-domain audio signals.
Different spectral and rhythm feature representations, such as mel-scaled spectrograms and mel-frequency cepstral coefficients (MFCCs), are investigated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional neural networks (CNNs) are widely used in computer vision. They can be used not only for conventional digital image material to recognize patterns, but also for feature extraction from digital imagery representing spectral and rhythm features extracted from time-domain digital audio signals for the acoustic classification of sounds. Different spectral and rhythm feature representations like mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCCs), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams and chroma energy normalized statistics (CENS) chromagrams are investigated in terms of the audio classification performance using a deep convolutional neural network. It can be clearly shown that the mel-scaled spectrograms and the mel-frequency cepstral coefficients (MFCCs) perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks using deep CNNs. The experiments were carried out with the aid of the ESC-50 dataset with 2,000 labeled environmental audio recordings.
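As a concrete illustration of the feature pipeline described in the abstract, the sketch below computes the investigated 2D representations with librosa. The library choice, file name, and parameter values are assumptions rather than details from the paper, and a plain tempogram stands in for the cyclic tempogram, which librosa does not expose directly.
```python
# Minimal sketch of the spectral/rhythm feature pipeline, assuming librosa.
# The file path and parameters are illustrative, not the authors' settings.
import librosa
import numpy as np

y, sr = librosa.load("esc50_clip.wav", sr=22050)  # hypothetical ESC-50 clip

features = {
    # Mel-scaled spectrogram, converted to dB so the CNN sees a log scale
    "mel": librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128), ref=np.max
    ),
    # Mel-frequency cepstral coefficients
    "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
    # Rhythm feature: tempogram from the onset strength envelope
    # (approximating the cyclic tempogram used in the paper)
    "tempogram": librosa.feature.tempogram(
        onset_envelope=librosa.onset.onset_strength(y=y, sr=sr), sr=sr
    ),
    # Chroma variants: STFT-, CQT-, and CENS-based
    "chroma_stft": librosa.feature.chroma_stft(y=y, sr=sr),
    "chroma_cqt": librosa.feature.chroma_cqt(y=y, sr=sr),
    "chroma_cens": librosa.feature.chroma_cens(y=y, sr=sr),
}

for name, feat in features.items():
    print(name, feat.shape)  # each is a 2D "image" a deep CNN can classify
```
Each array is then treated like a single-channel image and fed to the CNN, which is what lets standard computer-vision architectures be reused for audio.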
Related papers
- Multi-View Spectrogram Transformer for Respiratory Sound Classification [32.346046623638394]
A Multi-View Spectrogram Transformer (MVST) is proposed to embed different views of time-frequency characteristics into the vision transformer.
Experimental results on the ICBHI dataset demonstrate that the proposed MVST significantly outperforms state-of-the-art methods for classifying respiratory sounds.
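To make the "different views" idea concrete, here is a hedged sketch in which the same clip is rendered at several time-frequency resolutions. This is not the MVST reference implementation; the window sizes and file name are illustrative assumptions.
```python
# Hedged sketch: multiple spectrogram "views" of one clip, trading time
# resolution against frequency resolution. Parameters are assumptions.
import librosa
import numpy as np

y, sr = librosa.load("breath_cycle.wav", sr=16000)  # hypothetical ICBHI-style clip
views = [
    librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=n_fft // 4),
        ref=np.max,
    )
    for n_fft in (256, 512, 1024)  # short to long analysis windows
]
# Each view would be patch-embedded and fed to a transformer branch.
```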
arXiv Detail & Related papers (2023-11-16T08:17:02Z)
- Modulation Classification Through Deep Learning Using Resolution Transformed Spectrograms [3.9511559419116224]
We propose a scheme for Automatic Modulation Classification (AMC) using modern convolutional neural network (CNN) architectures.
We perform resolution transformation of spectrograms, which yields up to a 99.61% reduction in computational load and an 8x faster conversion from the received I/Q data.
The performance is evaluated on existing CNN models including SqueezeNet, ResNet-50, Inception-ResNet-v2, Inception-v3, VGG-16 and DenseNet-201.
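The core operation is simple image resizing applied to the spectrogram before the CNN forward pass. The sketch below shows the idea with PyTorch interpolation; the shapes and target resolution are illustrative assumptions, not the paper's configuration.
```python
# Hedged sketch of resolution transformation: downsample a spectrogram
# "image" to cut per-forward-pass compute. Sizes are assumptions.
import torch
import torch.nn.functional as F

spec = torch.randn(1, 1, 1024, 1024)  # stand-in for a full-resolution spectrogram
small = F.interpolate(spec, size=(224, 224), mode="bilinear", align_corners=False)
print(small.shape)  # torch.Size([1, 1, 224, 224]) -- e.g. ResNet-50 input size
```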
arXiv Detail & Related papers (2023-06-06T16:14:15Z)
- Time-space-frequency feature Fusion for 3-channel motor imagery classification [0.0]
This study introduces TSFF-Net, a novel network architecture that integrates time-space-frequency features.
TSFF-Net comprises four main components: time-frequency representation, time-frequency feature extraction, time-space feature extraction, and feature fusion and classification.
Experimental results demonstrate that TSFF-Net not only compensates for the shortcomings of single-mode feature extraction networks in EEG decoding, but also outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2023-04-04T02:01:48Z)
- Spectral Cross-Domain Neural Network with Soft-adaptive Threshold Spectral Enhancement [12.837935554250409]
We propose a novel deep learning model named Spectral Cross-Domain Neural Network (SCDNN).
It simultaneously reveals the key information embedded in the spectral and time domains inside the neural network.
The proposed SCDNN is tested on several classification tasks implemented on the public ECG databases PTB-XL and MIT-BIH.
arXiv Detail & Related papers (2023-01-10T14:23:43Z)
- Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features [51.924340387119415]
Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
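The "real plus imaginary spectrogram" input can be formed by stacking the two parts of the complex STFT as channels. Below is a hedged sketch of that front-end; the F0 extractor, file name, and parameters are illustrative assumptions and do not reproduce the paper's exact system.
```python
# Hedged sketch: complex STFT split into real/imaginary channels, plus a
# simple F0 track. Parameters and file name are assumptions.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical ASVspoof-style clip
stft = librosa.stft(y, n_fft=512, hop_length=160)      # complex-valued STFT
real_imag = np.stack([stft.real, stft.imag], axis=0)   # (2, freq, time) input
f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)          # a simple F0 estimate
print(real_imag.shape, f0.shape)
```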
arXiv Detail & Related papers (2022-08-02T02:46:16Z)
- Preservation of High Frequency Content for Deep Learning-Based Medical Image Classification [74.84221280249876]
Efficient analysis of large numbers of chest radiographs can aid physicians and radiologists.
We propose a novel Discrete Wavelet Transform (DWT)-based method for the efficient identification and encoding of visual information.
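A minimal sketch of the DWT encoding idea follows, using PyWavelets; the wavelet choice and input shape are assumptions rather than the paper's configuration.
```python
# Hedged sketch: one-level 2D discrete wavelet transform of a radiograph,
# separating a low-frequency approximation from high-frequency detail bands
# that can be preserved as input channels. Wavelet and shape are assumptions.
import numpy as np
import pywt

image = np.random.rand(512, 512)               # stand-in for a chest radiograph
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")    # approximation + detail bands
channels = np.stack([cA, cH, cV, cD], axis=0)  # (4, 256, 256) encoded input
print(channels.shape)
```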
arXiv Detail & Related papers (2022-05-08T15:29:54Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
The processing is carried out in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders.
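A heavily simplified sketch of time-frequency noise shaping in this spirit: filter white noise so its spectral envelope follows a target magnitude envelope. This is not SpecGrad's actual algorithm; the envelope source, file name, and sizes are illustrative assumptions.
```python
# Hedged sketch: shape white noise in the STFT domain with a target
# magnitude envelope, then invert. A simplification, not SpecGrad itself.
import numpy as np
import librosa

n_fft, hop = 1024, 256
noise = np.random.randn(16000)
N = librosa.stft(noise, n_fft=n_fft, hop_length=hop)        # complex STFT of noise
ref, _ = librosa.load("ref.wav", sr=16000)                  # assumed reference clip
env = np.abs(librosa.stft(ref, n_fft=n_fft, hop_length=hop))  # target envelope
T = min(N.shape[1], env.shape[1])
shaped = librosa.istft(N[:, :T] * env[:, :T], hop_length=hop)
```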
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction [127.20208645280438]
Hyperspectral image (HSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement.
Modeling the inter-spectra interactions is beneficial for HSI reconstruction.
A Mask-guided Spectral-wise Transformer (MST) is proposed as a novel framework for HSI reconstruction.
arXiv Detail & Related papers (2021-11-15T16:59:48Z)
- SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers [91.09957836250209]
Hyperspectral (HS) images are characterized by approximately contiguous spectral information.
CNNs have been proven to be a powerful feature extractor in HS image classification.
We propose a novel backbone network called SpectralFormer for HS image classification.
arXiv Detail & Related papers (2021-07-07T02:59:21Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification.
We attempt to recognize musical instruments in polyphonic audio by feeding only their raw waveforms into deep learning models.
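A minimal sketch of this raw-waveform approach: a 1D CNN that consumes audio samples directly, with no spectrogram front-end. The layer sizes and the number of instrument classes are illustrative assumptions, not the paper's architecture.
```python
# Hedged sketch: a small 1D CNN classifying raw waveforms directly.
# Layer sizes and class count are assumptions.
import torch
import torch.nn as nn

class RawAudioCNN(nn.Module):
    def __init__(self, n_classes: int = 11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

logits = RawAudioCNN()(torch.randn(8, 1, 16000))  # batch of 1-second clips
print(logits.shape)  # torch.Size([8, 11])
```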
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Cross-Spectral Periocular Recognition with Conditional Adversarial Networks [59.17685450892182]
We propose Conditional Generative Adversarial Networks, trained to convert periocular images between visible and near-infrared spectra.
We obtain a cross-spectral periocular performance of EER=1%, and GAR>99% @ FAR=1%, which is comparable to the state-of-the-art with the PolyU database.
arXiv Detail & Related papers (2020-08-26T15:02:04Z)