Audio Impairment Recognition Using a Correlation-Based Feature
Representation
- URL: http://arxiv.org/abs/2003.09889v2
- Date: Tue, 24 Mar 2020 14:54:21 GMT
- Title: Audio Impairment Recognition Using a Correlation-Based Feature
Representation
- Authors: Alessandro Ragano, Emmanouil Benetos, Andrew Hines
- Abstract summary: We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
- Score: 85.08880949780894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio impairment recognition consists of detecting noise in audio files and
categorising the impairment type. Recently, significant performance improvements
have been obtained through the use of advanced deep learning models.
However, feature robustness is still an unresolved issue, and it is one of the
main reasons why powerful deep learning architectures are needed. In the presence
of a variety of musical styles, hand-crafted features are less effective at
capturing audio degradation characteristics: they are prone to failure when
recognising audio impairments and may mistakenly learn musical concepts
rather than impairment types. In this paper, we propose a new representation of
hand-crafted features that is based on the correlation of feature pairs. We
experimentally compare the proposed correlation-based feature representation
with a typical raw feature representation used in machine learning, and we show
superior performance in terms of compact feature dimensionality and improved
computational speed in the test stage whilst achieving comparable accuracy.
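The core idea of a correlation-based feature representation can be sketched as follows. This is an illustrative reconstruction of the general technique, not the authors' exact pipeline: given a matrix of hand-crafted frame-level features (e.g., MFCC-like trajectories over time), compute the Pearson correlation between every pair of feature dimensions and keep the unique pairs, yielding a compact vector whose size depends only on the number of features, not the number of frames.

```python
import numpy as np

def correlation_feature_representation(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, num_features) matrix of hand-crafted
    features into a vector of pairwise Pearson correlations between
    feature trajectories: D features over T frames become D*(D-1)/2
    values, independent of T."""
    # Correlation matrix between feature columns (rowvar=False treats
    # each column as one variable observed across frames)
    corr = np.corrcoef(frames, rowvar=False)
    # Keep only the strictly upper triangle: one value per feature pair
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]

# Hypothetical example: 500 frames of 13 frame-level features
rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 13))
rep = correlation_feature_representation(feats)
print(rep.shape)  # (78,) -> 13*12/2 feature pairs, regardless of frame count
```

Note how the output dimensionality is fixed by the feature count alone, which is the source of the compactness and test-stage speed claims in the abstract.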
Related papers
- Robust Network Learning via Inverse Scale Variational Sparsification [55.64935887249435]
We introduce an inverse scale variational sparsification framework within a time-continuous inverse scale space formulation.
Unlike frequency-based methods, our approach removes noise by smoothing small-scale features.
We show the efficacy of our approach through enhanced robustness against various noise types.
arXiv Detail & Related papers (2024-09-27T03:17:35Z)
- Resource-constrained stereo singing voice cancellation [1.0962868591006976]
We study the problem of stereo singing voice cancellation.
Our approach is evaluated using objective offline metrics and a large-scale MUSHRA trial.
arXiv Detail & Related papers (2024-01-22T16:05:30Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - Anomalous Sound Detection using Audio Representation with Machine ID
based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z) - Self-attention fusion for audiovisual emotion recognition with
incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z) - Audiovisual transfer learning for audio tagging and sound event
detection [21.574781022415372]
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
We adapt a baseline system utilizing only spectral acoustic inputs to make use of pretrained auditory and visual features.
We perform experiments with these modified models on an audiovisual multi-label data set.
arXiv Detail & Related papers (2021-06-09T21:55:05Z) - Robust Audio-Visual Instance Discrimination [79.74625434659443]
We present a self-supervised learning method to learn audio and video representations.
We address the problems of audio-visual instance discrimination and improve transfer learning performance.
arXiv Detail & Related papers (2021-03-29T19:52:29Z) - Structure-Aware Audio-to-Score Alignment using Progressively Dilated
Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z) - COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio
Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
arXiv Detail & Related papers (2020-06-15T13:17:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.