Audio Defect Detection in Music with Deep Networks
- URL: http://arxiv.org/abs/2202.05718v1
- Date: Fri, 11 Feb 2022 15:56:14 GMT
- Title: Audio Defect Detection in Music with Deep Networks
- Authors: Daniel Wolff, Rémi Mignot and Axel Roebel
- Abstract summary: The deliberate use of artefacts such as clicks in popular music calls for data-centric and context-sensitive detection solutions.
We present a convolutional network architecture following an end-to-end encoder-decoder configuration to develop detectors for two exemplary audio defects.
- Score: 8.680081568962997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With increasing amounts of music being digitally transferred from production
to distribution, automatic means of determining media quality are needed.
Protection mechanisms in digital audio processing tools have not eliminated the
need for production entities located downstream in the distribution chain to assess
audio quality and detect defects inserted further upstream. Such analysis often
relies on the received audio and scarce metadata alone. The deliberate use of
artefacts such as clicks in popular music, as well as more recent defects
stemming from corruption in modern audio encodings, calls for data-centric and
context-sensitive detection solutions. We present a convolutional network
architecture following an end-to-end encoder-decoder configuration to develop
detectors for two exemplary audio defects. A click detector is trained and
compared to a traditional signal-processing method, with a discussion of
context sensitivity. Additional post-processing is used for data augmentation
and workflow simulation. The ability of our models to capture variance is
explored with a detector for artefacts caused by decompression of corrupted
MP3-compressed audio. For both tasks we describe the synthetic generation of
artefacts for controlled detector training and evaluation. We evaluate our
detectors on the large open-source Free Music Archive (FMA) and on genre-specific
datasets.
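
The abstract outlines two ingredients that lend themselves to a brief illustration: synthetic insertion of defects into clean audio for controlled training, and an end-to-end encoder-decoder network that localises defects in the waveform. The sketch below is a minimal, hypothetical rendering of that idea in PyTorch and NumPy; the click model, layer sizes, and loss step are illustrative assumptions and do not reproduce the authors' actual architecture or data pipeline.

```python
# Hypothetical sketch (not the paper's code): synthetic click insertion for
# controlled training data, plus a small 1-D encoder-decoder CNN that predicts
# a per-sample defect probability mask.
import numpy as np
import torch
import torch.nn as nn


def insert_clicks(audio, n_clicks=5, click_len=8, amp=0.8, rng=None):
    """Add short impulsive clicks to a clean signal; return the corrupted audio
    and a sample-level binary label track marking the defect positions."""
    if rng is None:
        rng = np.random.default_rng()
    corrupted = audio.copy()
    labels = np.zeros(len(audio), dtype=np.float32)
    for _ in range(n_clicks):
        start = int(rng.integers(0, len(audio) - click_len))
        click = amp * rng.uniform(-1.0, 1.0, size=click_len).astype(np.float32)
        corrupted[start:start + click_len] += click
        labels[start:start + click_len] = 1.0
    return corrupted, labels


class DefectDetector(nn.Module):
    """Encoder-decoder CNN mapping raw audio to per-sample defect probabilities."""

    def __init__(self, channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, stride=4, padding=7), nn.ReLU(),
            nn.Conv1d(channels, 2 * channels, kernel_size=15, stride=4, padding=7), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(2 * channels, channels, kernel_size=16, stride=4, padding=6), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=4, padding=6),
        )

    def forward(self, x):                      # x: (batch, 1, samples)
        return torch.sigmoid(self.decoder(self.encoder(x)))


# Usage sketch: one synthetic training example and a binary cross-entropy step.
clean = (0.1 * np.random.randn(44100)).astype(np.float32)   # stand-in for a music excerpt
noisy, labels = insert_clicks(clean)
x = torch.from_numpy(noisy).view(1, 1, -1)
y = torch.from_numpy(labels).view(1, 1, -1)
model = DefectDetector()
pred = model(x)[..., : y.shape[-1]]            # crop: the decoder may overshoot by a few samples
loss = nn.functional.binary_cross_entropy(pred, y)
loss.backward()
```

The same frame-wise mask formulation would, in principle, carry over to the MP3-corruption task by swapping the click synthesis for decoding of deliberately corrupted MP3 streams; that step is not sketched here.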
Related papers
- Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals [15.595136769477614]
We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts, and assess whether such artifacts introduce any bias in existing datasets.
Our findings reveal that by analyzing splicing artifacts, we can achieve a detection EER of 6.16% and 7.36% on the PartialSpoof and HAD datasets, respectively.
arXiv Detail & Related papers (2024-08-25T09:28:04Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine to fake utterances.
Our method can easily be generalized to related fields, such as speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z)
- Edge Storage Management Recipe with Zero-Shot Data Compression for Road Anomaly Detection [1.4563998247782686]
We consider efficient storage management methods that preserve high-fidelity audio.
A computational file compression approach that encodes the collected high-resolution audio into a compact code is recommended.
Motivated by this, we propose a simple yet effective pre-trained autoencoder-based data compression method.
arXiv Detail & Related papers (2023-07-10T01:30:21Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and the anomaly transformer.
The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data.
A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
- An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio [53.134423013599914]
We propose the new problem of detecting vocoder fingerprints of fake audio.
Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders.
arXiv Detail & Related papers (2022-08-20T09:23:21Z)
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning to model audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset in surveillance scenes.
We find that introducing audio signals effectively improves the performance of anomaly event detection and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
- Audiovisual transfer learning for audio tagging and sound event detection [21.574781022415372]
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
We adapt a baseline system utilizing only spectral acoustic inputs to make use of pretrained auditory and visual features.
We perform experiments with these modified models on an audiovisual multi-label data set.
arXiv Detail & Related papers (2021-06-09T21:55:05Z)
- Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder [29.63675159839434]
Flow-based neural vocoders have shown significant improvement in real-time speech generation tasks.
We propose audio dequantization methods in flow-based neural vocoders for high-fidelity audio generation.
arXiv Detail & Related papers (2020-08-16T09:37:18Z)