Audio Denoising for Robust Audio Fingerprinting
- URL: http://arxiv.org/abs/2212.11277v1
- Date: Wed, 21 Dec 2022 09:46:12 GMT
- Title: Audio Denoising for Robust Audio Fingerprinting
- Authors: Kamil Akesbi
- Abstract summary: Music discovery services let users identify songs from short mobile recordings.
These solutions rely on the extraction of spectral peaks in order to be robust to a number of distortions.
Little work has been done to study the robustness of these algorithms to background noise captured in real environments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music discovery services let users identify songs from short mobile
recordings. These solutions are often based on Audio Fingerprinting, and rely
more specifically on the extraction of spectral peaks in order to be robust to
a number of distortions. Little work has been done to study the robustness of
these algorithms to background noise captured in real environments. In
particular, AFP systems still struggle when the signal-to-noise ratio is low,
i.e. when the background noise is strong. In this project, we tackle this
problem with Deep Learning. We test a new hybrid strategy which consists of
inserting a denoising DL model in front of a peak-based AFP algorithm. We
simulate noisy music recordings using a realistic data augmentation pipeline,
and train a DL model to denoise them. The denoising model limits the impact of
background noise on the AFP system's extracted peaks, improving its robustness
to noise. We further propose a novel loss function to adapt the DL model to the
considered AFP system, increasing its precision in terms of retrieved spectral
peaks. To the best of our knowledge, this hybrid strategy has not been tested
before.
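The abstract does not spell out the loss function or the augmentation pipeline; the sketch below illustrates, under assumptions, how noisy training examples could be simulated at a target SNR and how a reconstruction loss could be biased toward the spectral peaks the AFP system relies on. The function names and the weighting scheme are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mix_at_snr(clean: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Mix a noise recording into a clean music snippet at a target SNR (dB)."""
    noise = noise[..., : clean.shape[-1]]
    p_clean = clean.pow(2).mean() + 1e-12
    p_noise = noise.pow(2).mean() + 1e-12
    gain = torch.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def local_peak_mask(spec: torch.Tensor, kernel=(3, 5)) -> torch.Tensor:
    """1.0 at local maxima of a (batch, 1, freq, time) magnitude spectrogram, else 0.0."""
    pooled = F.max_pool2d(spec, kernel_size=kernel, stride=1,
                          padding=(kernel[0] // 2, kernel[1] // 2))
    return (spec == pooled).float()

def peak_weighted_loss(denoised: torch.Tensor, clean: torch.Tensor,
                       peak_weight: float = 10.0) -> torch.Tensor:
    """MSE on spectrograms, up-weighted where the clean spectrogram has spectral peaks."""
    weights = 1.0 + peak_weight * local_peak_mask(clean)
    return (weights * (denoised - clean).pow(2)).mean()
```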
Related papers
- CheapNET: Improving Light-weight speech enhancement network by projected loss function [0.8192907805418583]
We introduce a novel projection loss function, diverging from MSE, to enhance noise suppression.
For echo cancellation, the function enables direct predictions on LAEC pre-processed outputs.
Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and a computational load of 0.4 GFLOPs.
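The projection loss is not defined in this summary; as a rough, hypothetical illustration, a projection-style objective can penalize the part of the estimate that is orthogonal to the target (in the spirit of SI-SDR-type losses), which diverges from plain MSE:

```python
import torch

def projection_style_loss(estimate: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Penalize the energy of the estimate orthogonal to the target signal.

    Hypothetical projection-based objective (SI-SDR-like), not CheapNET's actual loss.
    """
    target = target - target.mean(dim=-1, keepdim=True)
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    dot = (estimate * target).sum(dim=-1, keepdim=True)
    s_target = dot * target / (target.pow(2).sum(dim=-1, keepdim=True) + 1e-8)
    e_noise = estimate - s_target
    # Ratio of residual to projected energy; lower is better.
    return (e_noise.pow(2).sum(-1) / (s_target.pow(2).sum(-1) + 1e-8)).mean()
```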
arXiv Detail & Related papers (2023-11-27T16:03:42Z)
- Music Augmentation and Denoising For Peak-Based Audio Fingerprinting [0.0]
We introduce and release a new audio augmentation pipeline that adds noise to music snippets in a realistic way.
We then propose and release a deep learning model that removes noisy components from spectrograms.
We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
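A minimal sketch of the hybrid front end this suggests, assuming a hypothetical `denoiser` module that cleans the magnitude spectrogram before the fingerprinting system's peak extraction:

```python
import torch

def denoise_then_fingerprint(audio: torch.Tensor, denoiser: torch.nn.Module,
                             n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    """Hypothetical hybrid front end: denoise the magnitude spectrogram, then hand it
    to a peak-based fingerprinter. `denoiser` is a placeholder model."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(audio, n_fft=n_fft, hop_length=hop, window=window,
                      return_complex=True).abs()
    with torch.no_grad():
        clean_est = denoiser(spec.unsqueeze(0).unsqueeze(0)).squeeze()
    return clean_est  # pass to the AFP system's peak extraction / hashing stage
```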
arXiv Detail & Related papers (2023-10-20T09:56:22Z)
- Physics-guided Noise Neural Proxy for Practical Low-light Raw Image Denoising [22.11250276261829]
Recently, the mainstream practice for training low-light raw image denoising has shifted towards employing synthetic data.
Noise modeling, which focuses on characterizing the noise distribution of real-world sensors, profoundly influences the effectiveness and practicality of synthetic data.
We propose a novel strategy: learning the noise model from dark frames instead of paired real data, to break down the data dependency.
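For context, a common physics-based baseline that a learned noise proxy goes beyond is Poisson shot noise plus Gaussian read noise; a hedged sketch with illustrative parameters (not the paper's model):

```python
import numpy as np

def synthesize_raw_noise(clean_raw: np.ndarray, k: float = 0.01,
                         read_sigma: float = 2.0) -> np.ndarray:
    """Simple Poisson-Gaussian raw noise synthesis (a common physics-based baseline;
    the paper's learned noise proxy is more elaborate). `k` is the system gain in
    DN per electron, `read_sigma` the read-noise std in DN."""
    electrons = np.clip(clean_raw / k, 0, None)
    shot = np.random.poisson(electrons).astype(np.float64) * k
    read = np.random.normal(0.0, read_sigma, size=clean_raw.shape)
    return shot + read
```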
arXiv Detail & Related papers (2023-10-13T14:14:43Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
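A generic DDPM-style forward noising step on normalized (onset, offset) proposals, as an assumption of what the reversed process operates on (not DiffSED's exact parameterization):

```python
import torch

def noise_boundaries(boundaries: torch.Tensor, t: torch.Tensor,
                     alphas_cumprod: torch.Tensor):
    """DDPM-style forward step on normalized (onset, offset) boundary pairs.
    `boundaries`: (batch, events, 2) in [0, 1]; `t`: (batch,) timestep indices.
    A generic sketch of the noising process the model learns to reverse."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1)
    eps = torch.randn_like(boundaries)
    noisy = a_bar.sqrt() * boundaries + (1.0 - a_bar).sqrt() * eps
    return noisy, eps  # the denoiser is trained to recover boundaries (or eps) from noisy
```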
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as finetuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
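A standard LoRA-style adapter around a frozen linear layer illustrates the idea of per-type low-rank adaptation matrices; dimensions and initialization are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update W + B @ A.
    A generic LoRA-style adapter; one such adapter could be kept per new fake-audio type."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep knowledge of known fake types intact
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.t() @ self.B.t()
```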
arXiv Detail & Related papers (2023-06-08T06:06:42Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
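The detection mechanism is not described here; purely as a toy illustration of inference-time input cleaning (not the paper's method), one could drop spans whose character statistics look like OCR or ASR garbage:

```python
import re

def drop_noisy_spans(sentences: list[str], max_symbol_ratio: float = 0.3) -> list[str]:
    """Keep sentences whose non-alphanumeric character ratio is below a threshold.
    A toy inference-time filter, not the detection method proposed in the paper."""
    kept = []
    for s in sentences:
        stripped = re.sub(r"\s", "", s)
        if not stripped:
            continue
        symbols = sum(1 for c in stripped if not c.isalnum())
        if symbols / len(stripped) <= max_symbol_ratio:
            kept.append(s)
    return kept
```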
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Removing Noise from Extracellular Neural Recordings Using Fully Convolutional Denoising Autoencoders [62.997667081978825]
We propose a Fully Convolutional Denoising Autoencoder, which learns to produce a clean neuronal activity signal from a noisy multichannel input.
The experimental results on simulated data show that our proposed method can improve significantly the quality of noise-corrupted neural signals.
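A minimal fully convolutional denoising autoencoder sketch for multichannel signals; layer widths and kernel sizes are assumptions, not the paper's configuration:

```python
import torch.nn as nn

class FCDAE(nn.Module):
    """Minimal fully convolutional denoising autoencoder for multichannel recordings."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=9, stride=2, padding=4, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, channels, kernel_size=9, stride=2, padding=4, output_padding=1),
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.decoder(self.encoder(x))
```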
arXiv Detail & Related papers (2021-09-18T14:51:24Z)
- Denoising Distantly Supervised Named Entity Recognition via a Hypergeometric Probabilistic Model [26.76830553508229]
Hypergeometric Learning (HGL) is a denoising algorithm for distantly supervised named entity recognition.
HGL takes both noise distribution and instance-level confidence into consideration.
Experiments show that HGL can effectively denoise the weakly-labeled data retrieved from distant supervision.
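As a loose illustration of the distributional assumption (not the full HGL algorithm), the count of noisy labels in a batch drawn from a corpus with an estimated noise rate can be modeled as hypergeometric:

```python
from scipy.stats import hypergeom

def expected_noisy_in_batch(total: int, est_noise_rate: float, batch: int):
    """Model the count of noisy distantly supervised labels in a batch as hypergeometric,
    given an estimated corpus-level noise rate. Illustrative only."""
    noisy_total = round(total * est_noise_rate)   # expected noisy instances in the corpus
    rv = hypergeom(total, noisy_total, batch)     # population, "successes", sample size
    return rv.mean(), rv.interval(0.95)
```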
arXiv Detail & Related papers (2021-06-17T04:01:25Z)
- Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground-truth, a denoising CNN is then trained in a fully-supervised manner.
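An illustrative training step with hypothetical `noise_gen` and `denoiser` modules, showing how synthetic pairs from the noise generator could supervise the denoising CNN:

```python
import torch

def adani_style_step(clean_img, real_noisy_img, noise_gen, denoiser, den_opt):
    """One illustrative step: a noise generator synthesizes a noisy version of a clean
    image, guided by an unpaired real noisy image; the synthetic pair then supervises
    the denoising CNN. `noise_gen` and `denoiser` are hypothetical modules."""
    with torch.no_grad():
        fake_noisy = noise_gen(clean_img, guide=real_noisy_img)  # hypothetical generator call
    den_opt.zero_grad()
    loss = torch.nn.functional.mse_loss(denoiser(fake_noisy), clean_img)
    loss.backward()
    den_opt.step()
    return loss.item()
```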
arXiv Detail & Related papers (2020-11-30T02:49:36Z)
- Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning [14.60531205031547]
We present a contrastive learning framework that derives from the segment-level search objective.
In the segment-level search task, where conventional audio fingerprinting systems tend to fail, our system shows promising results while using 10x smaller storage.
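A simplified NT-Xent contrastive loss between embeddings of original segments and their augmented (e.g. noisy) views is a reasonable sketch of such a framework; the temperature and batch handling are illustrative:

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z_orig: torch.Tensor, z_aug: torch.Tensor, tau: float = 0.05) -> torch.Tensor:
    """Simplified NT-Xent loss: matched (original, augmented) embedding pairs are
    positives, all other pairs in the batch are negatives."""
    z1 = F.normalize(z_orig, dim=1)
    z2 = F.normalize(z_aug, dim=1)
    logits = z1 @ z2.t() / tau                      # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```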
arXiv Detail & Related papers (2020-10-22T17:44:40Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
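Purely as an assumption about the conditioning front end, a toy sine excitation can be built from frame-level f0 and loudness curves (the paper's hierarchical generators would refine such a signal):

```python
import numpy as np

def sine_excitation(f0_hz: np.ndarray, loudness: np.ndarray, sr: int = 16000,
                    hop: int = 256) -> np.ndarray:
    """Toy excitation from frame-level f0 and loudness: upsample to sample rate,
    integrate phase, and scale a sine by loudness. Illustrative front end only."""
    n = len(f0_hz) * hop
    f0 = np.repeat(f0_hz, hop)[:n]
    amp = np.repeat(loudness, hop)[:n]
    phase = 2 * np.pi * np.cumsum(f0) / sr
    return amp * np.sin(phase)
```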
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.