Music Augmentation and Denoising For Peak-Based Audio Fingerprinting
- URL: http://arxiv.org/abs/2310.13388v2
- Date: Sun, 29 Oct 2023 10:48:51 GMT
- Title: Music Augmentation and Denoising For Peak-Based Audio Fingerprinting
- Authors: Kamil Akesbi, Dorian Desblancs, Benjamin Martin
- Abstract summary: We introduce and release a new audio augmentation pipeline that adds noise to music snippets in a realistic way.
We then propose and release a deep learning model that removes noisy components from spectrograms.
We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio fingerprinting is a well-established solution for song identification
from short recording excerpts. Popular methods rely on the extraction of sparse
representations, generally spectral peaks, and have proven to be accurate,
fast, and scalable to large collections. However, real-world applications of
audio identification often happen in noisy environments, which can cause these
systems to fail. In this work, we tackle this problem by introducing and
releasing a new audio augmentation pipeline that adds noise to music snippets
in a realistic way, by stochastically mimicking real-world scenarios. We then
propose and release a deep learning model that removes noisy components from
spectrograms in order to improve peak-based fingerprinting systems' accuracy.
We show that the addition of our model improves the identification performance
of commonly used audio fingerprinting systems, even under noisy conditions.
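As a rough illustration of the kind of noise augmentation the abstract describes, the sketch below mixes a background signal into a music snippet at a randomly drawn signal-to-noise ratio. It is a minimal, generic example and not the authors' released pipeline: the helper names (`rms`, `mix_at_snr`, `measured_snr_db`), the sine-tone "music", the white-noise "background", and the SNR range are all assumptions made for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(music, noise, snr_db):
    """Return music plus scaled noise so that the mix has the requested SNR in dB.

    Hypothetical helper: both inputs are equal-length lists of float samples.
    """
    if len(music) != len(noise):
        raise ValueError("music and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(music)  # silent noise: nothing to add
    # SNR_dB = 20 * log10(rms(music) / rms(gain * noise)); solve for gain.
    gain = rms(music) / (noise_rms * 10 ** (snr_db / 20))
    return [m + gain * n for m, n in zip(music, noise)]


def measured_snr_db(music, mixed):
    """SNR of a mix, recovered by subtracting the clean music."""
    residual = [y - m for m, y in zip(music, mixed)]
    return 20 * math.log10(rms(music) / rms(residual))


rng = random.Random(0)
sample_rate = 8000  # assumed sample rate for the toy signals

# One second of a 440 Hz tone standing in for a music snippet.
music = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
# White noise standing in for a recorded background.
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Draw the SNR at random, a simple stand-in for stochastic augmentation.
target_snr_db = rng.uniform(-5.0, 10.0)
noisy = mix_at_snr(music, noise, target_snr_db)
```

Calling `measured_snr_db(music, noisy)` should recover `target_snr_db` up to floating-point error. A realistic pipeline would likely also model effects such as room reverberation and microphone response, which this sketch leaves out.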
Related papers
- Advancing Audio Fingerprinting Accuracy Addressing Background Noise and Distortion Challenges [0.0]
This research proposes an audio fingerprinting algorithm that integrates AI and ML to enhance accuracy.
The performance evaluation reports 100% accuracy on 5-second audio inputs.
This research advances audio fingerprinting's adaptability, addressing challenges in varied environments and applications.
arXiv Detail & Related papers (2024-02-21T17:37:30Z)
- From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
- Audio Denoising for Robust Audio Fingerprinting [0.0]
Music discovery services let users identify songs from short mobile recordings.
These solutions rely more specifically on the extraction of spectral peaks in order to be robust to a number of distortions.
Few works have studied the robustness of these algorithms to background noise captured in real environments.
arXiv Detail & Related papers (2022-12-21T09:46:12Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio clip is generated by tampering only with the acoustic scene of an original recording.
The paper reports benchmark results for scene fake audio detection on the SceneFake dataset.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Play It Back: Iterative Attention for Audio Recognition [104.628661890361]
A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time.
We propose an end-to-end attention-based architecture that, through selective repetition, attends over the most discriminative sounds.
We show that our method can consistently achieve state-of-the-art performance across three audio-classification benchmarks.
arXiv Detail & Related papers (2022-10-20T15:03:22Z)
- An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio [53.134423013599914]
We propose a new problem for detecting vocoder fingerprints of fake audio.
Experiments are conducted on the datasets synthesized by eight state-of-the-art vocoders.
arXiv Detail & Related papers (2022-08-20T09:23:21Z)
- Visual Sound Localization in the Wild by Cross-Modal Interference Erasing [90.21476231683008]
In real-world scenarios, audio is usually contaminated by off-screen sound and background noise.
We propose the Interference Eraser (IEr) framework, which tackles the problem of audio-visual sound source localization in the wild.
arXiv Detail & Related papers (2022-02-13T21:06:19Z)
- Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning [14.60531205031547]
We present a contrastive learning framework that derives from the segment-level search objective.
In the segment-level search task, where conventional audio fingerprinting systems used to fail, our system has shown promising results while using 10x smaller storage.
arXiv Detail & Related papers (2020-10-22T17:44:40Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)