Advancing Audio Fingerprinting Accuracy: Addressing Background Noise and Distortion Challenges
- URL: http://arxiv.org/abs/2402.13957v2
- Date: Sat, 1 Jun 2024 21:37:21 GMT
- Title: Advancing Audio Fingerprinting Accuracy: Addressing Background Noise and Distortion Challenges
- Authors: Navin Kamuni, Sathishkumar Chintala, Naveen Kunchakuri, Jyothi Swaroop Arlagadda Narasimharaju, Venkat Kumar
- Abstract summary: This research proposes an AI and ML integrated audio fingerprinting algorithm to enhance accuracy.
Performance evaluation attests to 100% accuracy within a 5-second audio input.
This research advances audio fingerprinting's adaptability, addressing challenges in varied environments and applications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio fingerprinting, exemplified by pioneers like Shazam, has transformed digital audio recognition. However, existing systems struggle with accuracy in challenging conditions, limiting broad applicability. This research proposes an AI and ML integrated audio fingerprinting algorithm to enhance accuracy. Built on the Dejavu Project's foundations, the study emphasizes real-world scenario simulations with diverse background noises and distortions. Signal processing, central to Dejavu's model, includes the Fast Fourier Transform, spectrograms, and peak extraction. The "constellation" concept and fingerprint hashing enable unique song identification. Performance evaluation attests to 100% accuracy within a 5-second audio input, with a system showcasing predictable matching speed for efficiency. Storage analysis highlights the critical space-speed trade-off for practical implementation. This research advances audio fingerprinting's adaptability, addressing challenges in varied environments and applications.
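The signal chain the abstract names (Fast Fourier Transform, spectrogram, peak extraction, constellation pairing, fingerprint hashing) can be sketched compactly. The following is a minimal illustration in the spirit of the Dejavu approach, not its actual implementation; the window size, hop, threshold, and fan-out values are illustrative assumptions:

```python
import hashlib
import numpy as np

def spectrogram(signal, win=256, hop=128):
    """Magnitude spectrogram from Hann-windowed FFT frames."""
    window = np.hanning(win)
    frames = [signal[i:i + win] * window
              for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def find_peaks(spec, rel_thresh=0.5):
    """Naive peak picking: keep bins that dominate their 3x3
    neighbourhood and exceed a fraction of the global maximum."""
    peaks = []
    for t in range(1, spec.shape[0] - 1):
        for f in range(1, spec.shape[1] - 1):
            patch = spec[t - 1:t + 2, f - 1:f + 2]
            if spec[t, f] == patch.max() and spec[t, f] > rel_thresh * spec.max():
                peaks.append((t, f))
    return peaks

def fingerprints(peaks, fan_out=5):
    """Pair each anchor peak with a few subsequent peaks (the
    'constellation') and hash (f1, f2, time delta) into a compact key."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            hashes.append((hashlib.sha1(key).hexdigest()[:20], t1))
    return hashes
```

Matching then reduces to looking up these hash keys in a database and voting on consistent time offsets between query and stored fingerprints.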
Related papers
- Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
arXiv Detail & Related papers (2024-01-30T18:56:22Z)
- Music Augmentation and Denoising For Peak-Based Audio Fingerprinting [0.0]
We introduce and release a new audio augmentation pipeline that adds noise to music snippets in a realistic way.
We then propose and release a deep learning model that removes noisy components from spectrograms.
We show that the addition of our model improves the identification performance of commonly used audio fingerprinting systems, even under noisy conditions.
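A core ingredient of any such augmentation pipeline is mixing noise into a clean snippet at a controlled signal-to-noise ratio. A minimal sketch of that step (the function name and API are hypothetical, not the released pipeline):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the target SNR in dB."""
    noise = np.resize(noise, clean.shape)    # loop or trim noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12    # guard against silent noise
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise
```

A realistic pipeline would layer several such mixtures (babble, reverberation, compression artifacts) rather than a single noise source.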
arXiv Detail & Related papers (2023-10-20T09:56:22Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions.
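The forward ("noising") side of such a diffusion process has a standard closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps with eps ~ N(0, I). A minimal numpy sketch of sampling noisy proposals from ground-truth boundaries (the schedule values are illustrative, not the paper's):

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    abar = np.cumprod(1.0 - betas)[t]          # cumulative product of alphas
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps
```

During training the model sees x_t and learns to recover x_0 (or eps); here x0 would hold normalized event boundary positions.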
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation [0.0]
We have developed a new assistive device, EchoVest, for blind/deaf people to intuitively become more aware of their environment.
EchoVest transmits vibrations to the user's body by utilizing transcutaneous electric nerve stimulation (TENS) based on the source of the sounds.
We aimed to outperform CNN-based models, the most commonly used machine-learning approach for classification tasks, in both accuracy and computational cost.
arXiv Detail & Related papers (2023-07-10T14:43:32Z)
- Adaptive ship-radiated noise recognition with learnable fine-grained wavelet transform [25.887932248706218]
This work proposes an adaptive generalized recognition system, AGNet.
By converting fixed wavelet parameters into fine-grained learnable parameters, AGNet learns the characteristics of underwater sound at different frequencies.
Experiments reveal that our AGNet outperforms all baseline methods on several underwater acoustic datasets.
arXiv Detail & Related papers (2023-05-31T06:56:01Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
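Contrastive pretraining of this kind is typically built on an InfoNCE-style objective: embeddings of the same machine ID are pulled together and other IDs pushed apart. A minimal numpy version (illustrative, not the paper's exact loss):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: high similarity to the positive and low similarity
    to negatives gives a small loss."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()    # softmax over candidates
    return -np.log(p[0])                         # positive sits at index 0
```

In the machine-ID setting, the positive would be another clip from the same machine and the negatives clips from different machines.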
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Data Uncertainty Guided Noise-aware Preprocessing Of Fingerprints [5.740220134446289]
We propose a data uncertainty-based framework to quantify noise present in the input image and identify fingerprint regions with background noise and poor ridge clarity.
Quantifying the noise helps the model in two ways: first, it makes the objective function adaptive to the noise in a particular input fingerprint, which helps achieve robust performance on noisy and distorted fingerprint regions.
Second, the predicted noise variance map enables end-users to understand erroneous predictions caused by noise in the input image.
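Noise-adaptive objectives of this kind are commonly implemented as a heteroscedastic (uncertainty-weighted) loss, where the model predicts a log-variance alongside each output and high-variance regions are down-weighted. A minimal sketch (illustrative, not the paper's exact formulation):

```python
import numpy as np

def noise_aware_loss(pred, target, log_var):
    """Uncertainty-weighted regression loss:
    L = 0.5 * exp(-log_var) * (pred - target)^2 + 0.5 * log_var.
    High predicted variance shrinks the error term but pays a log penalty."""
    inv_var = np.exp(-log_var)
    return np.mean(0.5 * inv_var * (pred - target) ** 2 + 0.5 * log_var)
```

The log-variance term plays the role of the predicted noise variance map: it both adapts the objective and is readable by end-users after training.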
arXiv Detail & Related papers (2021-07-02T19:47:58Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
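For contrast, the classical direction-informed baseline is a delay-and-sum beamformer: shift each channel so the target direction lines up, then average. A minimal sketch with integer sample delays (the proposed neural filter learns far richer spatio-temporal weighting than this):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay so the target
    direction aligns across microphones, then average the channels."""
    n = min(len(c) - d for c, d in zip(channels, delays))
    aligned = [np.asarray(c)[d:d + n] for c, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)
```

Coherent sources add in phase after alignment while interferers from other directions partially cancel, which is the spatial cue a neural filter can exploit end-to-end.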
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.