Related papers: Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation

URL: http://arxiv.org/abs/2509.18620v1
Date: Tue, 23 Sep 2025 04:11:15 GMT
Title: Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
Authors: Aditya Bhattacharjee, Marco Pasini, Emmanouil Benetos
Abstract summary: The evaluation of audio fingerprinting at a realistic scale is limited by the scarcity of large public music databases. We present an audio-free approach that synthesises latent fingerprints which approximate the distribution of real fingerprints. The synthetic fingerprints generated using our system act as realistic distractors and enable the simulation of retrieval performance at a large scale without requiring additional audio.
Score: 17.07118976088468
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The evaluation of audio fingerprinting at a realistic scale is limited by the scarcity of large public music databases. We present an audio-free approach that synthesises latent fingerprints which approximate the distribution of real fingerprints. Our method trains a Rectified Flow model on embeddings extracted by pre-trained neural audio fingerprinting systems. The synthetic fingerprints generated using our system act as realistic distractors and enable the simulation of retrieval performance at a large scale without requiring additional audio. We assess the fidelity of synthetic fingerprints by comparing the distributions to real data. We further benchmark the retrieval performances across multiple state-of-the-art audio fingerprinting frameworks by augmenting real reference databases with synthetic distractors, and show that the scaling trends obtained with synthetic distractors closely track those obtained with real distractors. Finally, we scale the synthetic distractor database to model retrieval performance for very large databases, providing a practical metric of system scalability that does not depend on access to audio corpora.
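
The distractor-scaling idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method: the distractors are random unit vectors standing in for synthetic fingerprints (no Rectified Flow model is trained), queries are noisy copies of the references, and every name and parameter (`simulate`, `top1_hit_rate`, `noise`, `distractor_counts`) is hypothetical. It only shows how top-1 retrieval accuracy can be tracked as a reference database is padded with more distractors.

```python
import math
import random


def unit(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_unit(rng, dim):
    """Draw a random direction; a stand-in for one synthetic fingerprint."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def top1_hit_rate(queries, references, distractors):
    """Fraction of queries whose nearest database entry is their own reference.

    Query i is assumed to correspond to references[i]; distractors are
    appended after the references, so they never count as a hit.
    """
    database = references + distractors
    hits = 0
    for i, query in enumerate(queries):
        best = max(
            range(len(database)),
            key=lambda j: sum(a * b for a, b in zip(query, database[j])),
        )
        hits += int(best == i)
    return hits / len(queries)


def simulate(dim=16, n_refs=50, noise=0.3, distractor_counts=(0, 200, 1000), seed=0):
    """Return {number of distractors: top-1 hit rate} for a toy database."""
    rng = random.Random(seed)
    references = [random_unit(rng, dim) for _ in range(n_refs)]
    # Assumption: a query is its reference plus Gaussian noise, renormalised.
    queries = [unit([x + rng.gauss(0.0, noise) for x in ref]) for ref in references]
    # One shared pool, so each larger database contains the smaller ones.
    pool = [random_unit(rng, dim) for _ in range(max(distractor_counts))]
    return {n: top1_hit_rate(queries, references, pool[:n]) for n in distractor_counts}


if __name__ == "__main__":
    for n, rate in simulate().items():
        print(f"{n:>5} distractors: top-1 hit rate {rate:.2f}")
```

Because the distractor sets are nested, the hit rate in this sketch can only stay flat or fall as distractors are added. In the paper's setting, the random vectors would be replaced by fingerprints sampled from the trained generative model.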

Related papers

Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities. RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
Advancing Audio Fingerprinting Accuracy Addressing Background Noise and Distortion Challenges [0.0]
This research proposes an AI and ML integrated audio fingerprinting algorithm to enhance accuracy. Performance evaluation attests to 100% accuracy within a 5-second audio input. This research advances audio fingerprinting's adaptability, addressing challenges in varied environments and applications.
arXiv Detail & Related papers (2024-02-21T17:37:30Z)
An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, there is high demand for generating expressive sounds from high-level representations. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression. This study investigates the use of stacked convolutional autoencoders to compress time-frequency audio representations of a variety of instruments at a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
Audio Deepfake Attribution: An Initial Dataset and Investigation [41.62487394875349]
We design the first deepfake audio dataset for the attribution of audio generation tools, called Audio Deepfake Attribution (ADA). We propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA). Experimental results demonstrate that the CRML method effectively addresses open-set risks in real-world scenarios.
arXiv Detail & Related papers (2022-08-21T05:15:40Z)
SpoofGAN: Synthetic Fingerprint Spoof Images [47.87570819350573]
A major limitation to advances in fingerprint spoof detection is the lack of publicly available, large-scale fingerprint spoof datasets. This work aims to demonstrate the utility of synthetic (both live and spoof) fingerprints in supplying these algorithms with sufficient data.
arXiv Detail & Related papers (2022-04-13T16:27:27Z)
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
Synthesis and Reconstruction of Fingerprints using Generative Adversarial Networks [6.700873164609009]
We propose a novel fingerprint synthesis and reconstruction framework based on the StyleGAN2 architecture. We also derive a computational approach to modify the attributes of a generated fingerprint while preserving its identity. The proposed framework was experimentally shown to outperform contemporary state-of-the-art approaches for both fingerprint synthesis and reconstruction.
arXiv Detail & Related papers (2022-01-17T00:18:00Z)
Responsible Disclosure of Generative Models Using Scalable Fingerprinting [70.81987741132451]
Deep generative models have achieved a qualitatively new level of performance. There are concerns about how this technology can be misused to spoof sensors, generate deep fakes, and enable misinformation at scale. Our work enables a responsible disclosure of such state-of-the-art generative models that allows researchers and companies to fingerprint their models.
arXiv Detail & Related papers (2020-12-16T03:51:54Z)
SynFi: Automatic Synthetic Fingerprint Generation [23.334625222079634]
We introduce a new approach to automatically generate high-fidelity synthetic fingerprints at scale. We show that our methodology is the first to generate fingerprints that are computationally indistinguishable from real ones.
arXiv Detail & Related papers (2020-02-16T07:45:29Z)
