Evaluating Fake Music Detection Performance Under Audio Augmentations
- URL: http://arxiv.org/abs/2507.10447v1
- Date: Mon, 07 Jul 2025 16:15:02 GMT
- Title: Evaluating Fake Music Detection Performance Under Audio Augmentations
- Authors: Tomasz Sroka, Tomasz Wężowicz, Dominik Sidorczuk, Mateusz Modrzejewski
- Abstract summary: We construct a dataset consisting of both real and synthetic music generated using several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy. We test the performance of a recent state-of-the-art musical deepfake detection model in the presence of audio augmentations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid advancement of generative audio models, distinguishing between human-composed and generated music is becoming increasingly challenging. In response, models for detecting fake music have been proposed. In this work, we explore the robustness of such systems under audio augmentations. To evaluate model generalization, we construct a dataset consisting of both real and synthetic music generated using several systems. We then apply a range of audio transformations and analyze how they affect classification accuracy. We test the performance of a recent state-of-the-art musical deepfake detection model in the presence of audio augmentations. The performance of the model decreases significantly even with the introduction of light augmentations.
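The evaluation idea described in the abstract (apply audio transformations, then compare classification accuracy against the clean baseline) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `toy_detector`, the synthetic sine-wave "clips", and the two augmentations (additive noise and gain) are all assumptions chosen only to make the example self-contained, and the paper's actual detector and augmentation set are not specified here.

```python
import numpy as np

def add_noise(audio, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def apply_gain(audio, gain_db):
    """Scale the waveform by a gain expressed in decibels."""
    return audio * (10 ** (gain_db / 20))

def accuracy(detector, clips, labels):
    """Fraction of clips whose predicted label matches the true label."""
    preds = np.array([detector(clip) for clip in clips])
    return float(np.mean(preds == np.asarray(labels)))

def robustness_report(detector, clips, labels, augmentations):
    """Accuracy on clean clips and on each augmented copy of the clips."""
    report = {"clean": accuracy(detector, clips, labels)}
    for name, augment in augmentations.items():
        augmented = [augment(clip) for clip in clips]
        report[name] = accuracy(detector, augmented, labels)
    return report

def toy_detector(audio):
    """Hypothetical stand-in detector: predicts 'fake' (1) when RMS exceeds 0.5.

    A real detector would be a trained model; this brittle rule only
    demonstrates how a light augmentation can flip predictions.
    """
    return int(np.sqrt(np.mean(audio ** 2)) > 0.5)

# Synthetic stand-ins for audio: quiet sines labelled real (0), loud sines fake (1).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clips = [0.3 * np.sin(2 * np.pi * 440 * t), 0.8 * np.sin(2 * np.pi * 440 * t)] * 2
labels = [0, 1, 0, 1]

augmentations = {
    "noise_20dB": lambda x: add_noise(x, snr_db=20, rng=rng),
    "gain_-6dB": lambda x: apply_gain(x, gain_db=-6),
}
report = robustness_report(toy_detector, clips, labels, augmentations)
for name, acc in report.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

In this toy setup the level-dependent detector is unaffected by mild noise but fails on the attenuated "fake" clips, which mirrors, in a very simplified way, the kind of accuracy drop under light augmentations that the abstract reports.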
Related papers
- Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z)
- InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation [43.690876909464336]
We introduce InspireMusic, a framework that integrates super resolution and a large language model for high-fidelity long-form music generation. A unified framework generates high-fidelity music, songs, and audio, and incorporates an autoregressive transformer with a super-resolution flow-matching model. Our model differs from previous approaches in that we utilize an audio tokenizer with one codebook that contains richer semantic information.
arXiv Detail & Related papers (2025-02-28T09:58:25Z)
- Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound [46.7144966835279]
This paper addresses the need for automated systems capable of predicting audio aesthetics without human intervention. We propose new annotation guidelines that decompose human listening perspectives into four distinct axes. We develop and train no-reference, per-item prediction models that offer a more nuanced assessment of audio quality.
arXiv Detail & Related papers (2025-02-07T18:15:57Z)
- Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning [55.2480439325792]
Large audio-language models (LALMs) have shown impressive capabilities in understanding and reasoning about audio and speech information. These models still face challenges, including hallucinating non-existent sound events, misidentifying the order of sound events, and incorrectly attributing sound sources.
arXiv Detail & Related papers (2024-10-21T15:55:27Z)
- Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models [0.0]
"Diff-A-Riff" is a Latent Diffusion Model designed to generate high-quality instrumentals adaptable to any musical context.
It produces 48kHz pseudo-stereo audio while significantly reducing inference time and memory usage.
arXiv Detail & Related papers (2024-06-12T16:34:26Z)
- AVTENet: A Human-Cognition-Inspired Audio-Visual Transformer-Based Ensemble Network for Video Deepfake Detection [49.81915942821647]
This study introduces the audio-visual transformer-based ensemble network (AVTENet) to detect deepfake videos. For evaluation, we use the recently released benchmark multimodal audio-video FakeAVCeleb dataset. For a detailed analysis, we evaluate AVTENet, its variants, and several existing methods on multiple test sets of the FakeAVCeleb dataset.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Exploiting Time-Frequency Conformers for Music Audio Enhancement [21.243039524049614]
We propose a music enhancement system based on the Conformer architecture.
Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task.
arXiv Detail & Related papers (2023-08-24T06:56:54Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training. Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio [53.134423013599914]
We propose a new problem for detecting vocoder fingerprints of fake audio.
Experiments are conducted on the datasets synthesized by eight state-of-the-art vocoders.
arXiv Detail & Related papers (2022-08-20T09:23:21Z)
- Deep Performer: Score-to-Audio Music Performance Synthesis [30.95307878579825]
Deep Performer is a novel system for score-to-audio music performance synthesis.
Unlike speech, music often contains polyphony and long notes.
We show that our proposed model can synthesize music with clear polyphony and harmonic structures.
arXiv Detail & Related papers (2022-02-12T10:36:52Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.