A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
- URL: http://arxiv.org/abs/2505.19663v2
- Date: Wed, 28 May 2025 06:20:39 GMT
- Title: A Comprehensive Real-World Assessment of Audio Watermarking Algorithms: Will They Survive Neural Codecs?
- Authors: Yigitcan Özer, Woosung Choi, Joan Serrà, Mayank Kumar Singh, Wei-Hsiang Liao, Yuki Mitsufuji
- Abstract summary: RAW-Bench is a benchmark for evaluating deep learning-based audio watermarking methods. We introduce a comprehensive audio attack pipeline with various distortions such as compression, background noise, and reverberation. We find that specific distortions, such as polarity inversion, time stretching, or reverb, seriously affect certain methods.
- Score: 21.111812193733982
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce the Robust Audio Watermarking Benchmark (RAW-Bench), a benchmark for evaluating deep learning-based audio watermarking methods with standardized and systematic comparisons. To simulate real-world usage, we introduce a comprehensive audio attack pipeline with various distortions such as compression, background noise, and reverberation, along with a diverse test dataset including speech, environmental sounds, and music recordings. Evaluating four existing watermarking methods on RAW-Bench reveals two main insights: (i) neural compression techniques pose the most significant challenge, even when algorithms are trained with such compressions; and (ii) training with audio attacks generally improves robustness, although it is insufficient in some cases. Furthermore, we find that specific distortions, such as polarity inversion, time stretching, or reverb, seriously affect certain methods. The evaluation framework is accessible at github.com/SonyResearch/raw_bench.
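The attack-then-detect setup described in the abstract can be illustrated with a toy sketch (a hypothetical illustration, not the RAW-Bench implementation; all function names here are invented): embed a spread-spectrum payload into a host signal, pass the watermarked signal through a chain of distortions such as polarity inversion and additive noise, then measure the bit error rate (BER) at the detector.

```python
import random

def embed(signal, bits, chip_len=64, strength=0.02):
    """Toy spread-spectrum embed: add a signed chip pattern per payload bit."""
    random.seed(0)  # shared pseudo-random chip sequence with the detector
    chips = [random.choice((-1.0, 1.0)) for _ in range(chip_len)]
    out = list(signal)
    for i, b in enumerate(bits):
        sign = 1.0 if b else -1.0
        for j in range(chip_len):
            out[i * chip_len + j] += sign * strength * chips[j]
    return out

def detect(signal, n_bits, chip_len=64):
    """Correlate each segment with the chip sequence; the sign gives the bit."""
    random.seed(0)
    chips = [random.choice((-1.0, 1.0)) for _ in range(chip_len)]
    bits = []
    for i in range(n_bits):
        corr = sum(signal[i * chip_len + j] * chips[j] for j in range(chip_len))
        bits.append(1 if corr > 0 else 0)
    return bits

def attack(signal, invert_polarity=True, noise_std=0.01):
    """Chained distortions: optional polarity inversion, then additive noise."""
    rng = random.Random(1)
    sig = [-x for x in signal] if invert_polarity else list(signal)
    return [x + rng.gauss(0.0, noise_std) for x in sig]

def ber(sent, received):
    """Fraction of payload bits recovered incorrectly."""
    return sum(a != b for a, b in zip(sent, received)) / len(sent)

bits = [1, 0, 1, 1, 0, 0, 1, 0]
host = [0.0] * (len(bits) * 64)        # silent host for clarity
wm = embed(host, bits)
# Mild noise leaves the correlation sign intact; polarity inversion flips
# every correlation, so this naive detector fails completely on that attack.
print(ber(bits, detect(attack(wm, invert_polarity=False), len(bits))))
print(ber(bits, detect(attack(wm, invert_polarity=True), len(bits))))
```

The polarity-inversion result mirrors the paper's observation that some distortions are catastrophic for detectors that were not designed or trained against them.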
Related papers
- PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation [63.3417467957431]
Text-to-audio-video (T2AV) generation underpins a wide range of applications demanding realistic audio-visual content. We present PhyAVBench, a challenging audio physics-sensitivity benchmark designed to evaluate the audio physics grounding capabilities of existing T2AV models. Unlike prior benchmarks that primarily focus on audio-video synchronization, PhyAVBench explicitly evaluates models' understanding of the physical mechanisms underlying sound generation.
arXiv Detail & Related papers (2025-12-30T05:22:31Z) - AWARE: Audio Watermarking with Adversarial Resistance to Edits [0.0]
AWARE (Audio Watermarking with Adversarial Resistance to Edits) is an approach that avoids reliance on attack-simulation stacks and handcrafted differentiable distortions. Embedding is obtained via adversarial optimization in the time-frequency domain under a level-proportional budget. AWARE attains high audio quality and speech intelligibility (PESQ/STOI) and consistently low BER across various audio edits.
arXiv Detail & Related papers (2025-10-20T13:10:52Z) - Multi-bit Audio Watermarking [38.40457780873775]
We present Timbru, a post-hoc audio watermarking model that achieves state-of-the-art robustness and imperceptibility trade-offs without training an embedder-detector model. Our approach attains the best average bit error rates while preserving perceptual quality, demonstrating an efficient, dataset-free path to imperceptible audio watermarking.
arXiv Detail & Related papers (2025-10-02T12:41:01Z) - Pretrained Conformers for Audio Fingerprinting and Retrieval [0.0]
We train conformer-based encoders that are capable of generating unique embeddings for small segments of audio. We achieve state-of-the-art results for audio retrieval tasks while using only 3 seconds of audio to generate embeddings.
arXiv Detail & Related papers (2025-08-15T17:19:09Z) - Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-audio detection framework and benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding [29.89341878606415]
In this paper, we design a dual-embedding watermarking model for efficient watermark localization.
Experiments show that the proposed model, IDEAW, can withstand various attacks with higher capacity and more efficient localization than existing methods.
arXiv Detail & Related papers (2024-09-29T09:32:54Z) - An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples [2.2866551516539726]
We present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step.
A viable adversarial audio file is produced first and then fine-tuned with respect to perceptibility and robustness.
arXiv Detail & Related papers (2023-10-05T06:59:09Z) - AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z) - Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z) - Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z) - Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z) - Defending against Adversarial Audio via Diffusion Model [18.792523775685456]
Deep learning models have been widely used in commercial acoustic systems in recent years.
Adversarial audio examples can cause abnormal behaviors in such acoustic systems.
We propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models.
arXiv Detail & Related papers (2023-03-02T07:15:47Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed-up the training by using a single multiscale spectrogram adversary.
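The "quantized latent space" mentioned above can be sketched in miniature (a hypothetical toy, not this paper's codec): each latent frame is replaced by its nearest codebook entry, so only the code index, not the continuous vector, needs to be transmitted.

```python
def quantize(latent, codebook):
    """Nearest-neighbor vector quantization: return (index, codeword)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda i: sq_dist(latent, codebook[i]))
    return idx, codebook[idx]

# Tiny 2-D codebook for illustration; real codecs learn much larger ones.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
idx, code = quantize([0.9, 0.1], codebook)  # snaps to the nearest entry
```

A decoder holding the same codebook can reconstruct `code` from `idx` alone, which is what makes the latent space compressible.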
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - Robust Time Series Denoising with Learnable Wavelet Packet Transform [1.370633147306388]
In many applications, signal denoising is often the first pre-processing step before any subsequent analysis or learning task.
We propose to apply a deep learning denoising model inspired by signal processing: a learnable version of the wavelet packet transform.
We demonstrate how the proposed algorithm relates to the universality of signal processing methods and the learning capabilities of deep learning approaches.
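For context, classical wavelet denoising shrinks transform coefficients toward zero with a fixed threshold; a learnable variant would instead make the filters and thresholds trainable. A minimal sketch of the fixed soft-thresholding nonlinearity (a standard textbook step, not this paper's model):

```python
def soft_threshold(coeffs, t):
    """Wavelet-shrinkage nonlinearity: shrink each coefficient toward zero by t,
    zeroing anything whose magnitude is below the threshold."""
    out = []
    for c in coeffs:
        if c > t:
            out.append(c - t)
        elif c < -t:
            out.append(c + t)
        else:
            out.append(0.0)
    return out

denoised = soft_threshold([2.0, -0.5, -3.0], t=1.0)  # small coefficient zeroed
```

Small coefficients, which mostly carry noise, are removed; large ones, which carry signal structure, are only attenuated.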
arXiv Detail & Related papers (2022-06-13T13:05:58Z) - VocBench: A Neural Vocoder Benchmark for Speech Synthesis [36.94062576597112]
We present VocBench, a framework that benchmarks the performance of state-of-the-art neural vocoders.
VocBench uses a systematic study to evaluate different neural vocoders in a shared environment that enables a fair comparison between them.
Our results demonstrate that the framework can expose the relative efficacy of each vocoder and the quality of its synthesized samples.
arXiv Detail & Related papers (2021-12-06T15:09:57Z) - Self-Guided Quantum State Learning for Mixed States [7.270980742378388]
The salient features of our algorithm are efficient $O(d^3)$ post-processing, where $d$ is the dimension of the state.
A higher resilience against measurement noise makes our algorithm suitable for noisy intermediate-scale quantum applications.
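For context, the objective such state-learning algorithms minimize is the standard infidelity derived from the Uhlmann fidelity (a textbook definition, not specific to this paper), for true state $\rho$ and estimate $\sigma$:

$1 - F(\rho, \sigma)$, where $F(\rho, \sigma) = \left( \operatorname{Tr} \sqrt{ \sqrt{\rho}\, \sigma \sqrt{\rho} } \right)^{2}$.

Evaluating $F$ involves matrix square roots of $d \times d$ density matrices, which is consistent with cubic-in-$d$ post-processing cost.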
arXiv Detail & Related papers (2021-06-11T04:40:26Z) - Dynamic Layer Customization for Noise Robust Speech Emotion Recognition in Heterogeneous Condition Training [16.807298318504156]
We show that we can improve performance by dynamically routing samples to specialized feature encoders for each noise condition.
We extend these improvements to the multimodal setting by dynamically routing samples to maintain temporal ordering.
arXiv Detail & Related papers (2020-10-21T18:07:32Z) - A black-box adversarial attack for poisoning clustering [78.19784577498031]
We propose a black-box adversarial attack for crafting adversarial samples to test the robustness of clustering algorithms.
We show that our attacks are transferable even against supervised algorithms such as SVMs, random forests, and neural networks.
arXiv Detail & Related papers (2020-09-09T18:19:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.