IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection
- URL: http://arxiv.org/abs/2601.01239v1
- Date: Sat, 03 Jan 2026 17:08:35 GMT
- Title: IO-RAE: Information-Obfuscation Reversible Adversarial Example for Audio Privacy Protection
- Authors: Jiajie Zhu, Xia Du, Xiaoyuan Liu, Jizhe Zhou, Qizhen Xu, Zheng Lin, Chi-Man Pun,
- Abstract summary: This paper introduces an Information-Obfuscation Reversible Adversarial Example (IO-RAE) framework to safeguard audio privacy. IO-RAE leverages large language models to generate misleading yet contextually coherent content. We propose the Cumulative Signal Attack technique, which mitigates high-frequency noise and enhances attack efficacy by targeting low-frequency signals.
- Score: 38.60913794380576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancements in artificial intelligence have significantly accelerated the adoption of speech recognition technology, leading to its widespread integration across various applications. However, this surge in usage also highlights a critical issue: audio data is highly vulnerable to unauthorized exposure and analysis, posing significant privacy risks for businesses and individuals. This paper introduces an Information-Obfuscation Reversible Adversarial Example (IO-RAE) framework, the pioneering method designed to safeguard audio privacy using reversible adversarial examples. IO-RAE leverages large language models to generate misleading yet contextually coherent content, effectively preventing unauthorized eavesdropping by humans and Automatic Speech Recognition (ASR) systems. Additionally, we propose the Cumulative Signal Attack technique, which mitigates high-frequency noise and enhances attack efficacy by targeting low-frequency signals. Our approach ensures the protection of audio data without degrading its quality or usability. Experimental evaluations demonstrate the superiority of our method, achieving a targeted misguidance rate of 96.5% and a remarkable 100% untargeted misguidance rate in obfuscating target keywords across multiple ASR models, including a commercial black-box system from Google. Furthermore, the quality of the recovered audio, measured by the Perceptual Evaluation of Speech Quality score, reached 4.45, comparable to high-quality original recordings. Notably, the recovered audio processed by ASR systems exhibited an error rate of 0%, indicating nearly lossless recovery. These results highlight the practical applicability and effectiveness of our IO-RAE framework in protecting sensitive audio privacy.
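Two of the signal-level ideas in the abstract, concentrating the perturbation in low-frequency bands and recovering the original audio afterwards, can be illustrated with a small sketch. The following Python snippet is a minimal, illustrative example and not the authors' IO-RAE implementation: it low-pass filters a random perturbation so the added signal targets low frequencies, stores the exact perturbation so recovery is trivially reversible, and scores the recovered audio with the `pesq` package. The sample rate, cut-off frequency, perturbation scale, and the use of random (rather than adversarially optimized) noise are all assumptions for illustration.

```python
# Illustrative sketch only: a low-frequency, exactly reversible perturbation,
# in the spirit of the low-frequency targeting and lossless recovery described
# in the abstract. Not the authors' IO-RAE implementation.
import numpy as np
from scipy.signal import butter, filtfilt
from pesq import pesq  # pip install pesq

SR = 16000          # sample rate (assumption)
CUTOFF_HZ = 1000    # low-pass cut-off for the perturbation (assumption)
EPS = 0.02          # perturbation scale (assumption)

def low_frequency_perturbation(length, seed=0):
    """Random noise restricted to low frequencies via a Butterworth low-pass filter."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(length)
    b, a = butter(4, CUTOFF_HZ / (SR / 2), btype="low")
    low = filtfilt(b, a, noise)
    return EPS * low / (np.abs(low).max() + 1e-12)

def protect(audio, seed=0):
    """Add the low-frequency perturbation; keep it so recovery is exact."""
    delta = low_frequency_perturbation(len(audio), seed)
    return audio + delta, delta

def recover(protected, delta):
    """Reverse the protection by subtracting the stored perturbation."""
    return protected - delta

if __name__ == "__main__":
    original = np.sin(2 * np.pi * 220 * np.arange(SR) / SR).astype(np.float64)
    protected, delta = protect(original)
    recovered = recover(protected, delta)
    # Wideband PESQ of the recovered audio against the original.
    print("PESQ (recovered vs. original):", pesq(SR, original, recovered, "wb"))
    print("Max reconstruction error:", np.abs(recovered - original).max())
```

In the actual framework the perturbation would be optimized so that ASR systems transcribe LLM-generated decoy content, and the information needed for recovery would be carried reversibly with the audio rather than kept separately; the sketch only shows the signal-level mechanics.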
Related papers
- When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper [0.0]
We present a systematic empirical study on the impact of the Segment Anything Model Audio (SAM-Audio) by Meta AI when used as a preprocessing step for zero-shot transcription with Whisper. Contrary to common intuition, our results show that SAM-Audio preprocessing consistently degrades ASR performance. These findings expose a fundamental mismatch: audio that is perceptually cleaner to human listeners is not necessarily better suited for machine recognition.
arXiv Detail & Related papers (2026-03-05T01:20:11Z) - VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks [51.68795949691009]
We introduce VoxGuard, a framework grounded in differential privacy and membership inference. For attributes, we show that simple transparent attacks recover gender and accent with near-perfect accuracy even after anonymization. Our results demonstrate that EER substantially underestimates leakage, highlighting the need for low-FPR evaluation.
arXiv Detail & Related papers (2025-09-22T20:57:48Z) - Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics [44.60499998155848]
QPAudioEraser is a quantum-inspired audio unlearning framework. It consistently surpasses conventional baselines across single-class, multi-class, sequential, and accent-level erasure scenarios.
arXiv Detail & Related papers (2025-07-29T20:12:24Z) - Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-powered Automatic Speech Recognition Systems [20.45938874279563]
We propose a novel framework, AudioShield, to protect speech against automatic speech recognition systems. By transferring the perturbations to the latent space, the audio quality is preserved to a large extent. AudioShield shows high effectiveness in real-time end-to-end scenarios and demonstrates strong resilience against adaptive countermeasures.
arXiv Detail & Related papers (2025-04-01T14:49:39Z) - $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR). MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z) - Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization [4.720552406377147]
We propose a technique that aligns adversarial perturbations with low-level acoustic characteristics derived from speech representation models. Our method is plug-and-play and can be integrated with any existing attack method.
arXiv Detail & Related papers (2025-03-25T12:14:10Z) - Mitigating Unauthorized Speech Synthesis for Voice Protection [7.1578783467799]
Malicious voice exploitation has brought serious hazards to our daily lives.
It is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints.
We devise Pivotal Objective Perturbation (POP), which applies imperceptible error-minimizing noise to original speech samples.
arXiv Detail & Related papers (2024-10-28T05:16:37Z) - Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASRS, we propose techniques that automatically generate blackbox, untargeted adversarial attacks.
arXiv Detail & Related papers (2021-12-03T10:21:47Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
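The last entry describes U-Net$_At$, a U-Net-based attention model for enhancing adversarial speech signals, only at a high level. Below is a minimal, hypothetical PyTorch sketch of a 1-D U-Net with a self-attention bottleneck operating on raw waveforms; the layer count, kernel sizes, and single attention block are assumptions for illustration and do not reproduce the paper's architecture.

```python
# Hypothetical sketch of a 1-D U-Net with a self-attention bottleneck for
# waveform enhancement. Not the architecture from the cited paper.
import torch
import torch.nn as nn

class AttnBottleneck(nn.Module):
    """Self-attention over the time axis of a (B, C, T) feature map."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                      # x: (B, C, T)
        h = x.transpose(1, 2)                  # (B, T, C)
        a, _ = self.attn(h, h, h)
        h = self.norm(h + a)                   # residual + layer norm
        return h.transpose(1, 2)               # back to (B, C, T)

class UNetAt(nn.Module):
    """Two-level 1-D U-Net: strided conv encoder, attention bottleneck,
    transposed-conv decoder with a skip connection."""
    def __init__(self, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv1d(1, base, 15, stride=2, padding=7), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv1d(base, base * 2, 15, stride=2, padding=7), nn.ReLU())
        self.bottleneck = AttnBottleneck(base * 2)
        self.dec2 = nn.Sequential(
            nn.ConvTranspose1d(base * 2, base, 15, stride=2, padding=7, output_padding=1),
            nn.ReLU(),
        )
        self.dec1 = nn.ConvTranspose1d(base * 2, 1, 15, stride=2, padding=7, output_padding=1)

    def forward(self, x):                      # x: (B, 1, T), T divisible by 4
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        b = self.bottleneck(e2)
        d2 = self.dec2(b)
        return self.dec1(torch.cat([d2, e1], dim=1))  # enhanced waveform

if __name__ == "__main__":
    wav = torch.randn(2, 1, 16000)             # one second of 16 kHz audio, batch of 2
    print(UNetAt()(wav).shape)                 # torch.Size([2, 1, 16000])
```

In an enhancement-based defense of this kind, the network would be trained to map adversarially perturbed waveforms back toward clean speech before they reach the ASR model.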