Related papers: WavInWav: Time-domain Speech Hiding via Invertible Neural Network

URL: http://arxiv.org/abs/2510.02915v1
Date: Fri, 03 Oct 2025 11:36:16 GMT
Title: WavInWav: Time-domain Speech Hiding via Invertible Neural Network
Authors: Wei Fan, Kejiang Chen, Xiangkun Wang, Weiming Zhang, Nenghai Yu
Abstract summary: Previous audio hiding methods often result in unsatisfactory quality when recovering secret audio. We use a flow-based invertible neural network to establish a direct link between stego audio, cover audio, and secret audio. We also add an encryption technique to protect the hidden data from unauthorized access.
Score: 78.85443308774484
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data hiding is essential for secure communication across digital media, and recent advances in Deep Neural Networks (DNNs) provide enhanced methods for embedding secret information effectively. However, previous audio hiding methods often result in unsatisfactory quality when recovering secret audio, due to their inherent limitations in the modeling of time-frequency relationships. In this paper, we explore these limitations and introduce a new DNN-based approach. We use a flow-based invertible neural network to establish a direct link between stego audio, cover audio, and secret audio, enhancing the reversibility of embedding and extracting messages. To address common issues from time-frequency transformations that degrade secret audio quality during recovery, we implement a time-frequency loss on the time-domain signal. This approach not only retains the benefits of time-frequency constraints but also enhances the reversibility of message recovery, which is vital for practical applications. We also add an encryption technique to protect the hidden data from unauthorized access. Experimental results on the VCTK and LibriSpeech datasets demonstrate that our method outperforms previous approaches in terms of subjective and objective metrics and exhibits robustness to various types of noise, suggesting its utility in targeted secure communication scenarios.
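The reversibility the abstract attributes to flow-based invertible networks can be illustrated with a generic additive coupling step. This is a minimal sketch, not the paper's architecture: the functions `f` and `g` are hypothetical stand-ins for learned sub-networks, and the names `forward` and `inverse` are chosen here for illustration. The point is only that the inverse pass undoes the forward pass exactly (up to floating-point rounding), whatever `f` and `g` compute.

```python
import math

def f(x):
    # Hypothetical stand-in for a learned sub-network; any function works.
    return [0.1 * math.tanh(v) for v in x]

def g(x):
    # Second hypothetical sub-network; it does not need to be invertible itself.
    return [0.05 * math.sin(v) for v in x]

def forward(cover, secret):
    # Additive coupling: each half is shifted by a function of the other half.
    y1 = [c + d for c, d in zip(cover, f(secret))]
    y2 = [s + d for s, d in zip(secret, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the shifts in reverse order, reusing the same f and g.
    secret = [b - d for b, d in zip(y2, g(y1))]
    cover = [a - d for a, d in zip(y1, f(secret))]
    return cover, secret

cover = [0.20, -0.10, 0.05, 0.30]    # toy time-domain samples (made up)
secret = [0.01, 0.40, -0.25, 0.00]   # toy time-domain samples (made up)
y1, y2 = forward(cover, secret)
rec_cover, rec_secret = inverse(y1, y2)
max_err = max(abs(a - b) for a, b in zip(secret, rec_secret))
print(max_err < 1e-12)
```

Because the inverse is obtained by construction rather than learned separately, recovery error in this toy setting comes only from floating-point rounding; in a real system, quantization and channel noise on the stego signal would add to it.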

Related papers

Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models [51.7170633585748]
Stream-Voice-Anon adapts modern causal LM-based NAC architectures specifically for streaming speaker anonymization. Our anonymization approach incorporates pseudo-speaker representation sampling, speaker embedding mixing, and diverse prompt selection strategies. Under the VoicePrivacy 2024 Challenge protocol, Stream-Voice-Anon achieves substantial improvements in intelligibility.
arXiv Detail & Related papers (2026-01-20T13:23:44Z)
Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns [4.121578819979242]
We present a causal, low-latency, and lightweight deep neural network (DNN)-based method for full-band speech denoising. The method is based on a modified UNet architecture employing look-back frames, temporal spanning of convolutional kernels, and recurrent neural networks. The proposed method is evaluated using established speech denoising metrics and publicly available datasets.
arXiv Detail & Related papers (2025-09-05T13:18:25Z)
Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics [44.60499998155848]
QPAudioEraser is a quantum-inspired audio unlearning framework. It consistently surpasses conventional baselines across single-class, multi-class, sequential, and accent-level erasure scenarios.
arXiv Detail & Related papers (2025-07-29T20:12:24Z)
Shuffling for Semantic Secrecy [12.708217189207828]
We devise a novel semantic security communication system wherein the random shuffling pattern plays the role of the shared secret key. The proposed random shuffling method is also flexible enough to work as a plugin for existing semantic communication systems.
arXiv Detail & Related papers (2025-07-10T03:42:17Z)
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio [1.3584036432145363]
Fully homomorphic encryption (FHE) offers a promising solution by enabling computations on encrypted data and preserving user privacy. Here, we introduce a fully secure pipeline that computes, with FHE and quantized neural network operations. Our methods also support the private computation of audio descriptors and convolutional neural network (CNN) classifiers.
arXiv Detail & Related papers (2025-05-15T17:01:52Z)
Enhancing Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy [51.028047763426265]
Semantic communication (SemCom) improves transmission efficiency by focusing on task-relevant information. Transmitting semantic-rich data over insecure channels introduces privacy risks. This paper proposes a novel SemCom framework that integrates differential privacy mechanisms to protect sensitive semantic features.
arXiv Detail & Related papers (2025-04-23T08:42:44Z)
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge [13.43804949744336]
FlowMur is a stealthy and practical audio backdoor attack that can be launched with limited knowledge. Experiments conducted on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings.
arXiv Detail & Related papers (2023-12-15T10:26:18Z)
NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos. We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics. Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z)
On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data. We obtain word-level confidence scores by utilizing several types of features calculated during decoding. The proposed time stamping method can get less than 50ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
Noise-Response Analysis of Deep Neural Networks Quantifies Robustness and Fingerprints Structural Malware [48.7072217216104]
Deep neural networks (DNNs) have 'structural malware' (i.e., compromised weights and activation pathways). It is generally difficult to detect backdoors, and existing detection methods are computationally expensive and require extensive resources (e.g., access to the training data). Here, we propose a rapid feature-generation technique that quantifies the robustness of a DNN, 'fingerprints' its nonlinearity, and allows us to detect backdoors (if present). Our empirical results demonstrate that we can accurately detect backdoors with high confidence orders-of-magnitude faster than existing approaches (seconds versus
arXiv Detail & Related papers (2020-07-31T23:52:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.