An Improved Phase Coding Audio Steganography Algorithm
- URL: http://arxiv.org/abs/2408.13277v2
- Date: Tue, 27 Aug 2024 06:58:34 GMT
- Title: An Improved Phase Coding Audio Steganography Algorithm
- Authors: Guang Yang
- Abstract summary: AI technology has made voice cloning increasingly accessible, leading to a rise in fraud involving AI-generated audio forgeries.
This study presents an improved Phase Coding audio steganography algorithm that segments the audio signal dynamically, embedding data into the mid-frequency phase components.
- Score: 4.524282351757178
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in AI technology have made voice cloning increasingly accessible, leading to a rise in fraud involving AI-generated audio forgeries. This highlights the need to covertly embed information and verify the authenticity and integrity of audio. Digital Audio Watermarking plays a crucial role in this context. This study presents an improved Phase Coding audio steganography algorithm that segments the audio signal dynamically, embedding data into the mid-frequency phase components. This approach enhances resistance to steganalysis, reduces computational complexity, and supports verification of audio authenticity and integrity.
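As a rough illustration of the underlying idea, the sketch below embeds bits by quantizing the phase of mid-frequency FFT bins in fixed-length segments, and reads them back from the phase sign. The segment length, bin range, and per-segment embedding are illustrative assumptions; the paper's dynamic segmentation and exact band selection are not reproduced here.

```python
import numpy as np

def embed(signal, bits, seg_len=1024, band=(100, 300)):
    """Embed bits as +/- pi/2 phases in mid-frequency FFT bins, per segment."""
    out = signal.astype(np.float64).copy()
    i = 0
    for start in range(0, len(out) - seg_len + 1, seg_len):
        if i >= len(bits):
            break
        spec = np.fft.rfft(out[start:start + seg_len])
        mag, phase = np.abs(spec), np.angle(spec)
        for k in range(*band):
            if i >= len(bits):
                break
            phase[k] = np.pi / 2 if bits[i] else -np.pi / 2  # bit -> phase sign
            i += 1
        out[start:start + seg_len] = np.fft.irfft(mag * np.exp(1j * phase), n=seg_len)
    return out

def extract(signal, n_bits, seg_len=1024, band=(100, 300)):
    """Recover bits from the sign of the phase in the same bins."""
    bits = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        spec = np.fft.rfft(signal[start:start + seg_len])
        for k in range(*band):
            if len(bits) == n_bits:
                return bits
            bits.append(int(np.angle(spec[k]) > 0))
    return bits
```

Phase changes in this band are largely inaudible because the ear is relatively insensitive to absolute phase, which is what makes phase coding attractive for imperceptible embedding.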
Related papers
- XAttnMark: Learning Robust Audio Watermarking with Cross-Attention [15.216472445154064]
This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges the gap by leveraging partial parameter sharing between the generator and the detector.
We propose a psychoacoustic-aligned temporal-frequency masking loss that captures fine-grained auditory masking effects, enhancing watermark imperceptibility.
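The entry describes cross-attention with partial parameter sharing between the watermark generator and detector. The minimal PyTorch sketch below shows one plausible reading, where a shared projection feeds a cross-attention block in which detector features attend over generator features; the dimensions, sharing pattern, and residual wiring are assumptions, not the published XAttnMark architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # Partial parameter sharing: one projection reused by both branches.
        self.shared_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, det_feats, gen_feats):
        # det_feats, gen_feats: (batch, time, dim) feature sequences.
        q = self.shared_proj(det_feats)    # queries from the detector branch
        kv = self.shared_proj(gen_feats)   # keys/values from the generator branch
        fused, _ = self.attn(q, kv, kv)
        return fused + det_feats           # residual back into the detector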
arXiv Detail & Related papers (2025-02-06T17:15:08Z)
- Audios Don't Lie: Multi-Frequency Channel Attention Mechanism for Audio Deepfake Detection [0.0]
This study proposes an audio deepfake detection method based on a multi-frequency channel attention mechanism (MFCA) and the 2D discrete cosine transform (DCT).
By converting the audio signal into a mel-spectrogram and using MobileNet V2 to extract deep features, the method effectively captures fine-grained frequency-domain features in the audio signal.
Experimental results show that, compared with traditional methods, the proposed model shows significant advantages in accuracy, precision, recall, F1 score, and other indicators.
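The mechanism named here resembles frequency-domain channel attention (as in FcaNet): instead of global average pooling, each group of channels is pooled against a different 2D-DCT basis before the excitation MLP. The sketch below is a minimal PyTorch rendering under that assumption; the frequency pairs, grouping, and reduction ratio are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class MultiFreqChannelAttention(nn.Module):
    def __init__(self, channels, h, w, freqs=((0, 0), (0, 1), (1, 0), (1, 1)), r=16):
        super().__init__()
        self.n = len(freqs)
        assert channels % self.n == 0
        # Precompute one 2D-DCT basis function per frequency pair.
        basis = torch.stack([self._dct2d(h, w, u, v) for u, v in freqs])
        self.register_buffer("basis", basis)                 # (n, h, w)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels), nn.Sigmoid())

    @staticmethod
    def _dct2d(h, w, u, v):
        ys = torch.cos(torch.pi * (torch.arange(h) + 0.5) * u / h)
        xs = torch.cos(torch.pi * (torch.arange(w) + 0.5) * v / w)
        return ys[:, None] * xs[None, :]

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, h, w = x.shape
        groups = x.view(b, self.n, c // self.n, h, w)
        # Pool each channel group against its DCT basis instead of plain GAP.
        desc = (groups * self.basis[None, :, None]).sum(dim=(-1, -2))
        weights = self.mlp(desc.reshape(b, c))
        return x * weights.view(b, c, 1, 1)
```

In this reading, the module would sit between the MobileNet V2 feature maps and the classifier head, re-weighting channels by their multi-frequency content.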
arXiv Detail & Related papers (2024-12-12T17:15:49Z)
- Efficient Streaming Voice Steganalysis in Challenging Detection Scenarios [13.049308869863248]
This paper introduces a Dual-View VoIP Steganalysis Framework (DVSF).
The framework randomly obfuscates parts of the native steganographic descriptors in VoIP stream segments.
It then captures fine-grained local features related to steganography, building on the global features of VoIP.
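The obfuscation step is described only at a high level; one plausible reading is an augmentation that overwrites randomly chosen stream segments with values resampled from the stream itself, forcing the detector to rely on more than the native steganographic traces. The NumPy sketch below encodes that reading; the segment length, obfuscation rate, and resampling scheme are all assumptions.

```python
import numpy as np

def obfuscate_segments(codewords, seg_len=50, p=0.3, rng=None):
    """Randomly obfuscate VoIP stream segments (illustrative augmentation).

    codewords: 1-D array of VoIP codec parameters (e.g., LSF/pitch codewords).
    """
    rng = rng or np.random.default_rng()
    out = codewords.copy()
    for start in range(0, len(out) - seg_len + 1, seg_len):
        if rng.random() < p:
            # Replace the segment with values drawn from the stream itself,
            # masking native steganographic traces in that span.
            out[start:start + seg_len] = rng.choice(codewords, size=seg_len)
    return out
```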
arXiv Detail & Related papers (2024-11-20T02:22:58Z)
- Exploring the Role of Audio in Video Captioning [59.679122191706426]
We present an audio-visual framework, which aims to fully exploit the potential of the audio modality for captioning.
We propose new local-global fusion mechanisms to improve information exchange across audio and video.
arXiv Detail & Related papers (2023-06-21T20:54:52Z)
- Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation [72.7915031238824]
Large diffusion models have been successful in text-to-audio (T2A) synthesis tasks.
They often suffer from common issues such as semantic misalignment and poor temporal consistency.
We propose Make-an-Audio 2, a latent diffusion-based T2A method that builds on the success of Make-an-Audio.
arXiv Detail & Related papers (2023-05-29T10:41:28Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
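A minimal sketch of contrastive pretraining keyed to machine IDs rather than individual clips: embeddings sharing a machine ID are treated as positives in a supervised-contrastive-style loss. The temperature and exact loss form are assumptions, not the paper's objective.

```python
import torch
import torch.nn.functional as F

def machine_id_contrastive_loss(embeddings, machine_ids, temperature=0.1):
    # embeddings: (batch, dim); machine_ids: (batch,) integer IDs.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (machine_ids.unsqueeze(0) == machine_ids.unsqueeze(1)) & ~eye
    # Softmax over all other samples; pull same-ID pairs together.
    log_prob = F.log_softmax(sim.masked_fill(eye, float('-inf')), dim=1)
    # Assumes every sample has at least one same-ID partner in the batch.
    return -log_prob[pos].mean()
```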
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Efficient Audio Captioning Transformer with Patchout and Text Guidance [74.59739661383726]
We propose a full Transformer architecture that utilizes Patchout as proposed in [1], significantly reducing the computational complexity and avoiding overfitting.
The caption generation is partly conditioned on textual AudioSet tags extracted by a pre-trained classification model.
Our proposed method received the Judges Award at Task 6A of the DCASE 2022 Challenge.
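Patchout itself is simple to sketch: during training, randomly drop a fraction of the spectrogram patch tokens before they enter the transformer, shrinking the quadratic attention cost and acting as a regularizer. The keep ratio below is an assumed value; [1] describes the original technique.

```python
import torch

def patchout(patch_tokens, keep_ratio=0.5, training=True):
    # patch_tokens: (batch, n_patches, dim) spectrogram patch embeddings.
    if not training:
        return patch_tokens                      # keep everything at inference
    b, n, d = patch_tokens.shape
    n_keep = max(1, int(n * keep_ratio))
    # Sample an independent random subset of patches for each example.
    idx = torch.rand(b, n, device=patch_tokens.device).argsort(dim=1)[:, :n_keep]
    return patch_tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, d))
```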
arXiv Detail & Related papers (2023-04-06T07:58:27Z)
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [65.18102159618631]
Multimodal generative modeling has created milestones in text-to-image and text-to-video generation.
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
We propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps.
arXiv Detail & Related papers (2023-01-30T04:44:34Z)
- AudioGen: Textually Guided Audio Generation [116.57006301417306]
We tackle the problem of generating audio samples conditioned on descriptive text captions.
In this work, we propose AudioGen, an auto-regressive model that generates audio samples conditioned on text inputs.
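For flavor, here is a minimal auto-regressive decode loop over discrete audio tokens conditioned on a text embedding; `model`, its `cond` argument, and the sampling scheme are hypothetical stand-ins, not AudioGen's actual interface (the real system operates on learned neural-codec tokens).

```python
import torch

@torch.no_grad()
def generate_audio_tokens(model, text_emb, n_steps=500, bos_id=0):
    tokens = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(n_steps):
        logits = model(tokens, cond=text_emb)                    # (1, T, vocab)
        next_tok = logits[:, -1].softmax(dim=-1).multinomial(1)  # sample next code
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens[:, 1:]  # discrete codes for a codec decoder to render as audio
```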
arXiv Detail & Related papers (2022-09-30T10:17:05Z)
- Automatic Audio Captioning using Attention weighted Event based Embeddings [25.258177951665594]
We propose an encoder-decoder architecture with lightweight (i.e., with fewer learnable parameters) Bi-LSTM recurrent layers for automatic audio captioning (AAC).
Our results show that an efficient AED-based embedding extractor combined with temporal attention and augmentation techniques surpasses existing approaches in the literature.
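A compact sketch of the encoder side under the stated design: a lightweight Bi-LSTM runs over pretrained audio-event-detection (AED) embeddings, and a temporal-attention layer pools them into a context vector for the decoder. All layer sizes, including the AED embedding dimension, are assumptions.

```python
import torch
import torch.nn as nn

class AttnBiLSTMEncoder(nn.Module):
    def __init__(self, emb_dim=527, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)

    def forward(self, aed_embeds):                   # (B, T, emb_dim) AED features
        h, _ = self.bilstm(aed_embeds)               # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)       # temporal attention weights
        return (w * h).sum(dim=1)                    # (B, 2*hidden) context vector
```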
arXiv Detail & Related papers (2022-01-28T05:54:19Z)
- Artificially Synthesising Data for Audio Classification and Segmentation to Improve Speech and Music Detection in Radio Broadcast [0.0]
We present a novel procedure that artificially synthesises data that resembles radio signals.
We trained a Convolutional Recurrent Neural Network (CRNN) on this synthesised data and outperformed state-of-the-art algorithms for music-speech detection.
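One plausible building block for such a synthesis pipeline: crossfading a music bed into a speech take to mimic a radio transition, while emitting per-sample labels for segmentation training. The fade length, ordering, and labeling of the overlap are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def synthesise_radio_transition(music, speech, sr=22050, fade_s=0.5):
    """Crossfade a music bed into a speech take; returns audio plus
    per-sample labels (0 = music, 1 = speech).

    Assumes both clips are longer than the fade region.
    """
    fade = int(fade_s * sr)
    ramp = np.linspace(0.0, 1.0, fade)
    # Overlap the tail of the music with the head of the speech.
    cross = music[-fade:] * (1.0 - ramp) + speech[:fade] * ramp
    audio = np.concatenate([music[:-fade], cross, speech[fade:]])
    # Label the overlap as speech from its onset (a labeling choice).
    labels = np.concatenate([np.zeros(len(music) - fade), np.ones(len(speech))])
    return audio, labels
```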
arXiv Detail & Related papers (2021-02-19T14:47:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.