THInImg: Cross-modal Steganography for Presenting Talking Heads in Images
- URL: http://arxiv.org/abs/2311.17177v1
- Date: Tue, 28 Nov 2023 19:11:01 GMT
- Title: THInImg: Cross-modal Steganography for Presenting Talking Heads in Images
- Authors: Lin Zhao, Hongxuan Li, Xuefei Ning, Xinru Jiang
- Abstract summary: Cross-modal Steganography is the practice of concealing secret signals in publicly available cover signals unobtrusively.
We propose THInImg, which manages to hide lengthy audio data inside an identity image by leveraging the properties of the human face.
THInImg can present up to 80 seconds of high quality talking-head video (including audio) in an identity image with 160x160 resolution.
- Score: 14.09277898001307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-modal Steganography is the practice of concealing secret signals in
publicly available cover signals (distinct from the modality of the secret
signals) unobtrusively. While previous approaches primarily concentrated on
concealing a relatively small amount of information, we propose THInImg, which
manages to hide lengthy audio data (and subsequently decode talking head video)
inside an identity image by leveraging the properties of the human face, which can
be effectively utilized for covert communication, transmission and copyright
protection. THInImg consists of two parts: an encoder and a decoder. Inside the
encoder-decoder pipeline, we introduce a novel architecture that substantially
increases the capacity of hiding audio in images. Moreover, our framework can be
extended to iteratively hide multiple audio clips into an identity image,
offering multiple levels of control over permissions. We conduct extensive
experiments to prove the effectiveness of our method, demonstrating that
THInImg can present up to 80 seconds of high quality talking-head video
(including audio) in an identity image with 160x160 resolution.
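For a rough sense of scale behind the 80-second, 160x160 claim, the sketch below counts how many scalar values the cover image offers and how many raw audio samples the clip contains. It is a back-of-the-envelope calculation under assumptions the abstract does not state: an RGB cover with 3 channels and mono audio sampled at 16 kHz. It says nothing about how THInImg actually represents the audio; the function names are hypothetical.

```python
# Back-of-the-envelope capacity check for the THInImg setting.
# ASSUMPTIONS (not stated in the abstract): RGB cover image (3 channels)
# and mono audio sampled at 16 kHz. Adjust these to explore other cases.


def cover_values(width: int, height: int, channels: int = 3) -> int:
    """Number of scalar values (pixel channels) in the cover image."""
    return width * height * channels


def audio_samples(seconds: float, sample_rate_hz: int) -> int:
    """Number of raw waveform samples in a clip of the given duration."""
    return int(seconds * sample_rate_hz)


def samples_per_value(seconds: float, sample_rate_hz: int,
                      width: int, height: int, channels: int = 3) -> float:
    """Average number of audio samples each cover value would have to carry."""
    return audio_samples(seconds, sample_rate_hz) / cover_values(width, height, channels)


if __name__ == "__main__":
    print(cover_values(160, 160))                              # 76800
    print(audio_samples(80, 16_000))                           # 1280000
    print(round(samples_per_value(80, 16_000, 160, 160), 2))   # 16.67
```

Under these assumptions each 8-bit pixel value would have to stand in for about 16.7 samples (roughly 267 bits of 16-bit audio), so the clip cannot be stored losslessly sample by sample and must be represented far more compactly.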
Related papers
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within.
Our key idea is to turn a neural audio codec into a novel decoupling model that cleanly separates the semantic and acoustic information in audio samples.
In this way, no semantic content will be exposed to the detector.
(2024-09-14T02:45:09Z)
- Large-capacity and Flexible Video Steganography via Invertible Neural Network [60.34588692333379]
We propose a Large-capacity and Flexible Video Steganography Network (LF-VSN).
For large capacity, we present a reversible pipeline that hides and recovers multiple videos through a single invertible neural network (INN).
For flexibility, we propose a key-controllable scheme, enabling different receivers to recover particular secret videos from the same cover video through specific keys.
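The reversibility LF-VSN relies on can be illustrated with an affine coupling layer, a standard building block of invertible networks (as in RealNVP-style flows). This is a generic sketch, not LF-VSN's architecture: `toy_scale` and `toy_shift` are hypothetical stand-ins for learned sub-networks, and vectors are plain Python lists. Running the layer forward plays the role of hiding and running it in reverse plays the role of recovery, which is how one set of weights can serve both directions.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
SubNet = Callable[[Vec], Vec]


def make_affine_coupling(scale_net: SubNet, shift_net: SubNet) -> Tuple[Callable[[Vec], Vec], Callable[[Vec], Vec]]:
    """Build the forward and inverse maps of one affine coupling layer.

    The input is split into halves (x1, x2). x1 passes through unchanged and
    conditions an elementwise scale and shift of x2, so the layer is exactly
    invertible no matter what scale_net and shift_net compute.
    """

    def split(v: Vec) -> Tuple[Vec, Vec]:
        if len(v) % 2:
            raise ValueError("coupling layer expects an even-length vector")
        half = len(v) // 2
        return v[:half], v[half:]

    def forward(x: Vec) -> Vec:
        x1, x2 = split(x)
        s, t = scale_net(x1), shift_net(x1)
        y2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
        return x1 + y2

    def inverse(y: Vec) -> Vec:
        y1, y2 = split(y)
        # y1 equals x1, so the same scale and shift can be recomputed exactly.
        s, t = scale_net(y1), shift_net(y1)
        x2 = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y2, s, t)]
        return y1 + x2

    return forward, inverse


# Hypothetical stand-ins for learned sub-networks (any deterministic function works).
def toy_scale(h: Vec) -> Vec:
    return [math.tanh(0.5 * v + 0.1 * sum(h)) for v in h]


def toy_shift(h: Vec) -> Vec:
    return [math.sin(v) for v in h]


if __name__ == "__main__":
    forward, inverse = make_affine_coupling(toy_scale, toy_shift)
    x = [0.3, -1.2, 2.0, 0.7]
    x_rec = inverse(forward(x))
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, x_rec)))
```

Note that `inverse` never has to invert `scale_net` or `shift_net` themselves; it only recomputes them from the untouched half.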
(2023-04-24T17:51:35Z)
- Hiding Images in Deep Probabilistic Models [58.23127414572098]
We describe a different computational framework to hide images in deep probabilistic models.
Specifically, we use a DNN to model the probability density of cover images, and hide a secret image in one particular location of the learned distribution.
We demonstrate the feasibility of our SinGAN approach in terms of extraction accuracy and model security.
(2022-10-05T13:33:25Z)
- Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model that learns from narration supervision and uses multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
(2022-05-12T06:33:24Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
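The contrastive objective mentioned above can be sketched with an InfoNCE-style loss over paired face and audio embeddings. This is a generic formulation chosen for illustration; the summary does not give the paper's exact loss, and the toy embeddings, the temperature of 0.1, and the function names are assumptions. The loss is low when each face embedding is most similar to its own audio segment and high when the pairing is wrong.

```python
import math
from typing import List

Vec = List[float]


def l2_normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(face_embs: List[Vec], audio_embs: List[Vec], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: face i should match audio i and no other audio in the batch."""
    if len(face_embs) != len(audio_embs):
        raise ValueError("need one audio embedding per face embedding")
    faces = [l2_normalize(v) for v in face_embs]
    audios = [l2_normalize(v) for v in audio_embs]
    total = 0.0
    for i, f in enumerate(faces):
        # Cosine similarity to every audio embedding, sharpened by the temperature.
        logits = [sum(p * q for p, q in zip(f, a)) / temperature for a in audios]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching pair as target
    return total / len(faces)


if __name__ == "__main__":
    faces = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    matched = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    shuffled = [matched[1], matched[2], matched[0]]
    print("matched:", info_nce(faces, matched))
    print("shuffled:", info_nce(faces, shuffled))  # larger, since positives are mismatched
```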
(2022-04-06T20:51:40Z)
- Multitask Identity-Aware Image Steganography via Minimax Optimization [9.062839197237807]
We propose a framework, called Multitask Identity-Aware Image Steganography (MIAIS), to achieve direct recognition on container images without restoring secret images.
The key issue of the direct recognition is to preserve identity information of secret images into container images and make container images look similar to cover images at the same time.
To allow secret images to be restored when needed, we incorporate an optional restoration network into our method.
(2021-07-13T02:53:38Z)
- Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
(2021-04-22T15:10:26Z)
- Deep Neural Networks based Invisible Steganography for Audio-into-Image Algorithm [0.0]
The integrity of both image and audio is well preserved, while the maximum length of the hidden audio is significantly improved.
We employ a joint deep neural network architecture consisting of two sub-models: the first network hides the secret audio in an image, and the second one is responsible for decoding the image to obtain the original audio.
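For contrast with the learned two-network design, the sketch below shows a classical, non-learned baseline: least-significant-bit (LSB) embedding. It is not the paper's method. It treats the audio as raw 8-bit samples (an assumption) and stores one payload bit per 8-bit cover value, so capacity is fixed at one byte per eight cover values. That fixed ceiling is the kind of limit a learned hiding network aims to raise. The function names are hypothetical.

```python
from typing import List


def hide_lsb(cover: List[int], payload: bytes) -> List[int]:
    """Write each payload bit into the least significant bit of one 8-bit cover value."""
    bits = [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError(f"payload needs {len(bits)} cover values, only {len(cover)} available")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return stego


def reveal_lsb(stego: List[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the least significant bits, most significant bit first."""
    out = bytearray()
    for j in range(n_bytes):
        byte = 0
        for k in range(8):
            byte = (byte << 1) | (stego[8 * j + k] & 1)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    cover = [(37 * i) % 256 for i in range(160 * 160 * 3)]  # stand-in for a 160x160 RGB image
    audio = bytes([128, 130, 127, 90, 200])                  # stand-in for 8-bit audio samples
    stego = hide_lsb(cover, audio)
    assert reveal_lsb(stego, len(audio)) == audio
    print("max per-value change:", max(abs(a - b) for a, b in zip(cover, stego)))
    print("capacity in bytes:", len(cover) // 8)  # 9600
```

At an assumed 8 kHz, 8-bit mono rate, 9,600 bytes is only 1.2 seconds of audio.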
(2021-02-18T06:13:05Z)
- Multi-Stage Residual Hiding for Image-into-Audio Steganography [40.669605041776954]
We present a cross-modal steganography method for hiding image content in audio carriers.
The proposed framework makes control of the payload capacity more flexible.
Experiments suggest that modifications to the carrier are unnoticeable by human listeners.
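One way to picture the payload flexibility described above, by analogy only, is multi-stage residual quantization: a coarse first stage carries most of the signal, and each later stage carries only the residual that earlier stages missed. This is a generic sketch, not the paper's method (which hides image content in audio with neural networks); the signal values, step sizes, and function names are hypothetical. Stopping after fewer stages gives a smaller payload and a coarser reconstruction.

```python
from typing import List, Tuple

Vec = List[float]


def quantize(values: Vec, step: float) -> Vec:
    """Uniform scalar quantization: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]


def residual_stages(signal: Vec, steps: List[float]) -> Tuple[List[Vec], List[float]]:
    """Encode signal in stages; each stage quantizes what the previous stages missed.

    Returns the per-stage payloads and the max reconstruction error after each stage.
    """
    recon = [0.0] * len(signal)
    payloads, errors = [], []
    for step in steps:
        residual = [s - r for s, r in zip(signal, recon)]
        q = quantize(residual, step)
        payloads.append(q)
        recon = [r + qi for r, qi in zip(recon, q)]
        errors.append(max(abs(s - r) for s, r in zip(signal, recon)))
    return payloads, errors


if __name__ == "__main__":
    signal = [0.73, -0.41, 0.18, 0.96, -0.05]
    steps = [0.5, 0.1, 0.02]  # coarse stage first, then finer refinements
    _, errors = residual_stages(signal, steps)
    for k, err in enumerate(errors, start=1):
        print(f"after {k} stage(s): max error {err:.4f}")
```

Each stage bounds the error by half its step size, and an extra stage never increases it.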
(2021-01-06T05:01:45Z)
- InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find that our approach generates obfuscated images faithful to the original input images and additionally increases uncertainty by 6.2× (or up to 0.85 bits) over the non-obfuscated counterparts.
(2020-05-20T19:48:04Z)
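The InfoScrub summary reports an uncertainty gain in bits without saying how it is measured. Assuming it refers to the Shannon entropy of an attribute classifier's predicted distribution (an assumption, not confirmed by the summary), the sketch below shows how such a bits figure is computed. The probabilities are hypothetical classifier outputs, not results from the paper.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical attribute-classifier outputs for a binary attribute.
    before = [0.95, 0.05]  # confident prediction on the original image
    after = [0.6, 0.4]     # less confident prediction on an obfuscated image
    gain = entropy_bits(after) - entropy_bits(before)
    print(f"{entropy_bits(before):.3f} -> {entropy_bits(after):.3f} bits (+{gain:.3f})")
```

For a binary attribute the entropy is capped at 1 bit, reached when the classifier outputs 0.5 for each class.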