Visual Representations of Physiological Signals for Fake Video Detection
- URL: http://arxiv.org/abs/2207.08380v1
- Date: Mon, 18 Jul 2022 05:14:24 GMT
- Title: Visual Representations of Physiological Signals for Fake Video Detection
- Authors: Kalin Stefanov, Bhawna Paliwal, Abhinav Dhall
- Abstract summary: This paper presents a multimodal learning-based method for detection of real and fake videos.
The method combines information from three modalities - audio, video, and physiology.
The results show a significant increase in detection performance compared to previous methods.
- Score: 5.833272638548153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Realistic fake videos are a potential tool for spreading harmful
misinformation given our increasing online presence and information intake.
This paper presents a multimodal learning-based method for detection of real
and fake videos. The method combines information from three modalities - audio,
video, and physiology. We investigate two strategies for combining the video
and physiology modalities: either augmenting the video with information from
the physiology, or learning to fuse the two modalities with a
proposed Graph Convolutional Network architecture. Both strategies for
combining the two modalities rely on a novel method for generation of visual
representations of physiological signals. The detection of real and fake videos
is then based on the dissimilarity between the audio and modified video
modalities. The proposed method is evaluated on two benchmark datasets and the
results show a significant increase in detection performance over previous
methods.
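The abstract's core detection cue is the dissimilarity between the audio and the (physiology-augmented) video modalities. As an illustrative sketch only, not the paper's actual implementation, the idea can be expressed as a cosine-dissimilarity score over modality embeddings with a hypothetical decision threshold:

```python
import numpy as np

def dissimilarity_score(audio_emb: np.ndarray, video_emb: np.ndarray) -> float:
    """Cosine dissimilarity between L2-normalized modality embeddings.

    Higher values indicate that the audio and (physiology-augmented)
    video streams disagree, which serves as a cue for fakeness.
    """
    a = audio_emb / np.linalg.norm(audio_emb)
    v = video_emb / np.linalg.norm(video_emb)
    return 1.0 - float(a @ v)

def classify(audio_emb: np.ndarray, video_emb: np.ndarray,
             threshold: float = 0.5) -> str:
    # Label a clip "fake" when the modalities are too dissimilar.
    # The threshold value here is illustrative, not from the paper.
    return "fake" if dissimilarity_score(audio_emb, video_emb) > threshold else "real"
```

For example, identical embeddings yield a dissimilarity of 0 ("real"), while orthogonal embeddings yield 1 ("fake"). The embeddings themselves would come from the paper's audio network and its physiology-augmented video network (or the proposed GCN fusion), which are not reproduced here.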
Related papers
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting
Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z) - Multimodal Short Video Rumor Detection System Based on Contrastive
Learning [3.4192832062683842]
Short video platforms in China have gradually evolved into fertile grounds for the proliferation of fake news.
Distinguishing short-video rumors poses a significant challenge due to the substantial amount of information and shared features among videos.
Our research group proposes a methodology encompassing multimodal feature fusion and the integration of external knowledge.
arXiv Detail & Related papers (2023-04-17T16:07:00Z) - Weakly-supervised High-fidelity Ultrasound Video Synthesis with Feature
Decoupling [13.161739586288704]
In clinical practice, analysis and diagnosis often rely on US sequences rather than a single image to obtain dynamic anatomical information.
This is challenging for novices to learn because practicing with adequate videos from patients is clinically impractical.
We propose a novel framework to synthesize high-fidelity US videos.
arXiv Detail & Related papers (2022-07-01T14:53:22Z) - Combining Contrastive and Supervised Learning for Video Super-Resolution
Detection [0.0]
We propose a new upscaled-resolution-detection method based on learning of visual representations using contrastive and cross-entropy losses.
Our method effectively detects upscaling even in compressed videos and outperforms the state-of-the-art alternatives.
arXiv Detail & Related papers (2022-05-20T18:58:13Z) - Self-Supervised Video Representation Learning by Video Incoherence
Detection [28.540645395066434]
This paper introduces a novel self-supervised method that leverages incoherence detection for video representation learning.
It stems from the observation that the human visual system can easily identify video incoherence based on its comprehensive understanding of videos.
arXiv Detail & Related papers (2021-09-26T04:58:13Z) - Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z) - Neuro-Symbolic Representations for Video Captioning: A Case for
Leveraging Inductive Biases for Vision and Language [148.0843278195794]
We propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning.
Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions.
arXiv Detail & Related papers (2020-11-18T20:21:19Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Self-supervised Video Representation Learning by Pace Prediction [48.029602040786685]
This paper addresses the problem of self-supervised video representation learning from a new perspective -- by video pace prediction.
It stems from the observation that the human visual system is sensitive to video pace.
We randomly sample training clips in different paces and ask a neural network to identify the pace for each video clip.
arXiv Detail & Related papers (2020-08-13T12:40:24Z) - Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using
Affective Cues [75.1731999380562]
We present a learning-based method for detecting real and fake deepfake multimedia content.
We extract and analyze the similarity between the audio and visual modalities from within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
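Several of the entries above (e.g., the per-video AUC of 84.4% on DFDC) report performance as a per-video area under the ROC curve. As a hedged sketch of how such a metric can be computed, assuming frame-level fakeness scores are first averaged into one score per video, AUC reduces to the Mann-Whitney U statistic over (fake, real) video pairs:

```python
import numpy as np

def per_video_auc(video_labels, frame_scores_per_video):
    """ROC AUC over videos, with 1 = fake and 0 = real.

    Frame-level scores are aggregated per video by their mean (an
    assumed aggregation; papers may use other pooling), then AUC is
    the fraction of (fake, real) pairs ranked correctly, ties counting half.
    """
    scores = np.array([np.mean(s) for s in frame_scores_per_video])
    labels = np.array(video_labels)
    pos = scores[labels == 1]   # scores of fake videos
    neg = scores[labels == 0]   # scores of real videos
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Perfectly separated scores give an AUC of 1.0; indistinguishable scores give 0.5, the chance baseline against which the reported 84.4% and 96.6% figures should be read.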
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.