LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
- URL: http://arxiv.org/abs/2309.01383v1
- Date: Mon, 4 Sep 2023 06:22:25 GMT
- Title: LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
- Authors: Shun-Wen Hsiao and Cheng-Yuan Sun
- Abstract summary: We introduce an attention-aware neural network addressing challenges inherent in video data and deception dynamics.
We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92% accuracy rate on a real-life trial dataset.
- Score: 1.550120821358415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, deception detection on human videos has become an
eye-catching technique that can serve many applications. AI models in this
domain demonstrate high accuracy, but they tend to be non-interpretable black
boxes. We introduce an
attention-aware neural network addressing challenges inherent in video data and
deception dynamics. This model, through its continuous assessment of visual,
audio, and text features, pinpoints deceptive cues. We employ a multimodal
fusion strategy that enhances accuracy; our approach yields a 92% accuracy
rate on a real-life trial dataset. Most important of all, the model indicates
the attention focus in the videos, providing valuable insights on deception
cues. Hence, our method adeptly detects deceit and elucidates the underlying
process. We further enriched our study with an experiment involving students
answering questions either truthfully or deceitfully, resulting in a new
dataset of 309 video clips, named ATSFace. Using this, we also introduced a
calibration method, which is inspired by Low-Rank Adaptation (LoRA), to refine
individual-based deception detection accuracy.
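The LoRA-inspired calibration described above can be illustrated with a minimal sketch: a frozen base classifier is adjusted for one individual by training only a low-rank update. All dimensions here (feature size, class count, rank, scaling) are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal NumPy sketch of LoRA-style calibration: frozen base weights W
# plus a trainable low-rank correction A @ B, scaled by alpha / rank.
# Shapes are hypothetical; the paper's real model is attention-based.
import numpy as np

rng = np.random.default_rng(0)

d, c, r = 256, 2, 4              # fused-feature dim, classes, adapter rank
alpha = 4.0                      # adapter scaling factor

W = rng.normal(size=(d, c))      # frozen base classifier weights
A = rng.normal(scale=0.01, size=(d, r))  # trainable down-projection
B = np.zeros((r, c))             # trainable up-projection, zero-initialized

def forward(x):
    """Base logits plus the low-rank, per-individual correction."""
    return x @ W + (x @ A @ B) * (alpha / r)

x = rng.normal(size=(8, d))      # a batch of fused multimodal features
logits = forward(x)
# Because B starts at zero, the calibrated model initially matches the
# base model; only A and B are updated on the individual's labeled clips.
assert np.allclose(logits, x @ W)
```

Zero-initializing `B` is the standard LoRA trick: calibration starts from the pretrained model's behavior and only drifts as the adapter is trained, which keeps the number of per-person parameters at `r * (d + c)` instead of `d * c`.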
Related papers
- A Multimodal Framework for Deepfake Detection [0.0]
Deepfakes, synthetic media created using AI, can convincingly alter videos and audio to misrepresent reality.
Our research addresses the critical issue of deepfakes through an innovative multimodal approach.
Our framework combines visual and auditory analyses, yielding an accuracy of 94%.
arXiv Detail & Related papers (2024-10-04T14:59:10Z)
- Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Cloud based Scalable Object Recognition from Video Streams using Orientation Fusion and Convolutional Neural Networks [11.44782606621054]
Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition.
CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets.
We propose a new CNN method based on orientation fusion for visual object recognition.
arXiv Detail & Related papers (2021-06-19T07:15:15Z)
- Improving the Efficiency and Robustness of Deepfakes Detection through Precise Geometric Features [13.033517345182728]
Deepfakes are a branch of malicious techniques that transplant a target face onto the original one in videos.
Previous efforts for Deepfakes videos detection mainly focused on appearance features, which have a risk of being bypassed by sophisticated manipulation.
We propose an efficient and robust framework named LRNet for detecting Deepfakes videos through temporal modeling on precise geometric features.
arXiv Detail & Related papers (2021-04-09T16:57:55Z)
- Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth guided Adaptive Meta-Fusion Network for few-shot video recognition which is termed as AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
- Video Anomaly Detection Using Pre-Trained Deep Convolutional Neural Nets and Context Mining [2.0646127669654835]
We show how to use pre-trained convolutional neural net models to perform feature extraction and context mining.
We derive contextual properties from the high-level features to further improve the performance of our video anomaly detection method.
arXiv Detail & Related papers (2020-10-06T00:26:14Z)
- Any-Shot Sequential Anomaly Detection in Surveillance Videos [36.24563211765782]
We propose an online anomaly detection method for surveillance videos using transfer learning and any-shot learning.
Our proposed algorithm leverages the feature extraction power of neural network-based models for transfer learning and the any-shot learning capability of statistical detection methods.
arXiv Detail & Related papers (2020-04-05T02:15:45Z)
- Self-trained Deep Ordinal Regression for End-to-End Video Anomaly Detection [114.9714355807607]
We show that applying self-trained deep ordinal regression to video anomaly detection overcomes two key limitations of existing methods.
We devise an end-to-end trainable video anomaly detection approach that enables joint representation learning and anomaly scoring without manually labeled normal/abnormal data.
arXiv Detail & Related papers (2020-03-15T08:44:55Z)
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for detecting real and fake deepfake multimedia content.
We extract and analyze the similarity between the two audio and visual modalities from within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.