Benchmarking Cross-Domain Audio-Visual Deception Detection
- URL: http://arxiv.org/abs/2405.06995v1
- Date: Sat, 11 May 2024 12:06:31 GMT
- Title: Benchmarking Cross-Domain Audio-Visual Deception Detection
- Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot
- Abstract summary: We present the first cross-domain audio-visual deception detection benchmark.
We compare single-to-single and multi-to-single domain generalization performance.
We propose the Attention-Mixer fusion method to improve performance.
- Score: 45.342156006617394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize to real-world scenarios. We used widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further investigate the impact of using data from multiple source domains for training, we study three types of domain sampling strategies, namely domain-simultaneous, domain-alternating, and domain-by-domain, for multi-to-single domain generalization evaluation. Furthermore, we propose the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection. Protocols and source code are available at \href{https://github.com/Redaimao/cross_domain_DD}{https://github.com/Redaimao/cross\_domain\_DD}.
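The three multi-source sampling strategies named in the abstract can be sketched as batch schedules over the source domains. The function names, batch layout, and arguments below are illustrative assumptions, not the authors' implementation; the actual protocols are in the linked repository.

```python
# Sketch of the three multi-source domain sampling strategies:
# domain-simultaneous, domain-alternating, and domain-by-domain.
# All names and the batch layout here are assumptions for illustration.
import itertools
import random

def sample_batches(domains, batch_size, strategy, steps):
    """Yield training batches drawn from multiple source domains.

    domains  -- dict mapping domain name -> list of samples
    strategy -- 'simultaneous' | 'alternating' | 'by_domain'
    steps    -- total number of batches to yield
    """
    names = list(domains)
    if strategy == "simultaneous":
        # Every batch mixes samples from all source domains at once.
        per_dom = max(1, batch_size // len(names))
        for _ in range(steps):
            yield [s for d in names
                     for s in random.sample(domains[d], per_dom)]
    elif strategy == "alternating":
        # Consecutive batches cycle through the domains one at a time.
        cycle = itertools.cycle(names)
        for _ in range(steps):
            yield random.sample(domains[next(cycle)], batch_size)
    elif strategy == "by_domain":
        # Exhaust the step budget on one domain before moving to the next.
        per_domain = steps // len(names)
        for d in names:
            for _ in range(per_domain):
                yield random.sample(domains[d], batch_size)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
```

The key trade-off the benchmark probes is whether mixing domains within a batch (simultaneous) generalizes better than interleaving (alternating) or sequential exposure (by-domain), which risks forgetting earlier domains.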
Related papers
- Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment [17.485775402656127]
A base detector can outperform existing methods for single domain generalization by a good margin.
We introduce a method to align detections from multiple views, considering both classification and localization outputs.
Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors.
arXiv Detail & Related papers (2024-05-23T12:29:25Z)
- Unsupervised Cross-Domain Rumor Detection with Contrastive Learning and Cross-Attention [0.0]
Massive rumors usually appear alongside breaking news or trending topics, seriously obscuring the truth.
Existing rumor detection methods mostly focus on a single domain and thus perform poorly in cross-domain scenarios.
We propose an end-to-end instance-wise and prototype-wise contrastive learning model with a cross-attention mechanism for cross-domain rumor detection.
arXiv Detail & Related papers (2023-03-20T06:19:49Z)
- Background Matters: Enhancing Out-of-distribution Detection with Domain Features [90.32910087103744]
OOD samples can be drawn from arbitrary distributions and exhibit deviations from in-distribution (ID) data in various dimensions.
Existing methods focus on detecting OOD samples based on semantic features, while neglecting other dimensions such as domain features.
This paper proposes a novel generic framework that can learn the domain features from the ID training samples by a dense prediction approach.
arXiv Detail & Related papers (2023-03-15T16:12:14Z)
- Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection [107.52026281057343]
We introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations.
In the first stage, we utilize all the original and augmented source data to train an object detector.
In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.
arXiv Detail & Related papers (2021-12-16T04:07:01Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment [15.545769463854915]
First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras.
This is bringing to light cross-domain issues that are yet to be addressed in this context.
We propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training.
arXiv Detail & Related papers (2021-06-03T08:46:43Z)
- Unsupervised Out-of-Domain Detection via Pre-trained Transformers [56.689635664358256]
Out-of-domain inputs can lead to unpredictable outputs and sometimes catastrophic safety issues.
Our work tackles the problem of detecting out-of-domain samples with only unsupervised in-domain data.
Two domain-specific fine-tuning approaches are further proposed to boost detection accuracy.
arXiv Detail & Related papers (2021-06-02T05:21:25Z)
- Revisiting Mahalanobis Distance for Transformer-Based Out-of-Domain Detection [60.88952532574564]
This paper conducts a thorough comparison of out-of-domain intent detection methods.
We evaluate multiple contextual encoders and methods that have proven effective on three standard datasets for intent classification.
Our main findings show that fine-tuning Transformer-based encoders on in-domain data leads to superior results.
arXiv Detail & Related papers (2021-01-11T09:10:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.