Benchmarking Cross-Domain Audio-Visual Deception Detection
- URL: http://arxiv.org/abs/2405.06995v2
- Date: Sat, 05 Oct 2024 07:32:03 GMT
- Title: Benchmarking Cross-Domain Audio-Visual Deception Detection
- Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot
- Abstract summary: We present the first cross-domain audio-visual deception detection benchmark.
We compare single-to-single and multi-to-single domain generalization performance.
We propose an algorithm to enhance the generalization performance.
- Score: 45.342156006617394
- Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize for use in real-world scenarios. We use widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further explore the impact of using data from multiple source domains for training, we investigate three types of domain sampling strategies (domain-simultaneous, domain-alternating, and domain-by-domain) for multi-to-single domain generalization evaluation. We also propose an algorithm, named "MM-IDGM", that enhances generalization performance by maximizing the gradient inner products between modality encoders. Furthermore, we propose the Attention-Mixer fusion method to improve performance. We believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
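The three domain sampling strategies named in the abstract can be illustrated with a small sketch. The abstract does not define them, so the interpretation here is an assumption: domain-simultaneous puts samples from every source domain in each batch, domain-alternating draws each batch from a single domain and cycles through the domains, and domain-by-domain exhausts one domain before moving to the next. All function names are hypothetical and are not taken from the paper.

```python
from itertools import zip_longest


def per_domain_batches(domains, batch_size):
    """Split each domain's samples into batches of (domain_name, sample) pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        name: [
            [(name, x) for x in xs[i:i + batch_size]]
            for i in range(0, len(xs), batch_size)
        ]
        for name, xs in domains.items()
    }


def domain_by_domain(domains, batch_size):
    """Assumed reading: finish all batches of one domain, then the next."""
    per = per_domain_batches(domains, batch_size)
    return [batch for name in domains for batch in per[name]]


def domain_alternating(domains, batch_size):
    """Assumed reading: single-domain batches, cycling through the domains."""
    per = per_domain_batches(domains, batch_size)
    rounds = zip_longest(*per.values())
    return [batch for rnd in rounds for batch in rnd if batch is not None]


def domain_simultaneous(domains, batch_size):
    """Assumed reading: every batch holds an equal share from each domain."""
    share = batch_size // len(domains)
    per = per_domain_batches(domains, share)
    rounds = zip_longest(*per.values(), fillvalue=[])
    return [[pair for part in rnd for pair in part] for rnd in rounds]


def batch_domains(batches):
    """Return the set of domain names present in each batch."""
    return [{name for name, _ in batch} for batch in batches]


# Toy source domains with integer stand-ins for audio-visual samples.
toy = {"A": [0, 1, 2, 3], "B": [10, 11, 12, 13]}
for strategy in (domain_simultaneous, domain_alternating, domain_by_domain):
    print(strategy.__name__, batch_domains(strategy(toy, batch_size=2)))
```

The sketch is deterministic to keep the ordering visible; a training loop would normally shuffle samples within each domain before batching.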
Related papers
- Improving Generalization for AI-Synthesized Voice Detection [13.5672344219478]
We introduce an innovative disentanglement framework aimed at extracting domain-agnostic artifact features related to vocoders.
We enhance model learning in a flat loss landscape, enabling escape from suboptimal solutions and improving generalization.
arXiv Detail & Related papers (2024-12-26T16:45:20Z)
- Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff.
By integrating pseudo-target domain data with source domain data, we diversify the training dataset.
Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
- Feature-Space Semantic Invariance: Enhanced OOD Detection for Open-Set Domain Generalization [10.38552112657656]
We propose a unified framework for open-set domain generalization by introducing Feature-space Semantic Invariance (FSI).
FSI maintains semantic consistency across different domains within the feature space, enabling more accurate detection of OOD instances in unseen domains.
We also adopt a generative model to produce synthetic data with novel domain styles or class labels, enhancing model robustness.
arXiv Detail & Related papers (2024-11-11T21:51:45Z)
- Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety.
Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
- Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment [17.485775402656127]
A base detector can outperform existing methods for single domain generalization by a good margin.
We introduce a method to align detections from multiple views, considering both classification and localization outputs.
Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors.
arXiv Detail & Related papers (2024-05-23T12:29:25Z)
- Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection [107.52026281057343]
We introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations.
In the first stage, we utilize all the original and augmented source data to train an object detector.
In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.
arXiv Detail & Related papers (2021-12-16T04:07:01Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, that combines features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment [15.545769463854915]
First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras.
This is bringing to light cross-domain issues that are yet to be addressed in this context.
We propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training.
arXiv Detail & Related papers (2021-06-03T08:46:43Z)
- Multi-Domain Adversarial Feature Generalization for Person Re-Identification [52.835955258959785]
We propose a multi-dataset feature generalization network (MMFA-AAE).
It is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to unseen camera systems.
It also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2020-11-25T08:03:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.