Related papers: Benchmarking Cross-Domain Audio-Visual Deception Detection

URL: http://arxiv.org/abs/2405.06995v2
Date: Sat, 05 Oct 2024 07:32:03 GMT
Title: Benchmarking Cross-Domain Audio-Visual Deception Detection
Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot
Abstract summary: We present the first cross-domain audio-visual deception detection benchmark. We compare single-to-single and multi-to-single domain generalization performance. We propose an algorithm to enhance the generalization performance.
Score: 45.342156006617394
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. Nevertheless, recent developments in automated deception detection have demonstrated that multimodal features derived from both audio and video modalities may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize for use in real-world scenarios. We use widely adopted audio and visual features and different architectures for benchmarking, comparing single-to-single and multi-to-single domain generalization performance. To further explore the impact of using data from multiple source domains for training, we investigate three types of domain sampling strategies, namely domain-simultaneous, domain-alternating, and domain-by-domain, for multi-to-single domain generalization evaluation. We also propose an algorithm, named "MM-IDGM", that enhances generalization performance by maximizing the gradient inner products between modality encoders. Furthermore, we propose the Attention-Mixer fusion method to improve performance, and we believe that this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
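The three domain sampling strategies named in the abstract are not defined on this page, so the following is only a minimal sketch of one plausible reading of each name, not the authors' implementation. The function names, the `domains` dictionary (domain name mapped to a list of samples), and the batch and step parameters are all hypothetical.

```python
import random
from itertools import cycle


def domain_simultaneous(domains, batch_size, steps, rng):
    """Assumed reading: every batch mixes samples from all source domains."""
    pool = [(name, x) for name, xs in domains.items() for x in xs]
    for _ in range(steps):
        yield rng.sample(pool, batch_size)


def domain_alternating(domains, batch_size, steps, rng):
    """Assumed reading: each batch is single-domain; the domain rotates every step."""
    names = cycle(sorted(domains))
    for _ in range(steps):
        name = next(names)
        yield [(name, x) for x in rng.sample(domains[name], batch_size)]


def domain_by_domain(domains, batch_size, steps_per_domain, rng):
    """Assumed reading: finish all steps on one domain before starting the next."""
    for name in sorted(domains):
        for _ in range(steps_per_domain):
            yield [(name, x) for x in rng.sample(domains[name], batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for two source datasets (hypothetical data).
    toy = {"A": list(range(10)), "B": list(range(10, 20))}
    for batch in domain_alternating(toy, batch_size=4, steps=4, rng=rng):
        print(batch[0][0], [x for _, x in batch])
```

Under these assumptions the strategies differ only in how batches are ordered and composed, which is what lets the same model and loss be compared across them.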

Related papers

Consistent and Invariant Generalization Learning for Short-video Misinformation Detection [10.402862106017965]
Short-video misinformation detection has attracted wide attention in the multi-modal domain. Current models often exhibit unsatisfactory performance on unseen domains due to domain gaps. We propose a new DOmain generalization model via ConsisTency and invariance learning for shORt-video misinformation detection.
arXiv Detail & Related papers (2025-07-05T14:53:32Z)
Improving Generalization for AI-Synthesized Voice Detection [13.5672344219478]
We introduce an innovative disentanglement framework aimed at extracting domain-agnostic artifact features related to vocoders. We enhance model learning in a flat loss landscape, enabling escape from suboptimal solutions and improving generalization.
arXiv Detail & Related papers (2024-12-26T16:45:20Z)
Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
Feature-Space Semantic Invariance: Enhanced OOD Detection for Open-Set Domain Generalization [10.38552112657656]
We propose a unified framework for open-set domain generalization by introducing Feature-space Semantic Invariance (FSI). FSI maintains semantic consistency across different domains within the feature space, enabling more accurate detection of OOD instances in unseen domains. We also adopt a generative model to produce synthetic data with novel domain styles or class labels, enhancing model robustness.
arXiv Detail & Related papers (2024-11-11T21:51:45Z)
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety. Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment [17.485775402656127]
A base detector can outperform existing methods for single domain generalization by a good margin. We introduce a method to align detections from multiple views, considering both classification and localization outputs. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors.
arXiv Detail & Related papers (2024-05-23T12:29:25Z)
Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment [17.086123737443714]
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems. While existing methods demonstrate noteworthy results on synthetic data, they often fail to consider the disparity between synthetic and real-world data domains. We introduce the Multi-Granularity Cross-Domain Alignment framework, tailored to harmonize features across domains at both the scene and individual sample levels.
arXiv Detail & Related papers (2023-08-16T22:54:49Z)
CLIP the Gap: A Single Domain Generalization Approach for Object Detection [60.20931827772482]
Single Domain Generalization tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. We propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss.
arXiv Detail & Related papers (2023-01-13T12:01:18Z)
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection [107.52026281057343]
We introduce a Frequency Spectrum Augmentation Consistency (FSAC) framework with four different low-frequency filter operations. In the first stage, we utilize all the original and augmented source data to train an object detector. In the second stage, augmented source and target data with pseudo labels are adopted to perform the self-training for prediction consistency.
arXiv Detail & Related papers (2021-12-16T04:07:01Z)
MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos. We present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
Cross-Domain First Person Audio-Visual Action Recognition through Relative Norm Alignment [15.545769463854915]
First person action recognition is an increasingly researched topic because of the growing popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. We propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training.
arXiv Detail & Related papers (2021-06-03T08:46:43Z)
Multi-Domain Adversarial Feature Generalization for Person Re-Identification [52.835955258959785]
We propose a multi-dataset feature generalization network (MMFA-AAE). It is capable of learning a universal domain-invariant feature representation from multiple labeled datasets and generalizing it to unseen camera systems. It also surpasses many state-of-the-art supervised methods and unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2020-11-25T08:03:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.