Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection
- URL: http://arxiv.org/abs/2410.06543v1
- Date: Wed, 9 Oct 2024 04:37:35 GMT
- Title: Gumbel Rao Monte Carlo based Bi-Modal Neural Architecture Search for Audio-Visual Deepfake Detection
- Authors: Aravinda Reddy PN, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, Pabitra Mitra Vinod Rathod,
- Abstract summary: Deepfakes pose a critical threat to biometric authentication systems by generating highly realistic synthetic media.
Existing multimodal deepfake detectors often struggle to adapt to diverse data and rely on simple fusion methods.
We propose a novel architecture search framework that employs Gumbel-Rao Monte Carlo sampling to optimize multimodal fusion.
- Score: 2.711788614039839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deepfakes pose a critical threat to biometric authentication systems by generating highly realistic synthetic media. Existing multimodal deepfake detectors often struggle to adapt to diverse data and rely on simple fusion methods. To address these challenges, we propose Gumbel-Rao Monte Carlo Bi-modal Neural Architecture Search (GRMC-BMNAS), a novel architecture search framework that employs Gumbel-Rao Monte Carlo sampling to optimize multimodal fusion. It refines the Straight through Gumbel Softmax (STGS) method by reducing variance with Rao-Blackwellization, stabilizing network training. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Crucial features are efficiently identified from backbone networks, while within the cell structure, a weighted fusion operation integrates information from various sources. By varying parameters such as temperature and number of Monte carlo samples yields an architecture that maximizes classification performance and better generalisation capability. Experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrate an impressive AUC percentage of 95.4\%, achieved with minimal model parameters.
Related papers
- A Low-Complexity Plug-and-Play Deep Learning Model for Massive MIMO Precoding Across Sites [5.896656636095934]
MMIMO technology has transformed wireless communication by enhancing spectral efficiency and network capacity.
This paper proposes a novel deep learning-based mMIMO precoder to tackle the complexity challenges of existing approaches.
arXiv Detail & Related papers (2025-02-12T20:02:36Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - POMONAG: Pareto-Optimal Many-Objective Neural Architecture Generator [4.09225917049674]
Transferable NAS has emerged, generalizing the search process from dataset-dependent to task-dependent.
This paper introduces POMONAG, extending DiffusionNAG via a many-optimal diffusion process.
Results were validated on two search spaces -- NAS201 and MobileNetV3 -- and evaluated across 15 image classification datasets.
arXiv Detail & Related papers (2024-09-30T16:05:29Z) - Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection [6.367999777464464]
multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting.
In this paper, we introduce the Straight-through Gumbel-Softmax framework, offering a comprehensive approach to search multimodal fusion model architectures.
Experiments on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC value 94.4% achieved with minimal model parameters.
arXiv Detail & Related papers (2024-06-19T09:26:22Z) - Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
arXiv Detail & Related papers (2024-02-27T14:21:56Z) - Triple-level Model Inferred Collaborative Network Architecture for Video
Deraining [43.06607185181434]
We develop a model-guided triple-level optimization framework to deduce network architecture with cooperating optimization and auto-searching mechanism.
Our model shows significant improvements in fidelity and temporal consistency over the state-of-the-art works.
arXiv Detail & Related papers (2021-11-08T13:09:00Z) - Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems.
In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model.
Key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction and prediction function.
arXiv Detail & Related papers (2021-06-14T14:30:32Z) - Lightweight Image Super-Resolution with Hierarchical and Differentiable
Neural Architecture Search [38.83764580480486]
Single Image Super-Resolution (SISR) tasks have achieved significant performance with deep neural networks.
We propose a novel differentiable Neural Architecture Search (NAS) approach on both the cell-level and network-level to search for lightweight SISR models.
arXiv Detail & Related papers (2021-05-09T13:30:16Z) - AutoPose: Searching Multi-Scale Branch Aggregation for Pose Estimation [96.29533512606078]
We present AutoPose, a novel neural architecture search(NAS) framework.
It is capable of automatically discovering multiple parallel branches of cross-scale connections towards accurate and high-resolution 2D human pose estimation.
arXiv Detail & Related papers (2020-08-16T22:27:43Z) - When Residual Learning Meets Dense Aggregation: Rethinking the
Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z) - Searching Central Difference Convolutional Networks for Face
Anti-Spoofing [68.77468465774267]
Face anti-spoofing (FAS) plays a vital role in face recognition systems.
Most state-of-the-art FAS methods rely on stacked convolutions and expert-designed network.
Here we propose a novel frame level FAS method based on Central Difference Convolution (CDC)
arXiv Detail & Related papers (2020-03-09T12:48:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.