Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified
Spoofing Detection
- URL: http://arxiv.org/abs/2309.09837v1
- Date: Mon, 18 Sep 2023 14:54:42 GMT
- Authors: Awais Khan, Khalid Mahmood Malik, Shah Nawaz
- Abstract summary: Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks.
Current unified solutions struggle to detect spoofing artifacts.
We present a spectra-temporal fusion leveraging frame-level and utterance-level coefficients.
- Score: 6.713879688002623
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Voice spoofing attacks pose a significant threat to automated speaker
verification systems. Existing anti-spoofing methods often simulate specific
attack types, such as synthetic or replay attacks. However, in real-world
scenarios, the countermeasures are unaware of the generation schema of the
attack, necessitating a unified solution. Current unified solutions struggle to
detect spoofing artifacts, especially with recent spoofing mechanisms. For
instance, the spoofing algorithms inject spectral or temporal anomalies, which
are challenging to identify. To this end, we present a spectra-temporal fusion
leveraging frame-level and utterance-level coefficients. We introduce a novel
local spectral deviation coefficient (SDC) for frame-level inconsistencies and
employ a bi-LSTM-based network for sequential temporal coefficients (STC),
which capture utterance-level artifacts. Our spectra-temporal fusion strategy
combines these coefficients, and an auto-encoder generates spectra-temporal
deviated coefficients (STDC) to enhance robustness. Our proposed approach
addresses multiple spoofing categories, including synthetic, replay, and
partial deepfake attacks. Extensive evaluation on diverse datasets
(ASVspoof2019, ASVspoof2021, VSDC, partial spoofs, and in-the-wild deepfakes)
demonstrated its robustness for a wide range of voice applications.
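The abstract describes fusing frame-level and utterance-level coefficients but does not give their formulas. The following is only a rough sketch of that general idea, not the authors' method. It assumes a hypothetical frame-level feature (each frame's log-spectrum minus the mean of its neighbouring frames) and replaces the bi-LSTM with simple mean and standard-deviation pooling over time. The auto-encoder stage is omitted entirely. All function names and parameters (`frame_len`, `hop`, `context`) are illustrative assumptions.

```python
import numpy as np


def frame_spectra(signal, frame_len=400, hop=160):
    """Log-magnitude spectrum of each Hann-windowed frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix: one row of sample indices per frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)


def local_deviation(spec, context=2):
    """Assumed frame-level feature: frame minus the mean of its local window.

    This is a stand-in for a spectral deviation coefficient; the paper's
    actual SDC definition is not given in the abstract.
    """
    n = len(spec)
    padded = np.pad(spec, ((context, context), (0, 0)), mode="edge")
    window = np.stack([padded[i:i + n] for i in range(2 * context + 1)])
    return spec - window.mean(axis=0)


def utterance_summary(spec):
    """Assumed utterance-level feature: mean and std over time.

    This pooling replaces the bi-LSTM used in the paper.
    """
    return np.concatenate([spec.mean(axis=0), spec.std(axis=0)])


def fuse(signal):
    """Concatenate pooled frame-level deviations with the utterance summary."""
    spec = frame_spectra(signal)
    frame_level = np.abs(local_deviation(spec)).mean(axis=0)
    return np.concatenate([frame_level, utterance_summary(spec)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)  # synthetic noise, not real speech
    print(fuse(one_second).shape)
```

In a real system the fused vector would feed a classifier trained on bona fide and spoofed speech. Concatenation is the simplest fusion choice and is used here only to keep the sketch short.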
Related papers
- Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples [33.445126880876415]
We propose a reliable and robust spoofing detection system to filter out spoofing attacks instead of having them reach the automatic speaker verification system.
A weighted additive angular margin loss is proposed to address the data imbalance issue, and different margins have been assigned to improve generalization to unseen spoofing attacks.
We craft adversarial examples by adding imperceptible perturbations to spoofing speech as a data augmentation strategy, then we use an auxiliary batch normalization to guarantee that corresponding normalization statistics are performed exclusively on the adversarial examples.
arXiv Detail & Related papers (2024-08-23T19:26:54Z)
- AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition [0.9913418444556487]
We investigate the needed properties of robust attacks compatible with the Over-The-Air (OTA) model.
We design a method of generating attacks with any such desired properties.
We evaluate our method on standard keyword classification tasks and analyze it in OTA.
arXiv Detail & Related papers (2023-09-20T16:59:22Z)
- Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms [19.514932118278523]
We propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations.
A reconstruction from the fused representation to the input spectrograms further reduces the potential fused information loss.
Our method achieved state-of-the-art performance with an EER of 0.77% on a widely used dataset.
arXiv Detail & Related papers (2023-08-18T04:51:15Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
Vulnerability of deep neural networks to adversarial perturbations has been widely perceived in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Dual Spoof Disentanglement Generation for Face Anti-spoofing with Depth Uncertainty Learning [54.15303628138665]
Face anti-spoofing (FAS) plays a vital role in preventing face recognition systems from presentation attacks.
Existing face anti-spoofing datasets lack diversity due to the insufficient identity and insignificant variance.
We propose the Dual Spoof Disentanglement Generation framework to tackle this challenge by "anti-spoofing via generation".
arXiv Detail & Related papers (2021-12-01T15:36:59Z)
- AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks [45.2410605401286]
We seek to develop an efficient, single system that can detect a broad range of different spoofing attacks without score-level ensembles.
We propose a novel heterogeneous stacking graph attention layer which models artefacts spanning heterogeneous temporal and spectral domains.
Our approach, named AASIST, outperforms the current state-of-the-art by 20% relative.
arXiv Detail & Related papers (2021-10-04T05:48:25Z)
- Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems [78.5097679815944]
This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems.
First, we represent speech signals with 2D spectrograms using the short-time Fourier transform.
Second, we iteratively find a safe vector using a spectrogram subspace projection operation.
Third, we synthesize a spectrogram with such a safe vector using a novel GAN architecture trained with Sobolev integral probability metric.
arXiv Detail & Related papers (2021-03-15T01:11:13Z)
- Class-Conditional Defense GAN Against End-to-End Speech Attacks [82.21746840893658]
We propose a novel approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal.
Our defense-GAN considerably outperforms conventional defense algorithms in terms of word error rate and sentence level recognition accuracy.
arXiv Detail & Related papers (2020-10-22T00:02:02Z)
- Temporal Sparse Adversarial Attack on Sequence-based Gait Recognition [56.844587127848854]
We demonstrate that the state-of-the-art gait recognition model is vulnerable to such attacks.
We employ a generative adversarial network based architecture to semantically generate adversarial high-quality gait silhouettes or video frames.
The experimental results show that if only one-fortieth of the frames are attacked, the accuracy of the target model drops dramatically.
arXiv Detail & Related papers (2020-02-22T10:08:42Z)