Complex-valued neural networks for voice anti-spoofing
- URL: http://arxiv.org/abs/2308.11800v1
- Date: Tue, 22 Aug 2023 21:49:38 GMT
- Title: Complex-valued neural networks for voice anti-spoofing
- Authors: Nicolas M. Müller, Philip Sperl, Konstantin Böttinger
- Abstract summary: Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or mel spectrograms) or raw audio processed through convolution or sinc-layers.
This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the input audio.
Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI.
- Score: 1.1510009152620668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current anti-spoofing and audio deepfake detection systems use either
magnitude spectrogram-based features (such as CQT or mel spectrograms) or raw
audio processed through convolution or sinc-layers. Both methods have
drawbacks: magnitude spectrograms discard phase information, which affects
audio naturalness, and raw-feature-based models cannot use traditional
explainable AI methods. This paper proposes a new approach that combines the
benefits of both methods by using complex-valued neural networks to process the
complex-valued, CQT frequency-domain representation of the input audio. This
method retains phase information and allows for explainable AI methods. Results
show that this approach outperforms previous methods on the "In-the-Wild"
anti-spoofing dataset and enables interpretation of the results through
explainable AI. Ablation studies confirm that the model has learned to use
phase information to detect voice spoofing.
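To make the pipeline concrete, here is a minimal, hedged sketch (not the authors' released code), assuming a librosa/PyTorch stack: librosa.cqt already returns a complex-valued spectrogram, and a complex-valued convolution can be built from two real-valued kernels via (Wr + i*Wi)(xr + i*xi) = (Wr*xr - Wi*xi) + i*(Wr*xi + Wi*xr), so phase information stays available to the network.

```python
# Hedged sketch: complex-valued CQT input + a complex convolution built from
# two real nn.Conv2d layers. Illustrative only, not the paper's implementation.
import librosa
import numpy as np
import torch
import torch.nn as nn


class ComplexConv2d(nn.Module):
    """Complex convolution: (Wr + i*Wi)(xr + i*xi) = (Wr*xr - Wi*xi) + i*(Wr*xi + Wi*xr)."""

    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)
        self.conv_i = nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding)

    def forward(self, x_r, x_i):
        real = self.conv_r(x_r) - self.conv_i(x_i)
        imag = self.conv_r(x_i) + self.conv_i(x_r)
        return real, imag


# A 1-second synthetic tone stands in for an utterance; librosa.cqt returns a
# complex-valued spectrogram, so magnitude and phase are both preserved.
sr = 16000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
cqt = librosa.cqt(y, sr=sr, n_bins=84)                         # complex, shape (84, n_frames)

x_r = torch.tensor(cqt.real, dtype=torch.float32)[None, None]  # (batch, channel, bins, frames)
x_i = torch.tensor(cqt.imag, dtype=torch.float32)[None, None]

layer = ComplexConv2d(in_ch=1, out_ch=8)
out_r, out_i = layer(x_r, x_i)
print(out_r.shape, out_i.shape)  # torch.Size([1, 8, 84, n_frames]) each
```

Zeroing the imaginary branch (or feeding only |CQT|) reduces this to a magnitude-only model, which is one way to run the kind of phase ablation the abstract mentions.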
Related papers
- Statistics-aware Audio-visual Deepfake Detector [11.671275975119089]
Methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features.
We propose a statistical feature loss to enhance the discrimination capability of the model.
Experiments on the DFDC and FakeAVCeleb datasets demonstrate the relevance of the proposed method.
arXiv Detail & Related papers (2024-07-16T12:15:41Z)
- Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network with a histogram layer to incorporate statistical context for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical context for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Streaming Audio-Visual Speech Recognition with Alignment Regularization [69.30185151873707]
We propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture.
The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 dataset in offline and online setups, respectively.
arXiv Detail & Related papers (2022-11-03T20:20:47Z)
- Neural ODEs with Irregular and Noisy Data [8.349349605334316]
We discuss a methodology to learn differential equations from noisy and irregularly sampled measurements.
The main innovation of our methodology is the integration of deep neural networks with the neural ordinary differential equation (ODE) approach.
The proposed framework learns a model describing the vector field and is highly effective under noisy measurements.
arXiv Detail & Related papers (2022-05-19T11:24:41Z)
- Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems (a toy sketch of this rank-correlation embedding appears after this related-papers list).
arXiv Detail & Related papers (2021-06-11T13:03:33Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
The introduced diffusion-based representation learning instead relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
- StutterNet: Stuttering Detection Using Time Delay Neural Network [9.726119468893721]
This paper introduces StutterNet, a novel deep learning-based stuttering detection system.
We use a time-delay neural network (TDNN) suitable for capturing contextual aspects of the disfluent utterances.
Our method achieves promising results and outperforms the state-of-the-art residual neural network-based method.
arXiv Detail & Related papers (2021-05-12T11:36:01Z)
- Anomalous Sound Detection with Machine Learning: A Systematic Review [0.0]
This article presents a systematic review (SR) of studies on anomalous sound detection (ASD) using machine learning (ML) techniques.
The review surveys the state of the art, covering the data sets, audio feature extraction methods, ML models, and evaluation methods used for ASD.
arXiv Detail & Related papers (2021-02-15T19:57:03Z)
- Data-Driven Symbol Detection via Model-Based Machine Learning [117.58188185409904]
We review a data-driven framework for symbol detector design that combines machine learning (ML) and model-based algorithms.
In this hybrid approach, well-known channel-model-based algorithms are augmented with ML-based algorithms to remove their dependence on the channel model.
Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship.
arXiv Detail & Related papers (2020-02-14T06:58:27Z)
- AudioMNIST: Exploring Explainable Artificial Intelligence for Audio Analysis on a Simple Benchmark [12.034688724153044]
This paper explores post-hoc explanations for deep neural networks in the audio domain.
We present a novel open-source audio dataset consisting of 30,000 audio samples of spoken English digits.
We demonstrate the superior interpretability of audible explanations over visual ones in a human user study.
arXiv Detail & Related papers (2018-07-09T23:11:17Z)
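As an aside on the Visualizing Classifier Adjacency Relations entry above, the following is a hedged, self-contained sketch of a rank-correlation embedding of classifiers; the synthetic scores, the 1 - correlation dissimilarity, and the use of MDS are illustrative assumptions, not the paper's exact recipe.

```python
# Toy sketch of a rank-correlation based 2D embedding of classifiers.
# All scores are synthetic; the paper's exact dissimilarity may differ.
import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_trials, n_classifiers = 500, 6
scores = rng.normal(size=(n_trials, n_classifiers))   # stand-in detection scores

# Pairwise rank correlation between classifiers' scores, turned into a dissimilarity.
corr, _ = spearmanr(scores)            # (n_classifiers, n_classifiers) correlation matrix
dissimilarity = 1.0 - corr

# Multidimensional scaling places classifiers with similar ranking behaviour close together.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords)                          # one (x, y) point per classifier
```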
This list is automatically generated from the titles and abstracts of the papers on this site.