The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
- URL: http://arxiv.org/abs/2210.02746v1
- Date: Thu, 6 Oct 2022 08:31:21 GMT
- Title: The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
- Authors: Daniele Mari, Federica Latora, Simone Milani
- Abstract summary: This work investigates the discriminative role of silenced parts in synthetic speech detection.
It shows how first digit statistics extracted from MFCC coefficients can efficiently enable robust detection.
The proposed procedure is computationally lightweight and effective against many different generation algorithms.
- Score: 11.52842516726486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent integration of generative neural strategies and audio processing
techniques has fostered the spread of synthetic speech synthesis and
transformation algorithms. This capability proves harmful in many legal
and informative processes (news, biometric authentication, audio evidence in
courts, etc.). Thus, the development of efficient detection algorithms is both
crucial and challenging due to the heterogeneity of forgery techniques.
This work investigates the discriminative role of silenced parts in synthetic
speech detection and shows how first digit statistics extracted from MFCC
coefficients can efficiently enable robust detection. The proposed procedure
is computationally lightweight and effective against many different algorithms since
it does not rely on large neural detection architectures, and it obtains an accuracy
above 90% in most classes of the ASVspoof dataset.
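The abstract's core idea, first digit (Benford-style) statistics of MFCC coefficients, can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: the MFCC matrix here is a random stand-in (a real pipeline would compute MFCCs from the silenced frames, e.g. with a library such as librosa), and the digit histogram and its divergence from Benford's law would feed a lightweight classifier rather than be used directly.

```python
import numpy as np

def first_digit(x, base=10):
    """First significant digit of each nonzero |x| entry."""
    x = np.abs(np.asarray(x, dtype=float)).ravel()
    x = x[x > 0]                                   # zeros have no first digit
    exponent = np.floor(np.log(x) / np.log(base))  # order of magnitude
    return np.floor(x / base ** exponent).astype(int)

def first_digit_histogram(coeffs, base=10):
    """Normalized histogram over the digits 1..base-1."""
    counts = np.bincount(first_digit(coeffs, base), minlength=base)[1:base]
    return counts / counts.sum()

def benford_pmf(base=10):
    """Benford's law: P(d) = log_base(1 + 1/d) for d = 1..base-1."""
    d = np.arange(1, base)
    return np.log1p(1.0 / d) / np.log(base)

# Random stand-in for the MFCC matrix of an utterance's silenced frames
# (14 coefficients x 400 frames, spread over several orders of magnitude).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((14, 400)) * 10.0 ** rng.uniform(-3, 3, (14, 400))

hist = first_digit_histogram(mfcc)
# KL divergence from the Benford distribution: a scalar feature that a
# lightweight classifier (SVM, random forest, ...) can operate on.
kl = float(np.sum(hist * np.log((hist + 1e-12) / benford_pmf())))
```

The appeal of such features is exactly what the abstract claims: they are cheap to compute and make no assumption about which generation algorithm produced the audio.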
Related papers
- Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis [21.245160899212774]
We propose the Speech-Forensics dataset by extensively covering authentic, synthetic, and partially forged speech samples.
We also propose a TEmporal Speech LocalizaTion network, called TEST, aiming to simultaneously perform authenticity detection, localization of multiple fake segments, and algorithm recognition.
Our model achieves an average mAP of 83.55% and an EER of 5.25% at the utterance level.
arXiv Detail & Related papers (2024-12-12T07:48:17Z)
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates consistently robust performance with simple and cheap strategies.
Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance.
Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
- Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs [1.262949092134022]
A novel strategy is proposed to attribute a synthetic speech track to the generator used to synthesize it.
The proposed detector transforms the audio into a log-mel spectrogram, extracts features with a CNN, and classifies the track among five known and unknown algorithms.
The method outperforms other top teams in accuracy by 12-13% on Eval 2 and 1-2% on Eval 1, in the IEEE SP Cup challenge at ICASSP 2022.
arXiv Detail & Related papers (2023-09-15T04:26:39Z)
- Adaptive Fake Audio Detection with Low-Rank Model Squeezing [50.7916414913962]
Traditional approaches, such as finetuning, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types.
We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types.
Our approach offers several advantages, including reduced storage memory requirements and lower equal error rates.
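The low-rank adaptation idea summarized above can be sketched in a LoRA-style fashion. This is an illustrative stand-in under toy assumptions (dimensions, zero initialization, a single linear layer), not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4          # toy sizes, purely illustrative

# Frozen weight of one linear layer in a pretrained detector (stand-in).
W = rng.standard_normal((d_out, d_in))

# Low-rank adapter for one newly emerging fake-audio type: only A and B
# (rank * (d_out + d_in) parameters) would be trained; W stays frozen.
A = np.zeros((d_out, rank))            # zero init: adapter starts inactive
B = rng.standard_normal((rank, d_in)) * 0.01

def forward(x):
    """Adapted layer: base weight plus the low-rank update A @ B."""
    return (W + A @ B) @ x

x = rng.standard_normal(d_in)
# With A = 0 the adapted layer matches the frozen base model exactly, so
# behaviour on previously known fake-audio types is untouched.
assert np.allclose(forward(x), W @ x)
```

Storing one small (A, B) pair per attack type is what yields the reduced storage requirements the summary mentions, while freezing W avoids impairing knowledge of known fake audio types.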
arXiv Detail & Related papers (2023-06-08T06:06:42Z)
- Walking Noise: On Layer-Specific Robustness of Neural Architectures against Noisy Computations and Associated Characteristic Learning Dynamics [1.5184189132709105]
We discuss the implications of additive, multiplicative and mixed noise for different classification tasks and model architectures.
We propose a methodology called Walking Noise which injects layer-specific noise to measure the robustness.
We conclude with a discussion of the use of this methodology in practice, including its use for tailored multi-execution in noisy environments.
arXiv Detail & Related papers (2022-12-20T17:09:08Z)
- Using growth transform dynamical systems for spatio-temporal data sonification [9.721342507747158]
Sonification, or encoding information in meaningful audio signatures, has several advantages in augmenting or replacing traditional visualization methods for human-in-the-loop decision-making.
This paper presents a novel framework for sonifying high-dimensional data using a complex growth transform dynamical system model.
Our algorithm takes as input the data and optimization parameters underlying the learning or prediction task and combines them with psychoacoustic parameters defined by the user.
arXiv Detail & Related papers (2021-08-21T16:25:59Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and the re-synthesized audio is a good indicator for discriminating genuine from adversarial samples.
Our code will be made open-source for future works to compare against.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
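A drastically simplified stand-in for the multi-scale high-frequency extraction described above: the paper uses learned modules, but a plain "image minus its blur" residual at several window sizes illustrates what a multi-scale high-frequency cue is.

```python
import numpy as np

def box_blur(img, k):
    """Separable box blur with odd window k and reflect padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    kernel = np.ones(k) / k
    # Blur rows, then columns; "valid" mode restores the original shape.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def highfreq_residuals(img, scales=(3, 5, 7)):
    """High-frequency residual (image minus its blur) at several scales."""
    return [img - box_blur(img, k) for k in scales]

# Toy grayscale "face crop": a smooth gradient plus one sharp artifact.
img = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)
img[16, 16] += 5.0
residuals = highfreq_residuals(img)
```

Smooth, method-specific color textures largely cancel in such residuals, which is the intuition behind relying on high-frequency noise rather than RGB appearance.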
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- Using deep learning to understand and mitigate the qubit noise environment [0.0]
We propose to address the challenge of extracting accurate noise spectra from time-dynamics measurements on qubits.
We demonstrate a neural network based methodology that allows for extraction of the noise spectrum associated with any qubit surrounded by an arbitrary bath.
Our results can be applied to a wide range of qubit platforms and provide a framework for improving qubit performance.
arXiv Detail & Related papers (2020-05-03T17:13:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.