All-for-One and One-For-All: Deep learning-based feature fusion for
Synthetic Speech Detection
- URL: http://arxiv.org/abs/2307.15555v1
- Date: Fri, 28 Jul 2023 13:50:25 GMT
- Title: All-for-One and One-For-All: Deep learning-based feature fusion for
Synthetic Speech Detection
- Authors: Daniele Mari, Davide Salvi, Paolo Bestagini, and Simone Milani
- Abstract summary: Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever.
In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them.
The system was tested on different scenarios and datasets to prove its robustness to anti-forensic attacks and its generalization capabilities.
- Score: 18.429817510387473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in deep learning and computer vision have made the synthesis
and counterfeiting of multimedia content more accessible than ever, leading to
possible threats and dangers from malicious users. In the audio field, we are
witnessing the growth of speech deepfake generation techniques, which solicit
the development of synthetic speech detection algorithms to counter possible
mischievous uses such as frauds or identity thefts. In this paper, we consider
three different feature sets proposed in the literature for the synthetic
speech detection task and present a model that fuses them, achieving overall
better performances with respect to the state-of-the-art solutions. The system
was tested on different scenarios and datasets to prove its robustness to
anti-forensic attacks and its generalization capabilities.
Related papers
- Audio Anti-Spoofing Detection: A Survey [7.3348524333159]
Deep learning has given rise to sophisticated algorithms capable of manipulating or creating multimedia fake content, known as Deepfake.
Audio anti-spoofing detection challenges have been organized to foster the development of anti-spoofing countermeasures.
This survey paper presents a comprehensive review of every component within the detection pipeline, including algorithm architectures, optimization techniques, application generalizability, evaluation metrics, performance comparisons, available datasets, and open-source availability.
arXiv Detail & Related papers (2024-04-22T06:52:12Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos.
We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics.
Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z)
- Exploiting Explainability to Design Adversarial Attacks and Evaluate Attack Resilience in Hate-Speech Detection Models [0.47334880432883714]
We present an analysis of adversarial robustness exhibited by various hate-speech detection models.
We devise and execute targeted attacks on the text by leveraging the TextAttack tool.
This work paves the way for creating more robust and reliable hate-speech detection systems.
arXiv Detail & Related papers (2023-05-29T19:59:40Z)
- G3Detector: General GPT-Generated Text Detector [26.47122201110071]
We introduce an unpretentious yet potent detection approach proficient in identifying synthetic text across a wide array of fields.
Our detector demonstrates outstanding performance uniformly across various model architectures and decoding strategies.
It also possesses the capability to identify text generated utilizing a potent detection-evasion technique.
arXiv Detail & Related papers (2023-05-22T03:35:00Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
A recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that this Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection [15.884911752869437]
We present a novel approach for synthetic speech detection, exploiting the combination of two high-level semantic properties of the human voice.
On one side, we focus on speaker identity cues and represent them as speaker embeddings extracted using a state-of-the-art method for the automatic speaker verification task.
On the other side, voice prosody, intended as variations in rhythm, pitch or accent in speech, is extracted through a specialized encoder.
arXiv Detail & Related papers (2022-10-31T11:03:03Z)
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)