SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
- URL: http://arxiv.org/abs/2210.06105v1
- Date: Wed, 12 Oct 2022 11:36:14 GMT
- Title: SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
- Authors: Piotr Kawa, Marcin Plata, Piotr Syga
- Abstract summary: SpecRNet is a neural network architecture characterized by a quick inference time and low computational requirements.
Our benchmark shows that SpecRNet, requiring up to about 40% less time to process an audio sample, provides performance comparable to the LCNN architecture.
- Score: 0.4511923587827302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio DeepFakes are utterances generated with the use of deep neural
networks. They are highly misleading and pose a threat due to use in fake news,
impersonation, or extortion. In this work, we focus on increasing accessibility
to the audio DeepFake detection methods by providing SpecRNet, a neural network
architecture characterized by a quick inference time and low computational
requirements. Our benchmark shows that SpecRNet, requiring up to about 40% less
time to process an audio sample, provides performance comparable to the LCNN
architecture, one of the best audio DeepFake detection models. Such a method
can not only be used by online multimedia services to verify a large bulk of
content uploaded daily but also, thanks to its low requirements, by average
citizens to evaluate materials on their devices. In addition, we provide
benchmarks in three unique settings that confirm the correctness of our model.
They reflect scenarios of low-resource datasets, detection on short utterances
and limited attacks benchmark in which we take a closer look at the influence
of particular attacks on given architectures.
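The headline claim, that SpecRNet needs up to about 40% less time per audio sample than LCNN, is the kind of statement a simple wall-clock benchmark can check. The sketch below is hypothetical: `fast_model` and `slow_model` are toy stand-ins, not the actual SpecRNet or LCNN implementations.

```python
import time
from statistics import mean

def mean_inference_time(model, samples, warmup=3, repeats=10):
    """Average wall-clock time (seconds) the model takes per sample."""
    for s in samples[:warmup]:      # warm-up runs, excluded from timing
        model(s)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for s in samples:
            model(s)
        timings.append((time.perf_counter() - start) / len(samples))
    return mean(timings)

# Toy stand-ins for real detectors (hypothetical -- not the paper's code).
fast_model = lambda x: sum(x)       # plays the role of SpecRNet
slow_model = lambda x: sum(x * 50)  # plays the role of LCNN (more work)

samples = [list(range(1000)) for _ in range(20)]
t_fast = mean_inference_time(fast_model, samples)
t_slow = mean_inference_time(slow_model, samples)
speedup = 1.0 - t_fast / t_slow  # a value near 0.4 would match the paper's claim
```

Warm-up runs and averaging over repeats matter in practice, since first-call overheads (caching, JIT, memory allocation) would otherwise skew per-sample timings.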
Related papers
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within.
Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples.
In this way, no semantic content will be exposed to the detector.
arXiv Detail & Related papers (2024-09-14T02:45:09Z)
- Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization [20.46053083071752]
We propose and benchmark a new dataset, Localized Visual DeepFake (LAV-DF)
LAV-DF consists of strategic content-driven audio, visual and audio-visual manipulations.
The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture.
arXiv Detail & Related papers (2023-05-03T08:48:45Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate on designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF)
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
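The LTH-IF idea, prune small-magnitude weights, rewind the survivors to their initial values, fine-tune, and repeat, can be sketched in a few lines. This is a toy NumPy illustration with made-up layer sizes; the fine-tuning step is elided.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(64, 64))   # one layer's weights (toy example)
init_weights = weights.copy()         # saved initialization (the "winning ticket")
mask = np.ones_like(weights)

prune_fraction = 0.2                  # remove 20% of surviving weights per round
for _ in range(3):                    # iterative prune / rewind cycles
    # ... fine-tune (weights * mask) here in a real system ...
    alive = np.abs(weights[mask == 1])
    threshold = np.quantile(alive, prune_fraction)
    mask[np.abs(weights) < threshold] = 0  # prune smallest-magnitude weights
    weights = init_weights * mask          # rewind survivors to their init values

sparsity = 1.0 - mask.mean()  # roughly 1 - 0.8**3, about 0.49 after 3 rounds
```

Pruning iteratively rather than in one shot is the key design choice: each round removes only a small fraction of the remaining weights, so the network has a chance to recover accuracy via fine-tuning before the next cut.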
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
- Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network [60.99112031408449]
We propose a real-time, single-channel attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech.
The proposed system extracts higher-level information from the speech spectral content using a CNN model.
Experiments on simulated overlapping speech from the WSJ corpus show that the attention solution improves performance by almost 3% absolute over conventional temporal average pooling.
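Attention-guided pooling replaces the temporal average with a learned, frame-weighted sum over the CNN's frame-level features. A minimal NumPy sketch, where random vectors stand in for the trained features and attention parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 32                   # frames x feature dims (hypothetical sizes)
frames = rng.normal(size=(T, D)) # frame-level CNN features
w = rng.normal(size=D)           # learned attention vector (random stand-in)

scores = frames @ w                   # one relevance score per frame
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                  # softmax over time

attn_pooled = alpha @ frames          # attention-weighted sum, shape (D,)
avg_pooled = frames.mean(axis=0)      # conventional temporal average pooling
```

The difference is that `alpha` lets the model emphasize frames where overlapping speakers are most distinguishable, whereas average pooling weights every frame equally.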
arXiv Detail & Related papers (2021-10-30T19:24:57Z)
- Adversarially robust deepfake media detection using fused convolutional neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
- Edge-Detect: Edge-centric Network Intrusion Detection using Deep Neural Network [0.0]
Edge nodes are crucial for detection against multitudes of cyber attacks on Internet-of-Things endpoints.
We develop a novel lightweight, fast, and accurate 'Edge-Detect' model, which detects Denial-of-Service attacks on edge nodes using DLM techniques.
arXiv Detail & Related papers (2021-02-03T04:24:34Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- Leveraging End-to-End Speech Recognition with Neural Architecture Search [0.0]
We show that a large improvement in the accuracy of deep speech models can be achieved with effective Neural Architecture Optimization.
Our method achieves test error of 7% Word Error Rate (WER) on the LibriSpeech corpus and 13% Phone Error Rate (PER) on the TIMIT corpus, on par with state-of-the-art results.
arXiv Detail & Related papers (2019-12-11T08:15:58Z)
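Word Error Rate, the metric behind the 7% LibriSpeech figure above, is the word-level edit distance (substitutions, insertions, deletions) divided by the reference length. A self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
# wer == 2/6
```

Phone Error Rate (the TIMIT figure) is the same computation applied to phone sequences instead of word sequences.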
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.