Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms
- URL: http://arxiv.org/abs/2308.09302v1
- Date: Fri, 18 Aug 2023 04:51:15 GMT
- Title: Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on Multi-Order Spectrograms
- Authors: Penghui Wen, Kun Hu, Wenxi Yue, Sen Zhang, Wanlei Zhou, Zhiyong Wang
- Abstract summary: We propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations.
A reconstruction from the fused representation to the input spectrograms further reduces the potential information loss caused by fusion.
Our method achieved state-of-the-art performance with an EER of 0.77% on a widely used dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust audio anti-spoofing has become increasingly challenging due to
recent advancements in deepfake techniques. While spectrograms have
demonstrated their capability for anti-spoofing, the complementary information
present in multi-order spectral patterns has not been well explored, which
limits their effectiveness against varying spoofing attacks. Therefore, we propose
a novel deep learning method with a spectral fusion-reconstruction strategy,
namely S2pecNet, to utilise multi-order spectral patterns for robust audio
anti-spoofing representations. Specifically, spectral patterns up to
second-order are fused in a coarse-to-fine manner and two branches are designed
for the fine-level fusion from the spectral and temporal contexts. A
reconstruction from the fused representation to the input spectrograms further
reduces the potential information loss caused by fusion. Our method achieved
state-of-the-art performance with an equal error rate (EER) of 0.77% on a widely
used dataset: the ASVspoof2019 LA (logical access) Challenge.
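The abstract reports results as an equal error rate (EER): the operating point where the false acceptance rate on spoofed audio equals the false rejection rate on bona fide audio. As an illustration only, here is a minimal pure-Python sketch of computing EER from detector scores, assuming higher scores mean "more likely bona fide". The function name `compute_eer` and the threshold-sweep approach are assumptions for this example; it is not the official ASVspoof 2019 evaluation script.

```python
import bisect
import math


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose higher scores mean 'bona fide'.

    Hypothetical helper (sketch only): sweeps every observed score as a
    decision threshold and keeps the one where false acceptance and false
    rejection are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one bona fide and one spoof score")
    bona = sorted(bonafide_scores)
    spoof = sorted(spoof_scores)
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = sorted(set(bona) | set(spoof)) + [math.inf]
    best_gap, best_eer, best_t = math.inf, 1.0, thresholds[0]
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = bisect.bisect_left(bona, t) / len(bona)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = (len(spoof) - bisect.bisect_left(spoof, t)) / len(spoof)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finitely many trials FAR and FRR rarely meet exactly,
            # so report their mean at the closest crossing.
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


# Usage with toy scores (not data from the paper).
eer, threshold = compute_eer([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7])
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

On this scale, the reported EER of 0.77% would correspond to a return value of about 0.0077.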
Related papers
- Deep Spectral Methods for Unsupervised Ultrasound Image Interpretation
This paper proposes a novel unsupervised deep learning strategy tailored to ultrasound to obtain easily interpretable tissue separations.
We integrate key concepts from unsupervised deep spectral methods, which combine spectral graph theory with deep learning methods.
We utilize self-supervised transformer features for spectral clustering to generate meaningful segments based on ultrasound-specific metrics and shape and positional priors, ensuring semantic consistency across the dataset.
(2024-08-04T14:30:14Z)
- SpectralMamba: Efficient Mamba for Hyperspectral Image Classification
Recurrent neural networks and Transformers have dominated most applications in hyperspectral (HS) imaging.
We propose SpectralMamba -- a novel state space model incorporated efficient deep learning framework for HS image classification.
We show that SpectralMamba improves both classification performance and efficiency.
(2024-04-12T14:12:03Z)
- DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection
Hyperspectral salient object detection (HSOD) has exhibited remarkable promise across various applications.
Previous methods insufficiently harness the inherent distinctive attributes of hyperspectral images (HSIs) during the feature extraction process.
We propose Distilled Mixed Spectral-Spatial Network (DMSSN), comprising a Distilled Spectral-Spatial Transformer (MSST)
We have created a large-scale HSOD dataset, HSOD-BIT, to tackle the issue of data scarcity in this field.
(2024-03-31T14:04:57Z)
- Spectrum-driven Mixed-frequency Network for Hyperspectral Salient Object Detection
We propose a novel approach that fully leverages the spectral characteristics by extracting two distinct frequency components from the spectrum.
The Spectral Saliency approximates the region of salient objects, while the Spectral Edge captures edge information of salient objects.
To effectively utilize this dual-frequency information, we introduce a novel lightweight Spectrum-driven Mixed-frequency Network (SMN).
(2023-12-02T08:05:45Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diverse prosodic expression.
(2023-09-27T09:27:03Z)
- Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified Spoofing Detection
Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks.
Current unified solutions struggle to detect spoofing artifacts.
We present a spectra-temporal fusion leveraging frame-level and utterance-level coefficients.
(2023-09-18T14:54:42Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
Spectro-temporal artifacts were detected using raw temporal signals, spectral features, and deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
(2022-10-11T08:31:30Z)
- MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
We propose a novel Transformer-based method, Multi-stage Spectral-wise Transformer (MST++) for efficient spectral reconstruction.
Our approach won first place in the NTIRE 2022 Spectral Reconstruction Challenge.
(2022-04-17T02:39:32Z)
- AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks
We seek to develop an efficient, single system that can detect a broad range of different spoofing attacks without score-level ensembles.
We propose a novel heterogeneous stacking graph attention layer which models artefacts spanning heterogeneous temporal and spectral domains.
Our approach, named AASIST, outperforms the current state-of-the-art by 20% relative.
(2021-10-04T05:48:25Z)
- Multi-Discriminator Sobolev Defense-GAN Against Adversarial Attacks for End-to-End Speech Systems
This paper introduces a defense approach against end-to-end adversarial attacks developed for cutting-edge speech-to-text systems.
First, we represent speech signals with 2D spectrograms using the short-time Fourier transform.
Second, we iteratively find a safe vector using a spectrogram subspace projection operation.
Third, we synthesize a spectrogram with such a safe vector using a novel GAN architecture trained with Sobolev integral probability metric.
(2021-03-15T01:11:13Z)
- Spectral Analysis Network for Deep Representation Learning and Image Clustering
This paper proposes a new network structure for unsupervised deep representation learning based on spectral analysis.
It can identify local similarities among images at the patch level and is thus more robust to occlusion.
It can learn more clustering-friendly representations and is capable of revealing deep correlations among data samples.
(2020-09-11T05:07:15Z)
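The main abstract builds on spectrograms, and the Multi-Discriminator Sobolev Defense-GAN entry explicitly represents speech as 2D spectrograms via the short-time Fourier transform (STFT). As a reference point only, here is a minimal pure-Python sketch of a log-magnitude STFT spectrogram. The frame length (64), hop size (32), periodic Hann window, and `eps` floor are illustrative assumptions, not settings taken from any listed paper, and the sketch does not reproduce S2pecNet's multi-order spectral patterns.

```python
import cmath
import math


def periodic_hann(n):
    """Periodic Hann window of length n (a common choice for STFT analysis)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT as a list of frames, each with frame_len // 2 + 1 bins.

    Illustrative sketch: uses a naive O(frame_len^2) DFT per frame for
    clarity; real code would use an FFT library.
    """
    window = periodic_hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # one-sided spectrum for real input
            coeff = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Log-magnitude spectrogram; eps (an assumed floor) avoids log(0)."""
    return [[math.log(m + eps) for m in frame]
            for frame in stft_magnitude(signal, frame_len, hop)]


# Usage: a sinusoid completing 8 cycles per 64-sample frame peaks in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = log_spectrogram(tone)
print(len(spec), len(spec[0]))  # 7 33
print(max(range(len(spec[0])), key=lambda k: spec[0][k]))  # 8
```

The Hann window trades a little frequency resolution for much lower spectral leakage, which is why the tone's energy stays concentrated in bin 8 and its two neighbours.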