AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph
Attention Networks
- URL: http://arxiv.org/abs/2110.01200v1
- Date: Mon, 4 Oct 2021 05:48:25 GMT
- Title: AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph
Attention Networks
- Authors: Jee-weon Jung, Hee-Soo Heo, Hemlata Tak, Hye-jin Shim, Joon Son Chung,
Bong-Jin Lee, Ha-Jin Yu, Nicholas Evans
- Abstract summary: We seek to develop an efficient, single system that can detect a broad range of different spoofing attacks without score-level ensembles.
We propose a novel heterogeneous stacking graph attention layer which models artefacts spanning heterogeneous temporal and spectral domains.
Our approach, named AASIST, outperforms the current state-of-the-art by 20% relative.
- Score: 45.2410605401286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artefacts that differentiate spoofed from bona-fide utterances can reside in
spectral or temporal domains. Their reliable detection usually depends upon
computationally demanding ensemble systems where each subsystem is tuned to
some specific artefacts. We seek to develop an efficient, single system that
can detect a broad range of different spoofing attacks without score-level
ensembles. We propose a novel heterogeneous stacking graph attention layer
which models artefacts spanning heterogeneous temporal and spectral domains
with a heterogeneous attention mechanism and a stack node. With a new max graph
operation that involves a competitive mechanism and an extended readout scheme,
our approach, named AASIST, outperforms the current state-of-the-art by 20%
relative. Even a lightweight variant, AASIST-L, with only 85K parameters,
outperforms all competing systems.
Related papers
- Exploring Diverse Representations for Open Set Recognition [51.39557024591446]
Open set recognition (OSR) requires the model to classify samples that belong to closed sets while rejecting unknown samples during test.
Currently, generative models often perform better than discriminative models in OSR.
We propose a new model, namely Multi-Expert Diverse Attention Fusion (MEDAF), that learns diverse representations in a discriminative way.
arXiv Detail & Related papers (2024-01-12T11:40:22Z)
- DiffSpectralNet : Unveiling the Potential of Diffusion Models for
Hyperspectral Image Classification [6.521187080027966]
We propose a new network called DiffSpectralNet, which combines diffusion and transformer techniques.
First, we use an unsupervised learning framework based on the diffusion model to extract both high-level and low-level spectral-spatial features.
The diffusion method is capable of extracting diverse and meaningful spectral-spatial features, leading to improvement in HSI classification.
arXiv Detail & Related papers (2023-10-29T15:26:37Z)
- Frame-to-Utterance Convergence: A Spectra-Temporal Approach for Unified
Spoofing Detection [6.713879688002623]
Existing anti-spoofing methods often simulate specific attack types, such as synthetic or replay attacks.
Current unified solutions struggle to detect spoofing artifacts.
We present a spectra-temporal fusion leveraging frame-level and utterance-level coefficients.
arXiv Detail & Related papers (2023-09-18T14:54:42Z)
- Robust Audio Anti-Spoofing with Fusion-Reconstruction Learning on
Multi-Order Spectrograms [19.514932118278523]
We propose a novel deep learning method with a spectral fusion-reconstruction strategy, namely S2pecNet, to utilise multi-order spectral patterns for robust audio anti-spoofing representations.
A reconstruction from the fused representation to the input spectrograms further reduces the potential fused information loss.
Our method achieved the state-of-the-art performance with an EER of 0.77% on a widely used dataset.
arXiv Detail & Related papers (2023-08-18T04:51:15Z)
- Histopathology Whole Slide Image Analysis with Heterogeneous Graph
Representation Learning [78.49090351193269]
We propose a novel graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis.
Specifically, we formulate the WSI as a heterogeneous graph with "nucleus-type" attribute to each node and a semantic attribute similarity to each edge.
Our framework outperforms the state-of-the-art methods with considerable margins on various tasks.
arXiv Detail & Related papers (2023-07-09T14:43:40Z)
- Spectral Cross-Domain Neural Network with Soft-adaptive Threshold
Spectral Enhancement [12.837935554250409]
We propose a novel deep learning model named Spectral Cross-domain neural network (SCDNN).
It simultaneously reveals the key information embedded in the spectral and time domains inside the neural network.
The proposed SCDNN is tested with several classification tasks implemented on the public ECG databases PTB-XL and MIT-BIH.
arXiv Detail & Related papers (2023-01-10T14:23:43Z)
- Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z)
- Deep Autoregressive Models with Spectral Attention [74.08846528440024]
We propose a forecasting architecture that combines deep autoregressive models with a Spectral Attention (SA) module.
By characterizing in the spectral domain the embedding of the time series as occurrences of a random process, our method can identify global trends and seasonality patterns.
Two spectral attention models, global and local to the time series, integrate this information within the forecast and perform spectral filtering to remove noise from the time series.
arXiv Detail & Related papers (2021-07-13T11:08:47Z)
- Robust and Interpretable Temporal Convolution Network for Event
Detection in Lung Sound Recordings [37.0780415938284]
We propose a lightweight, yet robust, and completely interpretable framework for lung sound event detection.
We use a multi-branch TCN architecture and exploit a novel fusion strategy to combine the resultant features from these branches.
Our analysis of different feature fusion strategies shows that the proposed feature concatenation method leads to better suppression of non-informative features.
arXiv Detail & Related papers (2021-06-30T06:36:22Z)
- TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)