Replay and Synthetic Speech Detection with Res2net Architecture
- URL: http://arxiv.org/abs/2010.15006v3
- Date: Sat, 13 Feb 2021 16:01:36 GMT
- Title: Replay and Synthetic Speech Detection with Res2net Architecture
- Authors: Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng
- Abstract summary: Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks.
This work proposes to leverage a novel model structure, so-called Res2Net, to improve the anti-spoofing countermeasure's generalizability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for replay and synthetic speech detection still lack
generalizability to unseen spoofing attacks. This work proposes to leverage a
novel model structure, so-called Res2Net, to improve the anti-spoofing
countermeasure's generalizability. Res2Net mainly modifies the ResNet block to
enable multiple feature scales. Specifically, it splits the feature maps within
one block into multiple channel groups and designs a residual-like connection
across different channel groups. Such a connection increases the possible
receptive fields, resulting in multiple feature scales. This multiple scaling
mechanism significantly improves the countermeasure's generalizability to
unseen spoofing attacks. It also decreases the model size compared to
ResNet-based models. Experimental results show that the Res2Net model
consistently outperforms ResNet34 and ResNet50 by a large margin in both
physical access (PA) and logical access (LA) of the ASVspoof 2019 corpus.
Moreover, integration with the squeeze-and-excitation (SE) block can further
enhance performance. For feature engineering, we investigate the
generalizability of Res2Net combined with different acoustic features, and
observe that the constant-Q transform (CQT) achieves the most promising
performance in both PA and LA scenarios. Our best single system outperforms
other state-of-the-art single systems in both PA and LA of the ASVspoof 2019
corpus.
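The multi-scale mechanism described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the helper names (`conv1d_same`, `res2net_hierarchy`, `receptive_fields`, `plain_conv_params`, `res2net_conv_params`) are hypothetical. It follows the hierarchical rule of the original Res2Net design, y_1 = x_1, y_2 = K_2(x_2), y_i = K_i(x_i + y_{i-1}). For simplicity it assumes one 1-D channel per group and a single shared all-ones kernel, whereas real Res2Net blocks use a separate 3x3 2-D convolution K_i with many channels per group. It measures each group's receptive field and compares the weight count of one 3x3 layer with its split counterpart. That comparison assumes the block's channel width n is split evenly into s groups of width n/s.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D cross-correlation; output has the same length as x."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def res2net_hierarchy(groups: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Residual-like connections across channel groups (hypothetical sketch):
    y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2.
    Simplification: one 1-D channel per group and one shared kernel K."""
    outputs = [list(groups[0])]  # first group is passed through unchanged
    prev = None
    for x in groups[1:]:
        # from the third group on, add the previous group's output before convolving
        inp = list(x) if prev is None else [a + b for a, b in zip(x, prev)]
        prev = conv1d_same(inp, kernel)
        outputs.append(prev)
    return outputs


def receptive_fields(scale: int, kernel_size: int = 3, length: int = 31) -> List[int]:
    """Width of each group's response to a centred impulse.
    With an all-positive kernel no terms cancel, so the width equals the
    receptive field of that group's output. `length` must exceed
    1 + (kernel_size - 1) * (scale - 1) so that no response is clipped."""
    centre = length // 2
    impulse = [1.0 if i == centre else 0.0 for i in range(length)]
    outs = res2net_hierarchy([impulse] * scale, [1.0] * kernel_size)
    return [sum(1 for v in y if v != 0.0) for y in outs]


def plain_conv_params(n: int, k: int = 3) -> int:
    """Weights of one k x k convolution with n input and n output channels (no bias)."""
    return k * k * n * n


def res2net_conv_params(n: int, scale: int, k: int = 3) -> int:
    """Weights of the s-1 group convolutions, each mapping w -> w channels
    with w = n / s (no bias). Assumes an even split of n channels."""
    if n % scale != 0:
        raise ValueError("channel count must be divisible by scale")
    w = n // scale
    return (scale - 1) * k * k * w * w


print(receptive_fields(4))         # [1, 3, 5, 7]
print(plain_conv_params(64))       # 36864
print(res2net_conv_params(64, 4))  # 6912
```

With scale 4 and 3-tap kernels, the four groups see 1, 3, 5 and 7 positions, while a single plain convolution sees 3. In 2-D this corresponds to receptive fields from 1x1 up to 7x7 inside one block. Under the even-split assumption, the split layer holds (s-1)/s^2 of the plain layer's weights (3/16 for s = 4). This per-layer ratio is only illustrative and is not the model-size figure reported in the paper.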
Related papers
- On the Adversarial Transferability of Generalized "Skip Connections" (2024-10-11)
Skip connections are an essential ingredient that allows modern deep models to be deeper and more powerful.
We find that using more gradients from the skip connections rather than the residual modules during backpropagation allows one to craft adversarial examples with high transferability.
We conduct comprehensive transfer attacks against various models including ResNets, Transformers, Inceptions, Neural Architecture Search, and Large Language Models.
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration (2024-04-02)
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
- Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture (2022-10-07)
This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit the local patterns on acoustic features.
Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture is able to improve spoofing countermeasure performance.
This paper also proposes to reformulate the existing audio splicing detection problem.
- ConvNext Based Neural Network for Anti-Spoofing (2022-09-14)
Automatic speaker verification (ASV) has been widely used in real life for identity authentication.
With the rapid development of speech conversion, speech algorithms and the improvement of the quality of recording devices, ASV systems are vulnerable to spoofing attacks.
- RMNet: Equivalently Removing Residual Connection from Networks (2021-11-01)
We propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock.
As a plug-in method, the RM operation has three advantages: 1) its implementation makes it naturally friendly to high-ratio network pruning, 2) it helps break the depth limitation of RepVGG, and 3) it leads to a network (RMNet) with a better accuracy-speed trade-off than ResNet and RepVGG.
- Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks (2021-07-19)
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks.
We present a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism.
- Manifold Regularized Dynamic Network Pruning (2021-03-10)
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
- BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture (2020-07-01)
We present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters.
Our method significantly outperforms the vanilla U-Net as well as other state-of-the-art methods.
- Multi-Task Siamese Neural Network for Improving Replay Attack Detection (2020-02-16)
Replay attack detection systems built upon Residual Neural Networks (ResNets) have yielded astonishing results on the public benchmark ASVspoof 2019 Physical Access challenge.
We analyse the effect of discriminative feature learning in a multi-task learning setting on the generalizability and discriminability of replay attack (RA) detection systems.
This list is automatically generated from the titles and abstracts of the papers on this site.