Replay and Synthetic Speech Detection with Res2net Architecture
- URL: http://arxiv.org/abs/2010.15006v3
- Date: Sat, 13 Feb 2021 16:01:36 GMT
- Title: Replay and Synthetic Speech Detection with Res2net Architecture
- Authors: Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng
- Abstract summary: Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks.
This work proposes to leverage a novel model structure, so-called Res2Net, to improve the anti-spoofing countermeasure's generalizability.
- Score: 85.20912636149552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing approaches for replay and synthetic speech detection still lack
generalizability to unseen spoofing attacks. This work proposes to leverage a
novel model structure, so-called Res2Net, to improve the anti-spoofing
countermeasure's generalizability. Res2Net mainly modifies the ResNet block to
enable multiple feature scales. Specifically, it splits the feature maps within
one block into multiple channel groups and designs a residual-like connection
across different channel groups. Such a connection increases the possible
receptive fields, resulting in multiple feature scales. This multiple scaling
mechanism significantly improves the countermeasure's generalizability to
unseen spoofing attacks. It also decreases the model size compared to
ResNet-based models. Experimental results show that the Res2Net model
consistently outperforms ResNet34 and ResNet50 by a large margin in both
physical access (PA) and logical access (LA) of the ASVspoof 2019 corpus.
Moreover, integration with the squeeze-and-excitation (SE) block can further
enhance performance. For feature engineering, we investigate the
generalizability of Res2Net combined with different acoustic features, and
observe that the constant-Q transform (CQT) achieves the most promising
performance in both PA and LA scenarios. Our best single system outperforms
other state-of-the-art single systems in both PA and LA of the ASVspoof 2019
corpus.
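The multi-scale mechanism described above can be illustrated with a minimal NumPy sketch of the hierarchical split-and-add step inside a Res2Net block. It follows the general Res2Net design, not the authors' released code. Assumptions: s = 4 channel groups of w = 2 channels, 3x3 kernels, no batch dimension, and no biases, batch normalization, or the 1x1 convolutions and identity shortcut that surround this step in a full block. The helper names (`conv3x3_same`, `res2net_split`, `squeeze_excite`) and the SE reduction ratio are hypothetical. Feeding a unit impulse through the split shows each group output y_i covering a (2i-1)x(2i-1) neighborhood, which corresponds to the "multiple feature scales" the abstract mentions.

```python
import numpy as np


def conv3x3_same(x, w):
    """3x3 convolution (cross-correlation) with zero padding that keeps H, W.

    x: (C_in, H, W); w: (C_out, C_in, 3, 3). Returns (C_out, H, W).
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            # Accumulate the contribution of one kernel tap over all positions.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def res2net_split(x, kernels):
    """Hierarchical split-and-add of a Res2Net block (hypothetical helper, sketch only).

    x: (s * w, H, W), split into s channel groups of w channels each.
    kernels: s - 1 arrays of shape (w, w, 3, 3), one per group after the first.
    Returns the concatenated output and the per-group outputs y_1..y_s.
    """
    s = len(kernels) + 1
    groups = np.split(x, s, axis=0)
    ys = [groups[0]]  # y_1 = x_1, passed through unchanged
    for i in range(1, s):
        # Residual-like link across groups: add the previous group's output first.
        inp = groups[i] if i == 1 else groups[i] + ys[-1]
        ys.append(conv3x3_same(inp, kernels[i - 1]))
    return np.concatenate(ys, axis=0), ys


def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation channel reweighting of x: (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) for an assumed reduction ratio r.
    """
    z = x.mean(axis=(1, 2))                 # squeeze: global average pool -> (C,)
    a = np.maximum(w1 @ z, 0.0)             # excitation: FC + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ a)))  # FC + sigmoid: per-channel weight in (0, 1)
    return x * gate[:, None, None]


# Demo with assumed sizes: s = 4 groups, w = 2 channels each, 15x15 feature map.
rng = np.random.default_rng(0)
s, w, size = 4, 2, 15
# Positive weights so the impulse never cancels; each output's support = its receptive field.
kernels = [np.abs(rng.standard_normal((w, w, 3, 3))) for _ in range(s - 1)]
x = np.zeros((s * w, size, size))
x[:, size // 2, size // 2] = 1.0  # unit impulse at the center of every channel
y, ys = res2net_split(x, kernels)
fields = [int(np.any(yi != 0, axis=(0, 2)).sum()) for yi in ys]
print(fields)  # [1, 3, 5, 7]

r = 2  # assumed SE reduction ratio
y_se = squeeze_excite(y, rng.standard_normal((s * w // r, s * w)),
                      rng.standard_normal((s * w, s * w // r)))
print(y_se.shape)  # (8, 15, 15)
```

The split also hints at one plausible source of the smaller model size: a single 3x3 convolution over all s·w channels needs 9(sw)^2 weights, while the s - 1 group kernels need 9(s - 1)w^2, i.e. 576 versus 108 for s = 4 and w = 2 in this sketch.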
Related papers
- Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture [2.9805017559176883] (2022-10-07T14:30:13Z)
This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit local patterns in acoustic features.
Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance.
This paper also proposes to reformulate the existing audio splicing detection problem.
- ConvNext Based Neural Network for Anti-Spoofing [6.047242590232868] (2022-09-14T05:53:37Z)
Automatic speaker verification (ASV) has been widely used in real life for identity authentication.
With the rapid development of speech conversion and speech algorithms, and the improving quality of recording devices, ASV systems have become vulnerable to spoofing attacks.
- RMNet: Equivalently Removing Residual Connection from Networks [15.32653042487324] (2021-11-01T04:07:45Z)
We propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock.
As a plug-in method, the RM operation has three main advantages: 1) it is naturally friendly to high-ratio network pruning, 2) it helps break the depth limitation of RepVGG, and 3) it yields a network (RMNet) with a better accuracy-speed trade-off than ResNet and RepVGG.
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834] (2021-09-21T09:57:21Z)
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs of diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which input-dependently adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
- Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks [67.7648985513978] (2021-07-19T12:27:40Z)
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks.
We present a novel channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism.
- Dynamic Slimmable Network [105.74546828182834] (2021-03-24T15:25:20Z)
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net gains dynamic inference ability through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
- Manifold Regularized Dynamic Network Pruning [102.24146031250034] (2021-03-10T03:59:03Z)
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, showing better performance in terms of both accuracy and computational cost.
- BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture [82.64881585566825] (2020-07-01T05:07:49Z)
We present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters.
Our method significantly outperforms the vanilla U-Net as well as other state-of-the-art methods.
- Multi-Task Siamese Neural Network for Improving Replay Attack Detection [13.379530865598408] (2020-02-16T00:21:16Z)
Replay attack detection systems built upon residual neural networks (ResNets) have yielded astonishing results on the public ASVspoof 2019 Physical Access benchmark.
We analyse the effect of discriminative feature learning in a multi-task learning setting on the generalizability and discriminability of replay attack (RA) detection systems.