Related papers: Replay and Synthetic Speech Detection with Res2net Architecture

URL: http://arxiv.org/abs/2010.15006v3
Date: Sat, 13 Feb 2021 16:01:36 GMT
Title: Replay and Synthetic Speech Detection with Res2net Architecture
Authors: Xu Li, Na Li, Chao Weng, Xunying Liu, Dan Su, Dong Yu, Helen Meng
Abstract summary: Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks. This work proposes to leverage a novel model structure, so-called Res2Net, to improve the anti-spoofing countermeasure's generalizability.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks. This work proposes to leverage a novel model structure, so-called Res2Net, to improve the anti-spoofing countermeasure's generalizability. Res2Net mainly modifies the ResNet block to enable multiple feature scales. Specifically, it splits the feature maps within one block into multiple channel groups and designs a residual-like connection across different channel groups. Such connection increases the possible receptive fields, resulting in multiple feature scales. This multiple scaling mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks. It also decreases the model size compared to ResNet-based models. Experimental results show that the Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin in both physical access (PA) and logical access (LA) of the ASVspoof 2019 corpus. Moreover, integration with the squeeze-and-excitation (SE) block can further enhance performance. For feature engineering, we investigate the generalizability of Res2Net combined with different acoustic features, and observe that the constant-Q transform (CQT) achieves the most promising performance in both PA and LA scenarios. Our best single system outperforms other state-of-the-art single systems in both PA and LA of the ASVspoof 2019 corpus.
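The abstract describes two mechanisms. The Res2Net block splits the channels into groups, links the groups with residual-like connections, and so enlarges the receptive field. The SE block recalibrates the channels. The small NumPy sketch below illustrates both, under simplifying assumptions, and is not the authors' implementation. A single fixed all-ones 1-D kernel is shared by every group. A real Res2Net block instead uses learned 3×3 convolutions per group, wrapped in 1×1 convolutions, batch normalization, and the block's outer residual connection. The function names and the random SE weights are hypothetical.

```python
import numpy as np


def conv1d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply one 1-D kernel to every channel of x (channels, time), zero 'same' padding."""
    return np.stack([np.convolve(row, kernel, mode="same") for row in x])


def res2net_multiscale(x: np.ndarray, scale: int, kernel: np.ndarray) -> np.ndarray:
    """Simplified hierarchical split-and-convolve step of a Res2Net block.

    Splits the channels into `scale` groups x_1..x_s and computes
    y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2,
    then concatenates y_1..y_s along the channel axis.
    """
    groups = np.split(x, scale, axis=0)  # channel count must be divisible by scale
    outputs = [groups[0]]                # first group is passed through unchanged
    prev = None
    for xi in groups[1:]:
        # Residual-like link: add the previous group's output before convolving.
        inp = xi if prev is None else xi + prev
        prev = conv1d_same(inp, kernel)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def squeeze_excite(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation: rescale each channel by a gate in (0, 1)."""
    squeezed = x.mean(axis=1)                    # squeeze: average over time -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)      # bottleneck FC + ReLU -> (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid -> (C,)
    return x * gate[:, None]                     # channel-wise recalibration


# Impulse at the centre of each channel: 4 channels, 9 time steps, scale s = 4.
x = np.zeros((4, 9))
x[:, 4] = 1.0
y = res2net_multiscale(x, scale=4, kernel=np.ones(3))
print([int(np.count_nonzero(row)) for row in y])  # non-zero span per group

rng = np.random.default_rng(0)
w1 = rng.standard_normal((2, 4))  # reduction ratio r = 2 (hypothetical weights)
w2 = rng.standard_normal((4, 2))
print(squeeze_excite(y, w1, w2).shape)
```

Running it prints `[1, 3, 5, 7]`, then `(4, 9)`. Each later group sees a wider span of the input, which is the multi-scale effect the abstract refers to. The SE step only rescales channels, so the shape is unchanged.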

Related papers

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
Nested Res2Net (Nes2Net) is a lightweight back-end architecture designed to process high-dimensional features directly, without dimensionality reduction (DR) layers. We report a 22% performance improvement and an 87% reduction in back-end computational cost over the state-of-the-art baseline.
arXiv (2025-04-08T04:11:28Z)
On the Adversarial Transferability of Generalized "Skip Connections"
Skip connections are an essential ingredient that lets modern deep models be deeper and more powerful. We find that using more gradients from the skip connections rather than the residual modules during backpropagation allows one to craft adversarial examples with high transferability. We conduct comprehensive transfer attacks against various models, including ResNets, Transformers, Inceptions, Neural Architecture Search, and Large Language Models.
arXiv (2024-10-11T16:17:47Z)
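The gradient-weighting idea can be shown on a toy scalar residual network. Each block computes z + a·tanh(z), so its local derivative is 1 (the skip path) plus a·(1 − tanh²z) (the residual module). Scaling only the residual term by a decay factor γ < 1 during backpropagation therefore favours the skip path. This sketch rests on several assumptions. The network, the weights, and the single γ applied uniformly to every block are illustrative choices in the spirit of the earlier Skip Gradient Method, not the paper's generalized procedure. A real attack would also apply the modified gradient to an input image rather than to a scalar.

```python
import math


def forward(z0: float, weights: list[float]) -> list[float]:
    """Scalar residual stack z_{l+1} = z_l + a_l * tanh(z_l); returns every z_l."""
    zs = [z0]
    for a in weights:
        zs.append(zs[-1] + a * math.tanh(zs[-1]))
    return zs


def skip_weighted_grad(z0: float, weights: list[float], gamma: float) -> float:
    """dz_L/dz_0 with the residual-branch gradient scaled by gamma.

    gamma = 1 reproduces the exact chain rule; gamma < 1 shifts weight toward
    the skip path, whose local derivative is always 1.
    """
    zs = forward(z0, weights)
    grad = 1.0
    for a, z in zip(weights, zs[:-1]):
        residual_grad = a * (1.0 - math.tanh(z) ** 2)  # d/dz of a * tanh(z)
        grad *= 1.0 + gamma * residual_grad           # skip path contributes 1
    return grad


weights = [0.5, -0.3, 0.8]
h = 1e-6
numeric = (forward(0.2 + h, weights)[-1] - forward(0.2 - h, weights)[-1]) / (2 * h)
print(skip_weighted_grad(0.2, weights, gamma=1.0), numeric)  # should agree closely
print(skip_weighted_grad(0.2, weights, gamma=0.5))
```

With γ = 1 the result matches a finite-difference estimate. With γ = 0 the residual modules are ignored entirely and the gradient collapses to 1, the pure skip path.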
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation. We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks. Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv (2024-04-02T17:58:49Z)
Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture
This paper extends the existing Res2Net by incorporating the recent Conformer block to further exploit local patterns in acoustic features. Experimental results on the ASVspoof 2019 database show that the proposed SE-Res2Net-Conformer architecture improves spoofing countermeasure performance. This paper also proposes to reformulate the existing audio splicing detection problem.
arXiv (2022-10-07T14:30:13Z)
ConvNext Based Neural Network for Anti-Spoofing
Automatic speaker verification (ASV) has been widely used in real life for identity authentication. With the rapid development of speech conversion and speech synthesis algorithms, and the improving quality of recording devices, ASV systems are vulnerable to spoofing attacks.
arXiv (2022-09-14T05:53:37Z)
RMNet: Equivalently Removing Residual Connection from Networks
We propose to remove the residual connection in a vanilla ResNet equivalently through a reserving and merging (RM) operation on the ResBlock. As a plug-in method, the RM operation has three main advantages: 1) its implementation makes it naturally friendly to high-ratio network pruning, 2) it helps break the depth limitation of RepVGG, and 3) it yields a network (RMNet) with a better accuracy-speed trade-off than ResNet and RepVGG.
arXiv (2021-11-01T04:07:45Z)
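The reserving-and-merging idea can be checked numerically on a dense toy block. This sketch makes two assumptions. First, the ResBlock is reduced to two fully connected layers with no batch normalization. Second, its input is non-negative, as it is after a preceding ReLU, which is what lets identity rows carry x through the inner ReLU unchanged. RMNet itself applies the idea to convolutional layers, and the function names here are hypothetical.

```python
import numpy as np


def relu(v: np.ndarray) -> np.ndarray:
    return np.maximum(v, 0.0)


def res_block(x, w1, w2):
    """Dense stand-in for a ResBlock: ReLU(W2 ReLU(W1 x) + x)."""
    return relu(w2 @ relu(w1 @ x) + x)


def rm_weights(w1, w2):
    """Reserve the input in extra hidden units, then merge it back in W2."""
    c = w1.shape[1]
    w1_rm = np.vstack([w1, np.eye(c)])  # reserve: identity rows copy x into hidden
    w2_rm = np.hstack([w2, np.eye(c)])  # merge: identity columns add the copy back
    return w1_rm, w2_rm


def plain_block(x, w1_rm, w2_rm):
    """Equivalent block with no skip connection: ReLU(W2' ReLU(W1' x))."""
    return relu(w2_rm @ relu(w1_rm @ x))


rng = np.random.default_rng(1)
x = relu(rng.standard_normal(4))  # non-negative input, e.g. from a previous ReLU
w1 = rng.standard_normal((6, 4))
w2 = rng.standard_normal((4, 6))
w1_rm, w2_rm = rm_weights(w1, w2)
print(np.allclose(res_block(x, w1, w2), plain_block(x, w1_rm, w2_rm)))  # True
```

The cost of removing the skip connection is a wider hidden layer: 10 units instead of 6 in this toy.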
Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks
Existing approaches for anti-spoofing in automatic speaker verification (ASV) still lack generalizability to unseen attacks. We present a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism.
arXiv (2021-07-19T12:27:40Z)
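Below is a minimal NumPy sketch of channel-wise gating on the connection between Res2Net feature groups. It assumes the simplest variant: the gate is computed only from the previous group's output y_{i-1}, using average pooling, a linear map, and a sigmoid, and it scales y_{i-1} before y_{i-1} is added to x_i. The weight matrix and the function names are hypothetical, and the paper's actual gating variants may compute the gate differently.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


def channel_gate(prev: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Per-channel gate in (0, 1) computed from the previous group's output y_{i-1}.

    prev has shape (channels_per_group, time); w is a hypothetical learned matrix.
    """
    pooled = prev.mean(axis=1)  # global average pooling over time
    return sigmoid(w @ pooled)  # one gate value per channel


def gated_group_input(xi: np.ndarray, prev: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Replace Res2Net's x_i + y_{i-1} with x_i + a * y_{i-1}, a = channel_gate(y_{i-1})."""
    a = channel_gate(prev, w)
    return xi + a[:, None] * prev


rng = np.random.default_rng(2)
xi = rng.standard_normal((2, 5))    # current group x_i: 2 channels, 5 frames
prev = rng.standard_normal((2, 5))  # previous group's output y_{i-1}
print(gated_group_input(xi, prev, rng.standard_normal((2, 2))).shape)
```

Because the gate lies strictly between 0 and 1, each channel of the inter-group connection can be attenuated independently. A gate of exactly 1 would recover the plain Res2Net connection.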
Manifold Regularized Dynamic Network Pruning
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks. The effectiveness of the proposed method is verified on several benchmarks, where it shows better performance in terms of both accuracy and computational cost.
arXiv (2021-03-10T03:59:03Z)
BiO-Net: Learning Recurrent Bi-directional Connections for Encoder-Decoder Architecture
We present a novel Bi-directional O-shape network (BiO-Net) that reuses the building blocks in a recurrent manner without introducing any extra parameters. Our method significantly outperforms the vanilla U-Net as well as other state-of-the-art methods.
arXiv (2020-07-01T05:07:49Z)
Multi-Task Siamese Neural Network for Improving Replay Attack Detection
Replay attack (RA) detection systems built upon Residual Neural Networks (ResNets) have yielded astonishing results on the public benchmark ASVspoof 2019 Physical Access challenge. We analyse the effect of discriminative feature learning in a multi-task learning setting on the generalizability and discriminability of RA detection systems.
arXiv (2020-02-16T00:21:16Z)
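A multi-task Siamese setup typically adds a pairwise loss on embeddings to the usual per-utterance classification loss. One generic way to combine the two is sketched below in plain Python. This is an assumption, not the paper's exact objective: the contrastive loss form, the margin, the weight `lam`, and all function names are illustrative.

```python
import math


def cross_entropy(prob_genuine: float, is_genuine: bool) -> float:
    """Binary cross-entropy for a bona fide (True) vs. replayed (False) label."""
    p = prob_genuine if is_genuine else 1.0 - prob_genuine
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


def contrastive(emb_a, emb_b, same_class: bool, margin: float = 1.0) -> float:
    """Pull same-class embeddings together; push different-class pairs >= margin apart."""
    d = math.dist(emb_a, emb_b)  # Euclidean distance between the two embeddings
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


def multitask_loss(prob_a, prob_b, emb_a, emb_b, label_a, label_b, lam=0.5) -> float:
    """Both branches' classification losses plus a weighted pairwise (Siamese) loss."""
    ce = cross_entropy(prob_a, label_a) + cross_entropy(prob_b, label_b)
    return ce + lam * contrastive(emb_a, emb_b, label_a == label_b)


# A confidently classified genuine/replay pair whose embeddings are too close.
print(multitask_loss(1.0, 0.0, [0.0, 0.0], [0.5, 0.0], True, False))
```

Here both classifications are correct, so only the pairwise term contributes: the embeddings sit 0.5 apart, short of the margin of 1.0, giving a loss of 0.5 · 0.25 = 0.125.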

This list is automatically generated from the titles and abstracts of the papers on this site.