Improved RawNet with Feature Map Scaling for Text-independent Speaker
Verification using Raw Waveforms
- URL: http://arxiv.org/abs/2004.00526v2
- Date: Thu, 7 May 2020 04:45:41 GMT
- Title: Improved RawNet with Feature Map Scaling for Text-independent Speaker
Verification using Raw Waveforms
- Authors: Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, and Ha-Jin Yu
- Abstract summary: We improve RawNet by scaling feature maps using various methods.
The best performing system reduces the equal error rate by half compared to the original RawNet.
- Score: 44.192033435682944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have facilitated the design of speaker
verification systems that directly input raw waveforms. For example, RawNet
extracts speaker embeddings from raw waveforms, which simplifies the processing
pipeline and demonstrates competitive performance. In this study, we improve
RawNet by scaling feature maps using various methods. The proposed mechanism
uses a scale vector passed through a sigmoid non-linear function, with
dimensionality equal to the number of filters in the given feature map. Using
this scale vector, we propose to scale the feature map multiplicatively,
additively, or both. In addition, we investigate replacing
the first convolution layer with the sinc-convolution layer of SincNet.
Experiments performed on the VoxCeleb1 evaluation dataset demonstrate the
effectiveness of the proposed methods, and the best performing system reduces
the equal error rate by half compared to the original RawNet. Expanded
evaluation results obtained using the VoxCeleb1-E and VoxCeleb1-H protocols
marginally outperform existing state-of-the-art systems.
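To make the scaling mechanism described above concrete, the sketch below shows one possible PyTorch reading of it. It is a minimal, hypothetical illustration, not the authors' code: the abstract only specifies that the scale vector passes through a sigmoid and has one element per filter, so the derivation of the vector (global average pooling over time followed by a fully-connected layer), the class name `FeatureMapScaling`, the `mode` argument, and the tensor shapes are all assumptions made here for illustration.

```python
import torch
import torch.nn as nn


class FeatureMapScaling(nn.Module):
    """Sketch of per-filter feature map scaling: multiplicative, additive, or both."""

    def __init__(self, num_filters: int, mode: str = "mul"):
        super().__init__()
        # Assumption: the scale vector is produced by a fully-connected layer
        # applied to the time-averaged feature map; the abstract states only
        # that a sigmoid is applied and that the vector has one element per filter.
        self.fc = nn.Linear(num_filters, num_filters)
        self.mode = mode  # "mul", "add", or "both"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_filters, time) -- a feature map from a convolutional block
        s = torch.sigmoid(self.fc(x.mean(dim=-1)))  # (batch, num_filters), values in (0, 1)
        s = s.unsqueeze(-1)                         # broadcast over the time axis
        if self.mode == "mul":
            return x * s          # multiplicative scaling
        if self.mode == "add":
            return x + s          # additive scaling
        return x * s + s          # combined multiplicative and additive scaling


# Usage: scale a dummy feature map with 128 filters and 300 time frames.
fms = FeatureMapScaling(num_filters=128, mode="both")
out = fms(torch.randn(8, 128, 300))
print(out.shape)  # torch.Size([8, 128, 300])
```

Where such a block sits in the network (for example, after each residual block) is likewise an assumption here rather than a detail given in the abstract, as is any interaction with the sinc-convolution front-end mentioned above.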
Related papers
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z)
- Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning [19.978542231976636]
This paper proposes a novel method to reduce the number of parameters and FLOPs of deep learning models for computational efficiency.
We introduce accuracy and efficiency coefficients to control the trade-off between the accuracy of the network and its computing efficiency.
arXiv Detail & Related papers (2023-01-26T12:32:01Z)
- Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can reduce the number of floating-point operations of off-the-shelf audio neural networks by more than half.
arXiv Detail & Related papers (2022-10-03T14:00:41Z)
- Gaussian Vector: An Efficient Solution for Facial Landmark Detection [3.058685580689605]
This paper proposes a new solution, Gaussian Vector, to preserve the spatial information as well as reduce the output size and simplify the post-processing.
We evaluate our method on 300W, COFW, WFLW and JD-landmark.
arXiv Detail & Related papers (2020-10-03T10:15:41Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as reducing the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
- WoodFisher: Efficient Second-Order Approximation for Neural Network Compression [35.45199662813043]
We develop a method to compute a faithful and efficient estimate of the inverse Hessian.
Our main application is to neural network compression.
We show how our method can be extended to take into account first-order information.
arXiv Detail & Related papers (2020-04-29T17:14:23Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Speaker Recognition using SincNet and X-Vector Fusion [8.637110868126546]
We propose an innovative approach to perform speaker recognition by fusing two recently introduced deep neural networks (DNNs), namely SincNet and X-Vector.
arXiv Detail & Related papers (2020-04-05T14:44:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.