Single-channel speech enhancement using learnable loss mixup
- URL: http://arxiv.org/abs/2312.17255v1
- Date: Wed, 20 Dec 2023 00:25:55 GMT
- Title: Single-channel speech enhancement using learnable loss mixup
- Authors: Oscar Chang, Dung N. Tran, Kazuhito Koishida
- Abstract summary: Generalization remains a major problem in supervised learning of single-channel speech enhancement.
We propose learnable loss mixup (LLM), a simple and effortless training diagram, to improve the generalization of deep learning-based speech enhancement models.
Our experimental results on the VCTK benchmark show that learnable loss mixup achieves 3.26 PESQ, outperforming the state-of-the-art.
- Score: 23.434378634735676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalization remains a major problem in supervised learning of
single-channel speech enhancement. In this work, we propose learnable loss
mixup (LLM), a simple and effortless training diagram, to improve the
generalization of deep learning-based speech enhancement models. Loss mixup, of
which learnable loss mixup is a special variant, optimizes a mixture of the
loss functions of random sample pairs to train a model on virtual training data
constructed from these pairs of samples. In learnable loss mixup, by
conditioning on the mixed data, the loss functions are mixed using a non-linear
mixing function automatically learned via neural parameterization. Our
experimental results on the VCTK benchmark show that learnable loss mixup
achieves 3.26 PESQ, outperforming the state-of-the-art.
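The loss-mixup idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the MSE loss, the function names (`loss_mixup`, `make_learnable_weight`), and the sigmoid form of the mixing function with parameters `a` and `b` are all assumptions made for illustration. The sketch only shows the structure: inputs of random sample pairs are mixed into a virtual input, and the two per-sample losses are then combined with a weight that is either the mixing coefficient itself or a non-linear function conditioned on the mixed data.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error per example, averaged over the last (time) axis."""
    return np.mean((pred - target) ** 2, axis=-1)

def loss_mixup(model, x, y, lam, rng, mix_weight=None):
    """Mix the inputs of random sample pairs, then mix the two losses.

    mix_weight(lam, x_mix) returns a loss weight in [0, 1]. When it is None,
    the weight is lam itself (plain loss mixup).
    """
    perm = rng.permutation(len(x))           # random partner for every sample
    x_mix = lam * x + (1.0 - lam) * x[perm]  # virtual training input
    pred = model(x_mix)
    w = lam if mix_weight is None else mix_weight(lam, x_mix)
    per_example = w * mse(pred, y) + (1.0 - w) * mse(pred, y[perm])
    return per_example.mean()

def make_learnable_weight(a, b):
    """Hypothetical stand-in for a learned mixing function.

    Computes sigmoid(a * logit(lam) + b * energy(x_mix)); a and b would be
    trained jointly with the model. With a=1 and b=0 it reduces to lam.
    """
    def weight(lam, x_mix):
        logit = np.log(lam / (1.0 - lam))
        energy = np.mean(x_mix ** 2, axis=-1)
        return 1.0 / (1.0 + np.exp(-(a * logit + b * energy)))
    return weight

# Toy usage: an identity "enhancement model" on random signals.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 16))   # batch of 4 noisy signals
clean = noisy.copy()                   # toy targets
identity_model = lambda v: v
lam = 0.7                              # in practice lam is usually sampled at random
print(loss_mixup(identity_model, noisy, clean, lam, rng))
print(loss_mixup(identity_model, noisy, clean, lam, rng, make_learnable_weight(1.5, 0.1)))
```

In a real training loop the parameters of the mixing function would receive gradients along with the model weights, which requires an automatic-differentiation framework rather than plain NumPy.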
Related papers
- TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training [42.142924806184425]
Mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss.
TiMix exhibits comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods.
arXiv Detail & Related papers (2023-12-14T12:02:24Z)
- The Benefits of Mixup for Feature Learning [117.93273337740442]
We first show that Mixup using different linear parameters for features and labels can still achieve similar performance to standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from its mixture with the common features.
In contrast, standard training learns only the common features and fails to learn the rare ones, and thus performs poorly.
arXiv Detail & Related papers (2023-03-15T08:11:47Z)
- Over-training with Mixup May Hurt Generalization [32.64382185990981]
We report a previously unobserved phenomenon in Mixup training.
On a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs.
We show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data.
arXiv Detail & Related papers (2023-03-02T18:37:34Z)
- Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z)
- Contrastive-mixup learning for improved speaker verification [17.93491404662201]
This paper proposes a novel formulation of prototypical loss with mixup for speaker verification.
Mixup is a simple yet efficient data augmentation technique that fabricates a weighted combination of random data point and label pairs.
arXiv Detail & Related papers (2022-02-22T05:09:22Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding the symbols into the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- SMILE: Self-Distilled MIxup for Efficient Transfer LEarning [42.59451803498095]
In this work, we propose SMILE - Self-Distilled Mixup for EffIcient Transfer LEarning.
With mixed images as inputs, SMILE regularizes the outputs of CNN feature extractors to learn from the mixed feature vectors of inputs.
The triple regularizer balances the mixup effects in both feature and label spaces while bounding the linearity in-between samples for pre-training tasks.
arXiv Detail & Related papers (2021-03-25T16:02:21Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
- Step-Ahead Error Feedback for Distributed Training with Compressed Gradient [99.42912552638168]
We show that a new "gradient mismatch" problem is raised by the local error feedback in centralized distributed training.
We propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis.
arXiv Detail & Related papers (2020-08-13T11:21:07Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets (Adult, and Communities and Crime), and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, F-measure maximization results in improved micro-F1 scores, with absolute improvements of up to 8%, as compared to models trained with the cross-entropy loss function.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
- Unsupervised Sound Separation Using Mixture Invariant Training [38.0680944898427]
We show that MixIT can achieve competitive performance compared to supervised methods on speech separation.
In particular, we significantly improve reverberant speech separation performance by incorporating reverberant mixtures.
arXiv Detail & Related papers (2020-06-23T02:22:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.