Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
- URL: http://arxiv.org/abs/2302.11989v1
- Date: Thu, 23 Feb 2023 13:12:35 GMT
- Title: Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
- Authors: Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng
- Abstract summary: Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data.
The task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be used directly as a training criterion.
We propose a metric-oriented speech enhancement method (MOSE) that integrates a metric-oriented training strategy into the reverse process of a diffusion probabilistic model.
- Score: 23.84172431047342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data. However, the task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and cannot be used directly as a training criterion. This mismatch between the training objective and the evaluation metric likely results in sub-optimal performance. To alleviate it, we propose a metric-oriented speech enhancement method (MOSE), which leverages recent advances in diffusion probabilistic models and integrates a metric-oriented training strategy into the reverse process. Specifically, we design an actor-critic based framework that treats the evaluation metric as a posterior reward, thus guiding the reverse process in the metric-increasing direction. The experimental results demonstrate that MOSE clearly benefits from metric-oriented training and surpasses the generative baselines on all evaluation metrics.
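The abstract includes no code; as a rough, hypothetical sketch of the actor-critic idea it describes, the snippet below runs one reverse diffusion step in which a learned critic approximates a non-differentiable metric (e.g., PESQ) and its gradient nudges the denoised estimate in the metric-increasing direction. All names and sizes (`Critic`, `guidance_scale`, the toy denoiser) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of metric-guided reverse diffusion (not the authors' code).
# A critic approximates a non-differentiable metric so its gradient can steer
# each reverse step toward higher predicted scores.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Maps an enhanced waveform to a scalar predicted metric score."""
    def __init__(self, dim=16000):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):
        return self.net(x)

def guided_reverse_step(denoiser, critic, x_t, t, guidance_scale=0.1, sigma=0.05):
    """One reverse step: denoise, then nudge along the critic's reward gradient."""
    mean = denoiser(x_t, t)                    # actor: predicted posterior mean
    x = mean.detach().requires_grad_(True)
    reward = critic(x).sum()                   # predicted metric as posterior reward
    grad = torch.autograd.grad(reward, x)[0]   # metric-increasing direction
    return mean + guidance_scale * grad + sigma * torch.randn_like(mean)

# Toy usage: a linear map stands in for the trained denoising network.
denoiser = lambda x, t: 0.9 * x
critic = Critic()
x_prev = guided_reverse_step(denoiser, critic, torch.randn(2, 16000), t=10)
print(x_prev.shape)  # torch.Size([2, 16000])
```

In a full system the critic would be trained to regress true metric scores on enhanced/clean pairs, so that its gradient becomes a usable surrogate for the non-differentiable metric itself.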
Related papers
- Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric that indicates the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is informative for training data selection, training strategy scheduling, and model architecture design, helping to achieve better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z)
- Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In! [80.3129093617928]
Annually, at the Conference on Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics.
This work highlights two issues with the meta-evaluation framework currently employed in WMT, and assesses their impact on the metrics rankings.
We introduce the concept of sentinel metrics, which are designed explicitly to scrutinize the meta-evaluation process's accuracy, robustness, and fairness.
arXiv Detail & Related papers (2024-08-25T13:29:34Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated in our proposed evaluation framework, which is also designed to measure extrapolation (a toy sketch of the proxy-oracle validation loop follows below).
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
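To make that validation loop concrete, here is a minimal, hypothetical sketch: a regressor fitted on offline data stands in for the black-box ground-truth oracle, and generated candidates are scored by their mean predicted reward. The quadratic toy function, random candidates, and model choice are all invented for illustration.

```python
# Hypothetical sketch of proxy-oracle validation in model-based optimisation.
# The true oracle is assumed black-box; a regressor trained on offline data
# stands in for it when scoring generated candidates.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def ground_truth_oracle(x):          # black-box reward; invented toy function
    return -np.sum((x - 0.5) ** 2, axis=-1)

# Offline dataset the proxy oracle is fitted on.
X_train = rng.uniform(0, 1, size=(500, 8))
y_train = ground_truth_oracle(X_train)
proxy = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)

# "Generated candidates" (random samples standing in for a generative model).
candidates = rng.uniform(0, 1, size=(100, 8))

# Approximate validation: mean predicted reward over candidates. As the
# abstract notes, this estimate can be fooled by adversarial candidates.
print("mean proxy reward:", proxy.predict(candidates).mean())
print("mean true reward: ", ground_truth_oracle(candidates).mean())
```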
- Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on density estimation methods effectively detects this symmetric structure in deterministic environments.
We show that these results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment (a toy augmentation sketch follows below).
arXiv Detail & Related papers (2021-12-18T14:32:32Z)
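As a minimal sketch of the augmentation step this entry describes, the snippet below assumes the expert-guided pipeline has already detected a simple reflection symmetry (state and action negate, reward is preserved) and uses it to double the offline dataset before fitting the dynamical model. The symmetry and all data are illustrative assumptions, not the paper's setup.

```python
# Hypothetical sketch of symmetry-based augmentation of offline transitions.
# Assumes a detected reflection symmetry s -> -s, a -> -a that leaves the
# reward unchanged (a stand-in for the expert-guided detection pipeline).
import numpy as np

rng = np.random.default_rng(0)

# Offline dataset of (state, action, reward, next_state) transitions.
states = rng.normal(size=(1000, 4))
actions = rng.normal(size=(1000, 2))
rewards = rng.normal(size=(1000,))
next_states = rng.normal(size=(1000, 4))

def reflect(s, a, r, s2):
    """Apply the assumed symmetry to produce new, equally valid transitions."""
    return -s, -a, r, -s2

aug = reflect(states, actions, rewards, next_states)
states = np.concatenate([states, aug[0]])
actions = np.concatenate([actions, aug[1]])
rewards = np.concatenate([rewards, aug[2]])
next_states = np.concatenate([next_states, aug[3]])
print(states.shape)  # (2000, 4): doubled data for fitting the dynamics model
```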
- MetricOpt: Learning to Optimize Black-Box Evaluation Metrics [21.608384691401238]
We study the problem of optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall.
Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown.
We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations (a toy sketch follows below).
arXiv Detail & Related papers (2021-04-21T16:50:01Z)
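A minimal, hypothetical sketch of that value-function idea: fit a differentiable surrogate from compact parameters to observed metric values, then optimize the parameters by gradient ascent through the surrogate. The toy metric and network sizes are invented, not MetricOpt's actual design.

```python
# Hypothetical sketch: a differentiable value function from compact parameters
# to black-box metric observations, then gradient ascent through it.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Observations: (compact parameter vector, black-box metric value) pairs.
params = torch.randn(200, 10)
metric = -((params - 0.3) ** 2).sum(dim=1)    # toy stand-in for e.g. recall

value_fn = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(value_fn.parameters(), lr=1e-2)
for _ in range(300):                          # fit the surrogate
    opt.zero_grad()
    loss = ((value_fn(params).squeeze(-1) - metric) ** 2).mean()
    loss.backward()
    opt.step()

# Optimise the task parameters through the learned, differentiable surrogate.
theta = torch.zeros(1, 10, requires_grad=True)
theta_opt = torch.optim.Adam([theta], lr=1e-1)
for _ in range(100):
    theta_opt.zero_grad()
    (-value_fn(theta).sum()).backward()       # maximise predicted metric
    theta_opt.step()
print(theta.detach().mean().item())           # should approach 0.3
```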
- MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement [37.3251779254894]
We propose MetricGAN+, which incorporates three training techniques drawing on domain knowledge of speech processing.
With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ can increase the PESQ score by 0.3 compared to the previous MetricGAN (a toy sketch of the underlying MetricGAN signal follows below).
arXiv Detail & Related papers (2021-04-08T06:46:35Z)
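For context, here is a minimal sketch of the basic MetricGAN training signal that MetricGAN+ builds on (none of the paper's three new techniques are shown): a discriminator D learns to regress a normalized metric score, and the enhancer G is trained to push D's prediction toward the maximum. The mask-based enhancer, the toy metric stand-in, and all sizes are illustrative assumptions.

```python
# Hypothetical sketch of the MetricGAN-style signal (not the authors' code).
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(257, 257), nn.Sigmoid())   # toy mask-based enhancer
D = nn.Sequential(nn.Linear(257, 64), nn.ReLU(), nn.Linear(64, 1))

def fake_pesq(enh, clean):
    """Stand-in for the true (non-differentiable) PESQ, normalised to (0, 1]."""
    return (-((enh - clean) ** 2).mean(dim=1, keepdim=True)).exp()

noisy, clean = torch.rand(8, 257), torch.rand(8, 257)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

for _ in range(100):
    enh = (G(noisy) * noisy).detach()
    # D learns to imitate the metric on enhanced speech.
    loss_d = ((D(enh) - fake_pesq(enh, clean)) ** 2).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # G is trained so that D's predicted score approaches the maximum (1.0).
    loss_g = ((D(G(noisy) * noisy) - 1.0) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```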
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
The lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task.
We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models.
Our model provides significant gains in both automatic and human evaluation over strong baselines.
arXiv Detail & Related papers (2020-10-24T04:30:02Z)
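As a rough sketch of training with a direct, non-differentiable reward as this entry describes, the snippet below applies a REINFORCE-style update in which a similarity score replaces parallel supervision. The toy vocabulary, fixed "state", and token-overlap reward are invented stand-ins for the paper's actual generator and semantic-similarity metrics.

```python
# Hypothetical sketch: policy-gradient training with a direct similarity reward.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, hidden, length = 50, 32, 8
policy = nn.Linear(hidden, vocab)              # toy "generator" head
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def similarity_reward(tokens, source_tokens):
    """Stand-in for a learned semantic-similarity metric between sentences."""
    return (tokens == source_tokens).float().mean(dim=1)

source = torch.randint(vocab, (16, length))
state = torch.randn(16, hidden)
for _ in range(200):
    logits = policy(state).unsqueeze(1).expand(-1, length, -1)
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                     # (16, length) generated tokens
    reward = similarity_reward(sample, source) # direct, non-differentiable reward
    log_prob = dist.log_prob(sample).sum(dim=1)
    loss = -(reward * log_prob).mean()         # REINFORCE estimator
    opt.zero_grad(); loss.backward(); opt.step()
```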
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.