A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning
- URL: http://arxiv.org/abs/2202.05756v1
- Date: Fri, 11 Feb 2022 16:48:41 GMT
- Title: A Novel Speech Intelligibility Enhancement Model based on Canonical Correlation and Deep Learning
- Authors: Tassadaq Hussain, Muhammad Diyan, Mandar Gogate, Kia Dashtipour, Ahsan
Adeel, Yu Tsao, Amir Hussain
- Abstract summary: We present a canonical correlation based short-time objective intelligibility (CC-STOI) cost function to train a fully convolutional neural network (FCN) model.
We show that our CC-STOI based speech enhancement framework outperforms state-of-the-art DL models trained with conventional distance-based and STOI-based loss functions.
- Score: 12.913738983870621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current deep learning (DL) based approaches to speech intelligibility
enhancement in noisy environments are often trained to minimise the feature
distance between noise-free speech and enhanced speech signals. Despite
improving the speech quality, such approaches do not deliver the required levels of
speech intelligibility in everyday noisy environments.
Intelligibility-oriented (I-O) loss functions have recently been developed to
train DL approaches for robust speech enhancement. Here, we formulate, for the
first time, a novel canonical correlation based I-O loss function to more
effectively train DL algorithms. Specifically, we present a
canonical-correlation based short-time objective intelligibility (CC-STOI) cost
function to train a fully convolutional neural network (FCN) model. We carry
out comparative simulation experiments to show that our CC-STOI based speech
enhancement framework outperforms state-of-the-art DL models trained with
conventional distance-based and STOI-based loss functions, using objective and
subjective evaluation measures, for the case of both unseen speakers and noises.
Ongoing and future work is evaluating the proposed approach for the design of robust
hearing-assistive technology.
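The core idea of the abstract can be illustrated with a minimal sketch: compute the canonical correlations between clean and enhanced speech feature matrices, and use their negative mean as a loss. This is an illustrative NumPy version only; the paper's actual CC-STOI cost operates on STOI's one-third-octave envelope representation and is implemented differentiably for backpropagation, and the function names below are not from the paper.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Canonical correlations between feature matrices
    X (n_samples, dx) and Y (n_samples, dy).

    Linear CCA sketch: whiten each view's covariance, then take the
    singular values of the whitened cross-covariance matrix.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])  # regularised covariances
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition (C is SPD after reg)
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    return np.linalg.svd(T, compute_uv=False)

def cc_loss(clean_feats, enhanced_feats):
    """Negative mean canonical correlation: minimising this drives the
    enhanced features towards maximal correlation with clean speech."""
    return -canonical_correlations(clean_feats, enhanced_feats).mean()
```

For identical inputs the canonical correlations approach 1 and the loss approaches -1; in a training setting the same computation would be expressed in a deep-learning framework so that gradients flow through the correlation into the enhancement network.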
Related papers
- Pre-training Feature Guided Diffusion Model for Speech Enhancement [37.88469730135598]
Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments.
We introduce a novel pretraining feature-guided diffusion model tailored for efficient speech enhancement.
arXiv Detail & Related papers (2024-06-11T18:22:59Z)
- Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding [1.07288078404291]
We propose a natural language understanding approach based on Automatic Speech Recognition (ASR)
We improve a noisy-channel model to handle transcription inconsistencies caused by ASR errors.
Experiments on four benchmark datasets show that Contrastive and Consistency Learning (CCL) outperforms existing methods.
arXiv Detail & Related papers (2024-05-23T23:10:23Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Improving the Intent Classification accuracy in Noisy Environment [9.447108578893639]
In this paper, we investigate how environmental noise and related noise reduction techniques affect the intent classification task with end-to-end neural models.
For this task, the use of speech enhancement greatly improves the classification accuracy in noisy conditions.
arXiv Detail & Related papers (2023-03-12T06:11:44Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
By utilizing the synthesis model with the input of discrete symbols, after the prediction of discrete symbol sequence, each target speech could be re-synthesized.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Towards Intelligibility-Oriented Audio-Visual Speech Enhancement [8.19144665585397]
We present a fully convolutional AV SE model that uses a modified short-time objective intelligibility (STOI) metric as a training cost function.
Our proposed I-O AV SE framework outperforms audio-only (AO) and AV models trained with conventional distance-based loss functions.
arXiv Detail & Related papers (2021-11-18T11:47:37Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- A Spectral Energy Distance for Parallel Speech Synthesis [29.14723501889278]
Speech synthesis is an important practical generative modeling problem.
We propose a new learning method that allows us to train highly parallel models of speech.
arXiv Detail & Related papers (2020-08-03T19:56:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.