Noise robust speech emotion recognition with signal-to-noise ratio
adapting speech enhancement
- URL: http://arxiv.org/abs/2309.01164v1
- Date: Sun, 3 Sep 2023 13:00:04 GMT
- Title: Noise robust speech emotion recognition with signal-to-noise ratio
adapting speech enhancement
- Authors: Yu-Wen Chen, Julia Hirschberg, Yu Tsao
- Abstract summary: Speech emotion recognition (SER) often experiences reduced performance due to background noise.
In this study, we propose a Noise Robust Speech Emotion Recognition system, NRSER.
- Score: 29.783878253410506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech emotion recognition (SER) often experiences reduced performance due to
background noise. In addition, making a prediction on signals with only
background noise could undermine user trust in the system. In this study, we
propose a Noise Robust Speech Emotion Recognition system, NRSER. NRSER employs
speech enhancement (SE) to effectively reduce the noise in input signals. Then,
the signal-to-noise-ratio (SNR)-level detection structure and waveform
reconstitution strategy are introduced to reduce the negative impact of SE on
speech signals with no or little background noise. Our experimental results
show that NRSER can effectively improve the noise robustness of the SER system,
including preventing the system from making emotion recognition on signals
consisting solely of background noise. Moreover, the proposed SNR-level
detection structure can be used individually for tasks such as data selection.
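The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a hypothetical discrete SNR level (bucketed by made-up dB thresholds) and a hypothetical linear blend of the noisy and enhanced waveforms, so that signals with little noise keep more of the original waveform. The function names, thresholds, and blending rule are all illustrative assumptions.

```python
import math


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return math.inf
    if p_speech == 0:
        return -math.inf
    return 10.0 * math.log10(p_speech / p_noise)


def scale_noise_to_snr(speech, noise, target_db):
    """Rescale noise so that speech and the scaled noise sit at target_db SNR."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_db / 10.0)))
    return [gain * x for x in noise]


def snr_level(snr_estimate_db, thresholds=(0.0, 10.0, 20.0)):
    """Bucket an SNR estimate into a level 0..len(thresholds).

    The thresholds are assumed values for illustration only.
    """
    return sum(snr_estimate_db >= t for t in thresholds)


def reconstitute(noisy, enhanced, level, num_levels=4):
    """Hypothetical blend: a higher SNR level keeps more of the original waveform."""
    w = level / (num_levels - 1)
    return [w * n + (1.0 - w) * e for n, e in zip(noisy, enhanced)]
```

In this sketch, a signal judged to be at the highest SNR level bypasses the enhanced waveform entirely, while one at the lowest level relies on it fully. In a real system the SNR level would come from a learned detector, since clean speech and noise are not available separately at inference time.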
Related papers
- Learnable Residual-based Latent Denoising in Semantic Communication [27.49223957484401]
A SemCom framework is proposed for robust image transmission over noisy channels.
By incorporating a learnable latent denoiser into the receiver, the received signals are preprocessed to remove the channel noise.
arXiv Detail & Related papers (2025-02-11T07:29:32Z)
- TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition [29.756961194844717]
Results validate that the proposed TRNet substantially improves the system's robustness in both matched and unmatched noisy environments.
arXiv Detail & Related papers (2024-04-19T16:09:17Z)
- Signal-noise separation using unsupervised reservoir computing [0.0]
This paper introduces a signal-noise separation method based on time series prediction.
We estimate the noise distribution from the difference between the original signal and the reconstructed one.
The method is based on a machine learning approach and requires no prior knowledge of either the deterministic signal or the noise distribution.
arXiv Detail & Related papers (2024-04-07T08:31:35Z)
- On the Effectiveness of ASR Representations in Real-world Noisy Speech Emotion Recognition [23.812838405442953]
We propose an efficient approach to noisy speech emotion recognition (NSER).
We adopt the automatic speech recognition (ASR) model as a noise-robust feature extractor to eliminate non-vocal information in noisy speech.
Our experimental results show that 1) the proposed method achieves better NSER performance compared with the conventional noise reduction method, 2) outperforms self-supervised learning approaches, and 3) even outperforms text-based approaches using ASR transcription or the ground truth transcription of noisy speech.
arXiv Detail & Related papers (2023-11-13T05:45:55Z)
- Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
arXiv Detail & Related papers (2022-11-02T15:03:50Z)
- SAR Despeckling using a Denoising Diffusion Probabilistic Model [52.25981472415249]
The presence of speckle degrades the image quality and adversely affects the performance of SAR image understanding applications.
We introduce SAR-DDPM, a denoising diffusion probabilistic model for SAR despeckling.
The proposed method achieves significant improvements in both quantitative and qualitative results over the state-of-the-art despeckling methods.
arXiv Detail & Related papers (2022-06-09T14:00:26Z)
- Zero-shot Blind Image Denoising via Implicit Neural Representations [77.79032012459243]
We propose an alternative denoising strategy that leverages the architectural inductive bias of implicit neural representations (INRs).
We show that our method outperforms existing zero-shot denoising methods under an extensive set of low-noise or real-noise scenarios.
arXiv Detail & Related papers (2022-04-05T12:46:36Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Removing Noise from Extracellular Neural Recordings Using Fully Convolutional Denoising Autoencoders [62.997667081978825]
We propose a Fully Convolutional Denoising Autoencoder, which learns to produce a clean neuronal activity signal from a noisy multichannel input.
The experimental results on simulated data show that our proposed method can significantly improve the quality of noise-corrupted neural signals.
arXiv Detail & Related papers (2021-09-18T14:51:24Z)
- CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application [63.2243126704342]
This study presents a deep learning-based speech signal-processing mobile application known as CITISEN.
CITISEN provides three functions: speech enhancement (SE), model adaptation (MA), and background noise conversion (BNC).
Compared with the noisy speech signals, the enhanced speech signals achieved improvements of about 6% and 33%.
arXiv Detail & Related papers (2020-08-21T02:04:12Z)