Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
- URL: http://arxiv.org/abs/2505.01632v1
- Date: Fri, 02 May 2025 23:42:27 GMT
- Title: Transfer Learning-Based Deep Residual Learning for Speech Recognition in Clean and Noisy Environments
- Authors: Noussaiba Djeffal, Djamel Addou, Hamza Kheddar, Sid Ahmed Selouani
- Abstract summary: This paper introduces a novel framework that incorporates a robust neural feature into ASR systems in both clean and noisy environments. The experimental results demonstrate a significant improvement in recognition accuracy compared to convolutional neural networks (CNN) and long short-term memory (LSTM) networks.
- Score: 2.1892046440619626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Addressing the detrimental impact of non-stationary environmental noise on automatic speech recognition (ASR) has been a persistent and significant research focus. Despite advancements, this challenge continues to be a major concern. Recently, data-driven supervised approaches, such as deep neural networks, have emerged as promising alternatives to traditional unsupervised methods. With extensive training, these approaches have the potential to overcome the challenges posed by diverse real-life acoustic environments. In this light, this paper introduces a novel neural framework that incorporates a robust frontend into ASR systems in both clean and noisy environments. Using the Aurora-2 speech database, the authors evaluate the effectiveness of a Mel-frequency acoustic feature set, employing transfer learning based on a residual neural network (ResNet). The experimental results demonstrate a significant improvement in recognition accuracy compared to convolutional neural networks (CNN) and long short-term memory (LSTM) networks, with accuracies of 98.94% in clean conditions and 91.21% in noisy conditions.
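As a rough illustration of the transfer-learning setup described in the abstract, the sketch below fine-tunes an ImageNet-pretrained ResNet on Mel-spectrogram inputs. It is a minimal, hypothetical example: the choice of backbone (resnet18), the frozen layers, the number of output classes, and all hyperparameters are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of ResNet-based transfer learning on Mel-spectrogram
# features; layer names and hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Load a ResNet pretrained on ImageNet to serve as the robust front-end backbone.
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze the earlier residual blocks so only the later layers adapt to speech.
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Replace the classifier head; Aurora-2 is a connected-digit task, so the
# number of target classes used here is an assumption.
num_classes = 11
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()

def forward_mel(mel_batch):
    # mel_batch: (batch, 1, n_mels, frames); tile to 3 channels so it matches
    # the input expected by the ImageNet-pretrained stem.
    return model(mel_batch.repeat(1, 3, 1, 1))
```

The idea is simply that the pretrained residual backbone supplies a robust feature extractor, while only the upper layers and the classifier are adapted to the speech task.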
Related papers
- Noise-based reward-modulated learning [2.3125457626961263]
We propose a noise-based, biologically inspired learning rule for low-power and real-time applications. Our approach combines directional derivative theory with Hebbian-like updates to enable efficient, gradient-free learning in reinforcement learning. Its formulation relies on local information alone, making it compatible with implementations in neuromorphic hardware. (A generic illustration of such a noise-driven, reward-modulated update appears after this related-papers list.)
arXiv Detail & Related papers (2025-03-31T11:35:23Z) - Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding [0.0]
We propose a two-stage computational framework combining Hopfield Networks for artifact data preprocessing with Convolutional Neural Networks (CNNs) for classification of brain states in rat neural recordings under different levels of anesthesia.
Performance across various levels of data compression and noise intensities showed that our framework can effectively mitigate artifacts, allowing the model to reach parity with the clean-data CNN at lower noise levels.
arXiv Detail & Related papers (2023-11-06T15:08:13Z) - Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks [53.31894108974566]
Spiking-LEAF is a learnable auditory front-end meticulously designed for SNN-based speech processing.
On keyword spotting and speaker identification tasks, the proposed Spiking-LEAF outperforms both SOTA spiking auditory front-ends.
arXiv Detail & Related papers (2023-09-18T04:03:05Z) - Histogram Layer Time Delay Neural Networks for Passive Sonar Classification [58.720142291102135]
A novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
arXiv Detail & Related papers (2023-07-25T19:47:26Z) - Training neural networks with structured noise improves classification and generalization [0.0]
We show how adding structure to noisy training data can substantially improve the algorithm performance.
We also prove that the so-called Hebbian Unlearning rule coincides with the training-with-noise algorithm when noise is maximal.
arXiv Detail & Related papers (2023-02-26T22:10:23Z) - Spiking Neural Networks for Frame-based and Event-based Single Object Localization [26.51843464087218]
Spiking neural networks have shown much promise as an energy-efficient alternative to artificial neural networks.
We propose a spiking neural network approach for single object localization trained using surrogate gradient descent.
We compare our method with similar artificial neural networks and show that our model achieves competitive or better accuracy, is robust against various corruptions, and has lower energy consumption.
arXiv Detail & Related papers (2022-06-13T22:22:32Z) - Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z) - Robust Learning of Recurrent Neural Networks in Presence of Exogenous Noise [22.690064709532873]
We propose a tractable robustness analysis for RNN models subject to input noise.
The robustness measure can be estimated efficiently using linearization techniques.
Our proposed methodology significantly improves robustness of recurrent neural networks.
arXiv Detail & Related papers (2021-05-03T16:45:05Z) - Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that a single-channel time-domain denoising approach can significantly improve ASR performance.
We show that single-channel noise reduction can still improve ASR performance.
arXiv Detail & Related papers (2020-03-09T09:36:31Z) - Learn2Perturb: an End-to-end Feature Perturbation Learning to Improve Adversarial Robustness [79.47619798416194]
Learn2Perturb is an end-to-end feature perturbation learning approach for improving the adversarial robustness of deep neural networks.
Inspired by the Expectation-Maximization algorithm, an alternating back-propagation training procedure is introduced to train the network and noise parameters consecutively.
arXiv Detail & Related papers (2020-03-02T18:27:35Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at achieving two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
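For the noise-based reward-modulated learning rule listed first above, the following is a generic, hypothetical sketch of a gradient-free, perturbation-driven update on a toy linear unit. It illustrates the general idea of scaling a random perturbation by the reward change it produces; it is not the paper's exact formulation, and all names and hyperparameters are assumptions.

```python
# Toy illustration of noise-driven, reward-modulated learning (not the paper's rule).
import numpy as np

rng = np.random.default_rng(0)

def reward(w, x, target):
    # Toy scalar reward: negative squared error of a linear readout.
    return -float((w @ x - target) ** 2)

w = rng.normal(size=4)     # weights of a toy linear unit
x = rng.normal(size=4)     # fixed input for this illustration
target = 1.0
sigma, lr = 0.1, 0.01      # perturbation scale and learning rate (assumed)

print("reward before:", round(reward(w, x, target), 4))
for _ in range(1000):
    xi = rng.normal(scale=sigma, size=w.shape)          # random perturbation
    # The reward change under the perturbation approximates a directional
    # derivative along xi, so no explicit gradient is ever computed.
    delta_r = reward(w + xi, x, target) - reward(w, x, target)
    w += lr * delta_r * xi / sigma**2                   # reward-modulated, local update
print("reward after:", round(reward(w, x, target), 4))  # should move much closer to 0
```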