Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning
- URL: http://arxiv.org/abs/2405.20884v1
- Date: Thu, 30 May 2024 16:20:44 GMT
- Title: Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning
- Authors: Brandon Colelough, Andrew Zheng
- Abstract summary: This research explores the use of deep neural networks (DNNs) as a superior alternative to traditional noise cancellation techniques.
The Conv-TasNet network was trained on datasets such as WHAM!, LibriMix, and the MS-2023 DNS Challenge.
Models trained at higher sampling rates (48 kHz) achieved much better Total Harmonic Distortion (THD) and Quality Prediction for Generative Neural Speech Codecs (WARP-Q) scores.
- Score: 1.024113475677323
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: Active noise cancellation has been a subject of research for decades. Traditional techniques, such as the Fast Fourier Transform, have limitations in certain scenarios. This research explores the use of deep neural networks (DNNs) as a superior alternative. Objective: The study aims to determine the effect that the sampling rate of training data has on lightweight, efficient DNNs that operate within the processing constraints of mobile devices. Methods: We chose the Conv-TasNet network for its proven efficiency in speech separation and enhancement. Conv-TasNet was trained on datasets such as WHAM!, LibriMix, and the MS-2023 DNS Challenge. The datasets were sampled at rates of 8 kHz, 16 kHz, and 48 kHz to analyze the effect of sampling rate on noise cancellation efficiency and effectiveness. The model was tested on a 2023 Intel Core i7 processor, assessing the network's ability to produce clear audio while filtering out background noise. Results: Models trained at higher sampling rates (48 kHz) scored much better on Total Harmonic Distortion (THD) and Quality Prediction for Generative Neural Speech Codecs (WARP-Q) metrics, indicating improved audio quality; however, a trade-off was noted, with processing time being longer at higher sampling rates. Conclusions: The Conv-TasNet network, trained on datasets sampled at higher rates such as 48 kHz, offers a robust solution for achieving noise cancellation on mobile devices through speech separation and enhancement. Future work involves further optimizing the model's efficiency and testing on mobile devices.
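The THD figure used in the Results can be illustrated with a short, generic FFT-based estimator. This is a sketch written for this summary, not the authors' evaluation code; the 440 Hz test tone, harmonic count, and Hann window are illustrative assumptions.

```python
import numpy as np

def thd(signal, fs, fundamental, n_harmonics=5):
    """Estimate Total Harmonic Distortion: RMS amplitude of harmonics
    2..(n_harmonics + 1) divided by the amplitude of the fundamental."""
    windowed = signal * np.hanning(len(signal))  # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    def peak(f):
        # magnitude of the FFT bin nearest frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]

    fund = peak(fundamental)
    harmonics = [peak(fundamental * k) for k in range(2, n_harmonics + 2)]
    return np.sqrt(sum(h ** 2 for h in harmonics)) / fund

fs = 48_000                      # matches the paper's highest sampling rate
t = np.arange(fs) / fs           # one second of audio
clean = np.sin(2 * np.pi * 440.0 * t)
distorted = clean + 0.1 * np.sin(2 * np.pi * 880.0 * t)  # add a 2nd harmonic
```

The distorted tone should report a THD near 0.1, since its second harmonic has one tenth the fundamental's amplitude; the clean tone's THD is close to zero.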
Related papers
- Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise [9.492965765929963]
Noise in a dataset can significantly degrade the performance of Kolmogorov-Arnold networks.
We propose an oversampling technique combined with denoising to alleviate the impact of noise.
We conclude that applying both oversampling and filtering strategies can reduce the detrimental effects of noise.
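The oversample-then-denoise idea that summary describes can be sketched minimally; here a simple moving-average filter stands in for whatever denoiser the paper actually uses, and the target function and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_noisy(f, n, noise_sd=0.1):
    """Sample f on n evenly spaced points with additive Gaussian noise."""
    x = np.linspace(0.0, 1.0, n)
    return x, f(x) + rng.normal(0.0, noise_sd, n)

def moving_average(y, w=9):
    """A simple denoising filter (stand-in for the paper's method)."""
    return np.convolve(y, np.ones(w) / w, mode="same")

target = lambda x: np.sin(2 * np.pi * x)
x, y_noisy = sample_noisy(target, 1000)     # densely (over)sampled grid
y_denoised = moving_average(y_noisy)

mse_noisy = np.mean((y_noisy - target(x)) ** 2)
mse_denoised = np.mean((y_denoised - target(x)) ** 2)
```

Dense sampling gives the filter enough nearby points to average over, so the denoised mean-squared error should come out well below the raw one.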
arXiv Detail & Related papers (2024-07-20T14:17:10Z) - A Real-Time Voice Activity Detection Based On Lightweight Neural Network [4.589472292598182]
Voice activity detection (VAD) is the task of detecting speech in an audio stream.
Recent neural network-based VADs have alleviated the degradation of performance to some extent.
We propose a lightweight and real-time neural network called MagicNet, which utilizes causal and depthwise-separable 1-D convolutions and GRUs.
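The building blocks named in that summary (causal, depthwise-separable 1-D convolutions) can be sketched in NumPy; the shapes and kernels below are illustrative assumptions, not MagicNet's actual architecture.

```python
import numpy as np

def causal_depthwise_separable_conv1d(x, dw_kernels, pw_weights):
    """Causal depthwise-separable 1-D convolution.

    x          : (channels, time) input signal
    dw_kernels : (channels, k) one kernel per channel (depthwise step)
    pw_weights : (out_channels, channels) 1x1 channel mixing (pointwise step)
    """
    channels, k = dw_kernels.shape
    # Left-pad only, so output[t] depends solely on x[.. t] (causality).
    padded = np.pad(x, ((0, 0), (k - 1, 0)))
    depthwise = np.stack([
        np.convolve(padded[c], dw_kernels[c][::-1], mode="valid")
        for c in range(channels)
    ])
    return pw_weights @ depthwise  # pointwise 1x1 convolution

# Impulse at t=5 on channel 0: a causal filter must not respond before t=5.
x = np.zeros((2, 16))
x[0, 5] = 1.0
dw = np.ones((2, 3)) / 3.0          # 3-tap moving average per channel
pw = np.array([[1.0, 1.0]])         # sum the two channels
y = causal_depthwise_separable_conv1d(x, dw, pw)
```

Splitting the convolution into a per-channel depthwise pass and a 1x1 pointwise pass is what makes such layers cheap enough for real-time use.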
arXiv Detail & Related papers (2024-05-27T03:31:16Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a lightweight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - LEAN: Light and Efficient Audio Classification Network [1.5070398746522742]
We propose a lightweight on-device deep learning-based model for audio classification, LEAN.
LEAN consists of a raw-waveform-based temporal feature extractor called Wave realignment and a log-mel-based pretrained YAMNet.
We show that combining a trainable wave encoder and a pretrained YAMNet with cross-attention-based temporal realignment yields competitive performance on downstream audio classification tasks with a smaller memory footprint.
arXiv Detail & Related papers (2023-05-22T04:45:04Z) - Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z) - Removing Noise from Extracellular Neural Recordings Using Fully Convolutional Denoising Autoencoders [62.997667081978825]
We propose a Fully Convolutional Denoising Autoencoder, which learns to produce a clean neuronal activity signal from a noisy multichannel input.
The experimental results on simulated data show that our proposed method can significantly improve the quality of noise-corrupted neural signals.
arXiv Detail & Related papers (2021-09-18T14:51:24Z) - Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z) - DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals [11.939409227407769]
We propose a novel pitch estimation technique called DeepF0.
It leverages the available annotated data to directly learn from the raw audio in a data-driven manner.
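For contrast with DeepF0's data-driven approach, a classical autocorrelation-based f0 estimator (a naive baseline sketch, not the paper's method) looks like this; the frame length, sampling rate, and pitch range are illustrative assumptions.

```python
import numpy as np

def estimate_f0(signal, fs, fmin=50.0, fmax=1000.0):
    """Naive autocorrelation pitch estimator: pick the lag, within the
    plausible pitch range, where the signal best matches a shifted copy."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_lo = int(fs / fmax)          # shortest period considered
    lag_hi = int(fs / fmin)          # longest period considered
    best_lag = lag_lo + np.argmax(ac[lag_lo:lag_hi])
    return fs / best_lag

fs = 16_000
t = np.arange(0, 0.1, 1.0 / fs)      # a 100 ms analysis frame
tone = np.sin(2 * np.pi * 220.0 * t) # A3 test tone
f0 = estimate_f0(tone, fs)
```

Such lag-based estimators are limited to the grid of integer sample lags, one of the shortcomings that motivates learning f0 directly from raw audio.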
arXiv Detail & Related papers (2021-02-11T23:11:22Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - Exploring Quality and Generalizability in Parameterized Neural Audio Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications.
Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls.
This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.