Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation
- URL: http://arxiv.org/abs/2410.14122v1
- Date: Fri, 18 Oct 2024 02:31:36 GMT
- Title: Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation
- Authors: Yonghyun Kim, Alexander Lerch
- Abstract summary: This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
- Score: 55.752737615873464
- License:
- Abstract: Recent advancements in Automatic Piano Transcription (APT) have significantly improved system performance, but the impact of noisy environments on these systems remains largely unexplored. This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models and evaluates the performance of the Onsets and Frames model when trained on noise-augmented data. We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
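As context for the SNR conditions the abstract describes, mixing white noise into a signal at a target SNR can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the authors' exact augmentation pipeline; the function name and defaults are assumptions.

```python
import numpy as np

def add_white_noise(signal: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Mix white Gaussian noise into `signal` at a target SNR (in dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

Sweeping `snr_db` over a range of values (e.g. 0 to 30 dB) would produce the kind of graded noise conditions under which transcription robustness can be evaluated.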
Related papers
- Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models [45.90037602677841]
This paper introduces a robust Anomalous Sound Detection (ASD) model that leverages audio pre-trained models.
We fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy.
Our experiments establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models.
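The SpecAug strategy mentioned above masks random frequency bands and time spans of a spectrogram during training. A minimal NumPy sketch of this style of masking follows; the mask widths and function signature are illustrative, not the paper's settings.

```python
import numpy as np

def spec_augment(spec: np.ndarray, freq_mask: int = 8, time_mask: int = 20,
                 rng=None) -> np.ndarray:
    """Apply one frequency mask and one time mask to a (freq, time) spectrogram."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_freq, n_time = out.shape
    # Frequency mask: zero out a random band of up to `freq_mask` bins.
    f = int(rng.integers(0, freq_mask + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    # Time mask: zero out a random span of up to `time_mask` frames.
    t = int(rng.integers(0, time_mask + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out
```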
arXiv Detail & Related papers (2024-09-11T05:19:38Z) - BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification [0.0]
We fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata.
Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%.
arXiv Detail & Related papers (2024-06-10T20:49:54Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first to comprehensively characterize and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Towards Robust and Generalizable Training: An Empirical Study of Noisy Slot Filling for Input Perturbations [38.766702041991046]
We introduce a noise robustness evaluation dataset named Noise-SF for slot filling task.
The proposed dataset contains five types of human-annotated noise.
We find that baseline models have poor performance in robustness evaluation.
arXiv Detail & Related papers (2023-10-05T12:59:57Z) - Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z) - Comparative Study on the Effects of Noise in ML-Based Anxiety Detection [0.0]
We study how noise impacts model performance and how to develop models that are robust to noisy, real-world conditions.
We compare the effect of various intensities of noise on machine learning models classifying levels of physiological arousal.
arXiv Detail & Related papers (2023-06-01T19:52:24Z) - Inference and Denoise: Causal Inference-based Neural Speech Enhancement [83.4641575757706]
This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.
The proposed causal inference-based speech enhancement (CISE) separates clean and noisy frames in an intervened noisy speech using a noise detector and assigns both sets of frames to two mask-based enhancement modules (EMs) to perform noise-conditional SE.
arXiv Detail & Related papers (2022-11-02T15:03:50Z) - Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Behavior of Keyword Spotting Networks Under Noisy Conditions [1.5425424751424208]
Keyword spotting (KWS) is becoming a ubiquitous need with advances in artificial intelligence and smart devices.
Recent work in this field has focused on several different architectures to achieve good results on datasets with low to moderate noise.
We present an extensive comparison between state-of-the-art KWS networks under various noisy conditions.
arXiv Detail & Related papers (2021-09-15T10:02:34Z) - On Dynamic Noise Influence in Differentially Private Learning [102.6791870228147]
Private Gradient Descent (PGD) is a commonly used private learning framework, which adds noise according to the differential privacy protocol.
Recent studies show that dynamic privacy schedules can improve utility at the final iteration, yet theoretical understanding of the effectiveness of such schedules remains limited.
This paper provides a comprehensive analysis of noise influence in dynamic privacy schedules to answer these critical questions.
arXiv Detail & Related papers (2021-01-19T02:04:00Z)
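The noised gradient update underlying PGD-style private learning can be sketched as a clip-and-noise step. This is a generic illustration, not the paper's exact method; the learning rate, clip norm, and noise multiplier are assumed parameters, and the dynamic schedules the paper studies would vary the noise scale across iterations.

```python
import numpy as np

def noisy_gradient_step(params, grad, lr=0.1, clip_norm=1.0,
                        noise_multiplier=1.0, rng=None):
    """One DP-style step: clip the gradient to `clip_norm`, add Gaussian noise
    scaled by `noise_multiplier * clip_norm`, then take a gradient step."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return params - lr * (clipped + noise)
```

A dynamic schedule would replace the constant `noise_multiplier` with a function of the iteration number, which is the design space this paper analyzes.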
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.