Sound Tagging in Infant-centric Home Soundscapes
- URL: http://arxiv.org/abs/2406.17190v1
- Date: Tue, 25 Jun 2024 00:15:54 GMT
- Title: Sound Tagging in Infant-centric Home Soundscapes
- Authors: Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam
- Abstract summary: We explore the performance of a large pre-trained model on infant-centric noise soundscapes in the home.
Our results show that fine-tuning the model by combining our collected dataset with public datasets increases the F1-score.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data collected from a non-stationary microphone placed in the environment or from the perspective of adults. Further, many of these works ignore infants or young children in the environment or have data collected from only a single family where noise from the fixed sound source can be moderate at the infant's position or vice versa. Thus, despite the recent success of large pre-trained models for noise event detection, the performance of these models on infant-centric noise soundscapes in the home is yet to be explored. To bridge this gap, we have collected and labeled noises in home soundscapes from 22 families in an unobtrusive manner, where the data are collected through an infant-worn recording device. In this paper, we explore the performance of a large pre-trained model (Audio Spectrogram Transformer [AST]) on our noise-conditioned infant-centric environmental data as well as publicly available home environmental datasets. Utilizing different training strategies such as resampling, utilizing public datasets, mixing public and infant-centric training sets, and data augmentation using noise and masking, we evaluate the performance of a large pre-trained model on sparse and imbalanced infant-centric data. Our results show that fine-tuning the large pre-trained model by combining our collected dataset with public datasets increases the F1-score from 0.11 (public datasets) and 0.76 (collected datasets) to 0.84 (combined datasets) and Cohen's Kappa from 0.013 (public datasets) and 0.77 (collected datasets) to 0.83 (combined datasets) compared to only training with public or collected datasets, respectively.
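The abstract reports its gains in terms of F1-score and Cohen's Kappa. As a point of reference, both metrics can be computed directly from their definitions; the sketch below uses toy labels (not the paper's data), and assumes macro-averaged F1, which the abstract does not specify.

```python
# Sketch: the two metrics reported in the abstract, implemented from
# their definitions. Labels are toy values, not the paper's data.
from collections import Counter

def macro_f1(y_true, y_pred):
    # per-class F1 = 2*TP / (2*TP + FP + FN), averaged over classes
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

def cohen_kappa(y_true, y_pred):
    # kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n
    ct, cp = Counter(y_true), Counter(y_pred)
    pe = sum(ct[c] * cp[c] for c in ct) / n ** 2
    return (po - pe) / (1 - pe)

y_true = [0, 1, 2, 2, 1, 0, 2, 1]   # toy ground-truth sound-event tags
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]   # toy model predictions
print(round(macro_f1(y_true, y_pred), 3))     # 0.778
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.619
```

In practice, `sklearn.metrics.f1_score(..., average="macro")` and `sklearn.metrics.cohen_kappa_score` compute the same quantities.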
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling [34.565077865854484]
We propose noise adaptive speech enhancement with target-conditional resampling (NASTAR).
NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model.
Experimental results show that NASTAR can effectively use one noisy speech sample to adapt an SE model to a target condition.
arXiv Detail & Related papers (2022-06-18T00:15:48Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
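The instance-dependent versus class-dependent distinction above can be made concrete in a few lines. The sketch below is a hypothetical toy setup, not the CIFAR-N labeling procedure: in the class-dependent model the flip probability depends only on the label, while in the instance-dependent model it depends on a (pretend) per-example feature.

```python
# Toy sketch contrasting class-dependent and instance-dependent label
# noise. Features and rates are hypothetical, not the CIFAR-N procedure.
import random
random.seed(0)

def class_dependent_flip(y, rate=0.2, n_classes=3):
    # flip probability depends only on the class label
    if random.random() < rate:
        return random.choice([c for c in range(n_classes) if c != y])
    return y

def instance_dependent_flip(x, y, n_classes=3):
    # flip probability grows with how "ambiguous" the instance is;
    # pretend x in [0, 1] measures closeness to a class boundary
    if random.random() < 0.4 * x:
        return random.choice([c for c in range(n_classes) if c != y])
    return y

data = [(random.random(), random.randrange(3)) for _ in range(1000)]
noisy_cd = [class_dependent_flip(y) for _, y in data]
noisy_id = [instance_dependent_flip(x, y) for x, y in data]

flips_cd = sum(y != n for (_, y), n in zip(data, noisy_cd))
flips_id = sum(y != n for (_, y), n in zip(data, noisy_id))
print(flips_cd, flips_id)  # both roughly 200 of 1000 labels flipped
```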
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- The potential of self-supervised networks for random noise suppression in seismic data [0.0]
Blind-spot networks are shown to be an efficient suppressor of random noise in seismic data.
Results are compared with two commonly used random denoising techniques: FX-deconvolution and Curvelet transform.
We believe this is just the beginning of utilising self-supervised learning in seismic applications.
arXiv Detail & Related papers (2021-09-15T14:57:43Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
- Generation and Analysis of Feature-Dependent Pseudo Noise for Training Deep Neural Networks [0.0]
Training deep neural networks (DNNs) on noisily labeled datasets is challenging because learning from mislabeled examples deteriorates network performance.
We propose an intuitive approach to creating feature-dependent noisy datasets by utilizing the training predictions of DNNs on clean datasets that also retain true label information.
We conduct several experiments to establish that Pseudo noisy datasets resemble feature-dependent noisy datasets across different conditions.
arXiv Detail & Related papers (2021-05-22T19:15:26Z)
- DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment [0.0]
We generate an unbiased synthetic domestic audio database, consisting of sound scenes and events, emulated in both quiet and noisy environments.
Data is carefully curated such that it reflects issues commonly faced in a dementia patient's environment.
We present an 11-class database containing excerpts of clean and noisy signals at 5-seconds duration each, uniformly sampled at 16 kHz.
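A database with those parameters (5-second excerpts, uniformly sampled at 16 kHz) implies a simple segmentation step. The sketch below assumes raw waveforms held as NumPy arrays; it is not the DASEE tooling.

```python
# Sketch: slicing a 16 kHz recording into non-overlapping 5-second
# excerpts, matching the database parameters described above.
# Uses a fake random signal, not DASEE data.
import numpy as np

SR = 16_000               # sample rate in Hz
CLIP_SEC = 5              # excerpt length in seconds
CLIP_LEN = SR * CLIP_SEC  # samples per excerpt

def segment(signal: np.ndarray) -> np.ndarray:
    """Drop the ragged tail and reshape into (n_clips, CLIP_LEN)."""
    n_clips = len(signal) // CLIP_LEN
    return signal[: n_clips * CLIP_LEN].reshape(n_clips, CLIP_LEN)

audio = np.random.randn(SR * 23)  # 23 s of fake audio
clips = segment(audio)
print(clips.shape)                # (4, 80000)
```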
arXiv Detail & Related papers (2021-04-27T18:51:44Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive Multi-View ICA (AVICA) is a noisy ICA model in which each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
- Infant Crying Detection in Real-World Environments [0.0]
We evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features.
We collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio data.
Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data.
arXiv Detail & Related papers (2020-05-12T18:25:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.