Infant Crying Detection in Real-World Environments
- URL: http://arxiv.org/abs/2005.07036v6
- Date: Wed, 16 Feb 2022 22:01:27 GMT
- Title: Infant Crying Detection in Real-World Environments
- Authors: Xuewen Yao, Megan Micheletti, Mckensey Johnson, Edison Thomaz, Kaya de Barbaro
- Abstract summary: We evaluate several established machine learning approaches including a model leveraging both deep spectrum and acoustic features.
We collect and annotate a novel dataset of infant crying compiled from over 780 hours of labeled real-world audio data.
Our findings confirm that a cry detection model trained on in-lab data underperforms when presented with real-world data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Most existing cry detection models have been tested with data collected in
controlled settings. Thus, the extent to which they generalize to noisy and
lived environments is unclear. In this paper, we evaluate several established
machine learning approaches including a model leveraging both deep spectrum and
acoustic features. This model was able to recognize crying events with F1 score
0.613 (Precision: 0.672, Recall: 0.552), showing improved external validity
over existing methods at cry detection in everyday real-world settings. As part
of our evaluation, we collect and annotate a novel dataset of infant crying
compiled from over 780 hours of labeled real-world audio data, captured via
recorders worn by infants in their homes, which we make publicly available. Our
findings confirm that a cry detection model trained on in-lab data
underperforms when presented with real-world data (in-lab test F1: 0.656,
real-world test F1: 0.236), highlighting the value of our new dataset and
model.
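The F1 figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function and variable names are illustrative and do not come from the paper. Note that the value implied by the reported precision and recall (about 0.606) is slightly below the reported F1 of 0.613. The abstract does not say how the metrics were aggregated, so one possible explanation (an assumption, not a claim from the source) is that each metric was averaged separately across folds or participants.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract for the real-world model.
pooled_f1 = f1_score(0.672, 0.552)

# In-lab vs. real-world test F1 quoted for the model trained on in-lab data.
in_lab_f1, real_world_f1 = 0.656, 0.236
relative_drop = (in_lab_f1 - real_world_f1) / in_lab_f1

print(f"F1 implied by reported precision/recall: {pooled_f1:.3f}")
print(f"Relative F1 drop from in-lab to real-world: {relative_drop:.1%}")
```

Read this way, the in-lab model loses roughly two thirds of its F1 score when moved to real-world recordings, which is the gap the abstract highlights.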
Related papers
- Textile Anomaly Detection: Evaluation of the State-of-the-Art for Automated Quality Inspection of Carpet [0.0]
State-of-the-art unsupervised detection models were evaluated for the purpose of automated anomaly inspection of wool carpets.
A custom dataset of four unique types of carpet textures was created to thoroughly test the models.
The metrics of importance in this study were accuracy in detecting anomalous areas, the number of false detections, and the inference times of each model for real-time performance.
arXiv Detail & Related papers (2024-07-26T01:13:59Z)
- Sound Tagging in Infant-centric Home Soundscapes [30.76025173544015]
We explore the performance of a large pre-trained model on infant-centric noise soundscapes in the home.
Our results show that fine-tuning the model by combining our collected dataset with public datasets increases the F1-score.
arXiv Detail & Related papers (2024-06-25T00:15:54Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Open World Object Detection in the Era of Foundation Models [53.683963161370585]
We introduce a new benchmark that includes five real-world application-driven datasets.
We introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects.
arXiv Detail & Related papers (2023-12-10T03:56:06Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- A Gait Triaging Toolkit for Overlapping Acoustic Events in Indoor Environments [0.1933681537640272]
We propose a novel machine learning based filter which can triage gait audio samples suitable for training machine learning models for gait detection.
To demonstrate the effectiveness of the filter, we train and evaluate a deep learning model on gait datasets collected from older adults with and without applying the filter.
The proposed filter will help automate the task of manual annotation of gait samples for training acoustic based gait detection models for older adults in indoor environments.
arXiv Detail & Related papers (2022-11-11T01:33:14Z)
- Learning to diagnose common thorax diseases on chest radiographs from radiology reports in Vietnamese [0.33598755777055367]
We propose a data collecting and annotation pipeline that extracts information from Vietnamese radiology reports to provide accurate labels for chest X-ray (CXR) images.
This can benefit Vietnamese radiologists and clinicians by annotating data that closely match their endemic diagnosis categories which may vary from country to country.
arXiv Detail & Related papers (2022-09-11T06:06:03Z)
- Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models [54.182955830194445]
Existing models either fail or face a dramatic drop under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves near-distribution novelty detection by 6% and surpasses the state of the art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.