Infant Cry Detection Using Causal Temporal Representation
- URL: http://arxiv.org/abs/2503.06247v1
- Date: Sat, 08 Mar 2025 15:15:23 GMT
- Title: Infant Cry Detection Using Causal Temporal Representation
- Authors: Minghao Fu, Danning Li, Aryan Gadhiya, Benjamin Lambright, Mohamed Alowais, Mohab Bahnassy, Saad El Dine Elletter, Hawau Olamide Toyin, Haiyan Jiang, Kun Zhang, Hanan Aldarmaki
- Abstract summary: We present two contributions for supervised and unsupervised infant cry detection. The first is an annotated dataset for cry segmentation, which enables supervised models to achieve state-of-the-art performance.
- Score: 6.240468701036028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses a major challenge in acoustic event detection, in particular infant cry detection in the presence of other sounds and background noises: the lack of precise annotated data. We present two contributions for supervised and unsupervised infant cry detection. The first is an annotated dataset for cry segmentation, which enables supervised models to achieve state-of-the-art performance. Additionally, we propose a novel unsupervised method, Causal Representation Spare Transition Clustering (CRSTC), based on causal temporal representation, which helps address the issue of data scarcity more generally. By integrating the detected cry segments, we significantly improve the performance of downstream infant cry classification, highlighting the potential of this approach for infant care applications.
Related papers
- Robust Tiny Object Detection in Aerial Images amidst Label Noise [50.257696872021164]
This study addresses the issue of tiny object detection under noisy label supervision.
We propose a DeNoising Tiny Object Detector (DN-TOD), which incorporates a Class-aware Label Correction scheme.
Our method can be seamlessly integrated into both one-stage and two-stage object detection pipelines.
arXiv Detail & Related papers (2024-01-16T02:14:33Z) - Detection of Children Abuse by Voice and Audio Classification by Short-Time Fourier Transform Machine Learning implemented on Nvidia Edge GPU device [0.0]
This experiment uses machine learning to classify and recognize a child's voice.
If a child is found to be crying or screaming, an alert is immediately sent to the relevant personnel.
arXiv Detail & Related papers (2023-07-27T16:48:19Z) - Self-supervised learning for infant cry analysis [2.7973623341455602]
We explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings containing clinical indications of more than a thousand newborns.
Specifically, we target cry-based detection of neurological injury as well as identification of cry triggers such as pain, hunger, and discomfort.
We show that pre-training with SSL contrastive loss (SimCLR) performs significantly better than supervised pre-training for both neuro injury and cry triggers.
arXiv Detail & Related papers (2023-05-02T16:27:18Z) - Weakly Supervised Detection of Baby Cry [14.778851751964936]
We propose to use weakly supervised anomaly detection to detect a baby cry.
Under this weak supervision, we only need a weak annotation indicating whether there is a cry in an audio file.
arXiv Detail & Related papers (2023-04-19T22:38:45Z) - Unsupervised Video Anomaly Detection for Stereotypical Behaviours in Autism [20.09315869162054]
This paper focuses on automatically detecting stereotypical behaviours with computer vision techniques.
We propose a Dual Stream deep model for Stereotypical Behaviours Detection, DS-SBD, based on the temporal trajectory of human poses and the repetition patterns of human actions.
arXiv Detail & Related papers (2023-02-27T13:24:08Z) - ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z) - Automated Classification of General Movements in Infants Using a Two-stream Spatiotemporal Fusion Network [5.541644538483947]
The assessment of general movements (GMs) in infants is a useful tool in the early diagnosis of neurodevelopmental disorders.
Recent video-based GMs classification has attracted attention, but this approach would be strongly affected by irrelevant information.
We propose an automated GMs classification method, which consists of preprocessing networks that remove unnecessary background information.
arXiv Detail & Related papers (2022-07-04T05:21:09Z) - SegTAD: Precise Temporal Action Detection via Semantic Segmentation [65.01826091117746]
We formulate the task of temporal action detection in a novel perspective of semantic segmentation.
Owing to the 1-dimensional property of TAD, we are able to convert the coarse-grained detection annotations to fine-grained semantic segmentation annotations for free.
We propose an end-to-end framework, SegTAD, composed of a 1D semantic segmentation network (1D-SSN) and a proposal detection network (PDN).
arXiv Detail & Related papers (2022-03-03T06:52:13Z) - Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning for modeling the audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset of surveillance scenes.
We find that introducing audio signals effectively improves anomaly event detection performance and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z) - Reference-based Defect Detection Network [57.89399576743665]
The first issue is texture shift, which means a trained defect detector is easily affected by unseen textures.
The second issue is partial visual confusion, which indicates that a partial defect box is visually similar to a complete box.
We propose a Reference-based Defect Detection Network (RDDN) to tackle these two problems.
arXiv Detail & Related papers (2021-08-10T05:44:23Z) - WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, achieving performance comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.