InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
- URL: http://arxiv.org/abs/2409.19689v2
- Date: Tue, 04 Feb 2025 10:23:05 GMT
- Title: InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
- Authors: Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang,
- Abstract summary: We present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks.<n>We employ pre-trained audio models to incorporate prior knowledge into our model.<n>Experiments on real-life datasets demonstrate the superior performance of the proposed framework.
- Score: 24.06154195051215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.
Related papers
- Speech transformer models for extracting information from baby cries [0.6822819361110412]
We evaluate five pre-trained speech models on eight baby cries datasets.<n>For each dataset, we assess the latent representations of each model across all available classification tasks.<n>Our results demonstrate that the latent representations of these models can effectively classify human baby cries.
arXiv Detail & Related papers (2025-09-02T12:34:33Z) - Vision-Based Embedded System for Noncontact Monitoring of Preterm Infant Behavior in Low-Resource Care Settings [0.0]
Preterm birth is a leading cause of neonatal mortality, disproportionately affecting low-resource settings with limited access to advanced neonatal intensive care units (NICUs)<n>This paper presents a novel, noninvasive, and automated vision-based framework to address this gap.<n>We introduce an embedded monitoring system that utilizes a quantized MobileNet model deployed on a Raspberry Pi for real-time behavioral state detection.
arXiv Detail & Related papers (2025-09-02T07:05:47Z) - Accelerated Test-Time Scaling with Model-Free Speculative Sampling [58.69141724095398]
We introduce STAND (STochastic Adaptive N-gram Drafting), a novel model-free speculative decoding approach.<n>We show that STAND reduces inference latency by 60-65% compared to standard autoregressive decoding.<n>As a model-free approach, STAND can be applied to any existing language model without additional training.
arXiv Detail & Related papers (2025-06-05T07:31:18Z) - Model-agnostic Mitigation Strategies of Data Imbalance for Regression [0.0]
Data imbalance persists as a pervasive challenge in regression tasks, introducing bias in model performance and undermining predictive reliability.<n>We present advanced mitigation techniques, which build upon and improve existing sampling methods.<n>We demonstrate that constructing an ensemble of models -- one trained with imbalance mitigation and another without -- can significantly reduce these negative effects.
arXiv Detail & Related papers (2025-06-02T09:46:08Z) - Benchmarking Reasoning Robustness in Large Language Models [76.79744000300363]
We find significant performance degradation on novel or incomplete data.
These findings highlight the reliance on recall over rigorous logical inference.
This paper introduces a novel benchmark, termed as Math-RoB, that exploits hallucinations triggered by missing information to expose reasoning gaps.
arXiv Detail & Related papers (2025-03-06T15:36:06Z) - Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models [45.90037602677841]
This paper introduces a robust Anomalous Sound Detection (ASD) model that leverages audio pre-trained models.
We fine-tune these models using machine operation data, employing SpecAug as a data augmentation strategy.
Our experiments establish a new benchmark of 77.75% on the evaluation set, with a significant improvement of 6.48% compared with previous state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-09-11T05:19:38Z) - Tiny-Toxic-Detector: A compact transformer-based model for toxic content detection [0.0]
This paper presents Tiny-toxic-detector, a compact transformer-based model designed for toxic content detection.
Despite having only 2.1 million parameters, Tiny-toxic-detector achieves competitive performance on benchmark datasets.
Tiny-toxic-detector represents progress toward more sustainable and scalable AI-driven content moderation solutions.
arXiv Detail & Related papers (2024-08-29T22:31:38Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines [83.65380507372483]
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
This paper shows how to leverage recent advances in NLP and multi-modal learning to augment a pre-trained model with search engine retrieval.
arXiv Detail & Related papers (2023-11-29T05:33:28Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Progressive reduced order modeling: empowering data-driven modeling with
selective knowledge transfer [0.0]
We propose a progressive reduced order modeling framework that minimizes data cravings and enhances data-driven modeling's practicality.
Our approach selectively transfers knowledge from previously trained models through gates, similar to how humans selectively use valuable knowledge while ignoring unuseful information.
We have tested our framework in several cases, including transport in porous media, gravity-driven flow, and finite deformation in hyperelastic materials.
arXiv Detail & Related papers (2023-10-04T23:50:14Z) - Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z) - Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z) - Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.