Detection of Children Abuse by Voice and Audio Classification by
Short-Time Fourier Transform Machine Learning implemented on Nvidia Edge GPU
device
- URL: http://arxiv.org/abs/2307.15101v1
- Date: Thu, 27 Jul 2023 16:48:19 GMT
- Authors: Jiuqi Yan, Yingxian Chen, W.W.T. Fok
- Abstract summary: This experiment uses machine learning to classify and recognize a child's voice.
If a child is found to be crying or screaming, an alert is immediately sent to the relevant personnel.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The safety of children in children's homes has become a growing
social concern, and the purpose of this experiment is to apply machine
learning to detect child abuse scenarios and thereby increase children's
safety. The experiment uses machine learning to classify and recognize a
child's voice, predicting whether the sound the child is currently making is
crying, screaming, or laughing. If a child is found to be crying or screaming,
an alert is immediately sent to the relevant personnel so that they can
perceive what the child may be experiencing in a surveillance blind spot and
respond in a timely manner. Combined with video image classification, the
accuracy of child abuse detection can be significantly increased. This greatly
reduces the likelihood that a child will suffer violent abuse in the nursery
and allows personnel to stop an imminent or incipient child abuse incident in
time. The dataset collected for this experiment consists entirely of sounds
recorded on site at the children's home, including crying, laughing, and
screaming sounds as well as background noise. These sound files are
transformed into spectrograms using the Short-Time Fourier Transform, the
resulting images are fed into a CNN for classification, and the final trained
model achieves an accuracy of about 92% for sound detection.
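The pipeline the abstract describes (audio clip -> STFT spectrogram -> CNN classifier -> alert) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the sample rate, clip length, STFT window and hop sizes, and network layout are all assumptions made for the example.

```python
# Minimal sketch of the described pipeline: STFT spectrogram -> small CNN.
# Sample rate, clip length, window/hop sizes, and the network layout are
# assumptions for illustration, not the paper's actual configuration.
import numpy as np
import librosa
import torch
import torch.nn as nn

def audio_to_spectrogram(path, sr=16000, n_fft=512, hop=128):
    """Load a clip and return a log-magnitude STFT spectrogram."""
    y, _ = librosa.load(path, sr=sr, duration=2.0)
    y = librosa.util.fix_length(y, size=2 * sr)        # pad/trim to 2 s
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    return librosa.amplitude_to_db(S, ref=np.max)      # (freq, time)

class CryScreamLaughCNN(nn.Module):
    """Tiny CNN over spectrogram 'images'; 3 classes: cry/scream/laugh."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                  # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))

spec = audio_to_spectrogram("clip.wav")    # hypothetical file name
x = torch.from_numpy(spec).float()[None, None]
logits = CryScreamLaughCNN()(x)            # scores for cry/scream/laugh
```

On an Nvidia edge device such as a Jetson, a trained model like this would typically be exported (for example to ONNX) and served through TensorRT for low-latency inference; that deployment step is omitted from the sketch.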
Related papers
- Self-supervised learning for infant cry analysis [2.7973623341455602]
We explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings containing clinical indications of more than a thousand newborns.
Specifically, we target cry-based detection of neurological injury as well as identification of cry triggers such as pain, hunger, and discomfort.
We show that pre-training with the SSL contrastive loss (SimCLR) performs significantly better than supervised pre-training for both neurological-injury detection and cry-trigger identification.
arXiv Detail & Related papers (2023-05-02T16:27:18Z)
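The SimCLR pre-training mentioned above optimizes a contrastive (NT-Xent) loss over two augmented views of each recording. A minimal sketch of that loss follows; the temperature and batch layout are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of the NT-Xent (SimCLR) loss used for contrastive
# pre-training. Temperature and batch layout are illustrative.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views per clip."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2B, dim)
    sim = z @ z.t() / temperature                      # cosine similarities
    n = z.shape[0]
    sim.fill_diagonal_(float("-inf"))                  # exclude self-pairs
    # positives: row i pairs with the other view of the same clip (i +/- B)
    targets = torch.arange(n, device=z.device).roll(n // 2)
    return F.cross_entropy(sim, targets)
```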
- Weakly Supervised Detection of Baby Cry [14.778851751964936]
We propose to use weakly supervised anomaly detection to detect a baby cry.
Under this weak supervision, we only need a weak annotation indicating whether there is a cry in an audio file.
arXiv Detail & Related papers (2023-04-19T22:38:45Z)
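Weak supervision in the entry above means only clip-level labels ("contains a cry" or not), with no timestamps. One standard way to train under such labels is multiple-instance pooling over per-segment scores, sketched below under the assumption of max-pooling; the paper's exact formulation may differ.

```python
# Hedged sketch of multiple-instance (weakly supervised) training:
# a clip is positive if ANY segment contains a cry, so the clip score
# is a max over per-segment scores. Dimensions are illustrative.
import torch
import torch.nn as nn

class SegmentScorer(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, segments):          # (batch, n_segments, feat_dim)
        seg_logits = self.net(segments).squeeze(-1)   # (batch, n_segments)
        return seg_logits.max(dim=1).values           # clip-level logit

model = SegmentScorer()
loss_fn = nn.BCEWithLogitsLoss()
segments = torch.randn(8, 20, 64)           # 8 clips x 20 segments (dummy)
labels = torch.randint(0, 2, (8,)).float()  # weak clip-level labels only
loss = loss_fn(model(segments), labels)
```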
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio clip is generated by tampering only with the acoustic scene of an original clip.
Some benchmark results for scene fake audio detection on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Low-dimensional representation of infant and adult vocalization acoustics [2.1826796927092214]
We use spectral feature extraction and unsupervised machine learning, specifically Uniform Manifold Approximation and Projection (UMAP), to obtain a novel 2-dimensional spatial representation of infant and caregiver vocalizations extracted from day-long home recordings.
For instance, we found that the dispersion of infant vocalization acoustics within the 2-D space over a day increased from 3 to 9 months, and then decreased from 9 to 18 months.
arXiv Detail & Related papers (2022-04-25T17:58:13Z)
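The UMAP projection used in the entry above can be reproduced in outline: extract a spectral feature vector per clip, then embed all clips into 2-D. The MFCC-based features and UMAP parameters below are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch: spectral features -> 2-D UMAP embedding, in the spirit
# of the vocalization paper. Feature choice and UMAP parameters are
# illustrative assumptions.
import numpy as np
import librosa
import umap  # umap-learn package

def spectral_features(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)              # one 13-dim vector per clip

# In practice X would stack spectral_features() over thousands of clips
# from day-long recordings; a random stand-in keeps the sketch runnable.
X = np.random.randn(500, 13)
coords = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
# coords[:, 0], coords[:, 1] are the 2-D positions; the dispersion of a
# speaker's points over a day can then be measured in this space.
```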
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach [0.9945783208680666]
We present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments.
We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features; second, convolutional neural networks (CNNs) trained on spectrograms; and third, a fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based ASR model.
arXiv Detail & Related papers (2022-01-04T01:31:02Z)
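Of the three methods the autism paper compares, the first (Random Forests on extracted audio features) is simple to sketch. The feature set and hyperparameters below are illustrative assumptions, not those reported in the paper.

```python
# Hedged sketch of a 'Random Forest on extracted audio features' baseline.
# Feature extraction and hyperparameters are illustrative assumptions.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def clip_features(path, sr=16000):
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [zcr.mean()]])          # 41-dim clip vector

# Stand-in data; real use would call clip_features() per recording.
X = np.random.randn(200, 41)
y = np.random.randint(0, 2, 200)    # autistic vs. neurotypical label
clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```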
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper exploits multi-modal learning to model audio and visual signals simultaneously.
We conduct the experiments on SHADE dataset, a synthetic audio-visual dataset in surveillance scenes.
We find that introducing audio signals effectively improves the performance of anomaly event detection and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
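Joint audio-visual modeling as in the entry above is often realized as late fusion: encode each modality separately, concatenate the embeddings, and score. The sketch below assumes that design with placeholder dimensions; it is not the paper's architecture.

```python
# Hedged sketch of simple late fusion for audio-visual anomaly scoring:
# concatenate per-modality embeddings, then score. The encoders and
# dimensions are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

class LateFusionScorer(nn.Module):
    def __init__(self, audio_dim=128, video_dim=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(audio_dim + video_dim, 64),
                                  nn.ReLU(), nn.Linear(64, 1))

    def forward(self, a_emb, v_emb):  # (batch, audio_dim), (batch, video_dim)
        return self.head(torch.cat([a_emb, v_emb], dim=1)).squeeze(-1)

scorer = LateFusionScorer()
anomaly_logit = scorer(torch.randn(4, 128), torch.randn(4, 256))
```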
- Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification [54.57150493905063]
Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded.
We propose a robust feature learning (RFL) framework to train the CNN.
arXiv Detail & Related papers (2021-08-11T03:33:05Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims to increase the recognition rate and reduce false alarms in the presence of background noise.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- Generating Visually Aligned Sound from Videos [83.89485254543888]
We focus on the task of generating sound from natural videos.
The sound should be both temporally and content-wise aligned with visual signals.
Some sounds generated outside of the camera's view cannot be inferred from the video content.
arXiv Detail & Related papers (2020-07-14T07:51:06Z)