Utilizing synthetic training data for the supervised classification of rat ultrasonic vocalizations
- URL: http://arxiv.org/abs/2303.03183v2
- Date: Fri, 19 Jan 2024 02:31:58 GMT
- Title: Utilizing synthetic training data for the supervised classification of rat ultrasonic vocalizations
- Authors: K. Jack Scott, Lucinda J. Speers, David K. Bilkey
- Abstract summary: Murine rodents generate ultrasonic vocalizations (USVs) with frequencies that extend to around 120 kHz.
These calls are important in social behaviour, and so their analysis can provide insights into the function of vocal communication and its dysfunction.
We compare the detection and classification performance of a trained human against two convolutional neural networks (CNNs), DeepSqueak and VocalMat, on audio containing rat USVs.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Murine rodents generate ultrasonic vocalizations (USVs) with frequencies that
extend to around 120 kHz. These calls are important in social behaviour, and so
their analysis can provide insights into the function of vocal communication
and its dysfunction. The manual identification of USVs and their subsequent
classification into different subcategories is time-consuming. Although machine
learning approaches for identification and classification can lead to enormous
efficiency gains, the time and effort required to generate training data can be
high, and the accuracy of current approaches can be problematic. Here we
compare the detection and classification performance of a trained human against
two convolutional neural networks (CNNs), DeepSqueak and VocalMat, on audio
containing rat USVs. Furthermore, we test the effect of inserting synthetic
USVs into the training data of the VocalMat CNN as a means of reducing the
workload associated with generating a training set. Our results indicate that
VocalMat outperformed the DeepSqueak CNN on measures of call identification
and classification. Additionally, we found that the augmentation of training
data with synthetic images resulted in a further improvement in accuracy, such
that it was sufficiently close to human performance to allow for the use of
this software in laboratory conditions.
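To make the augmentation idea concrete, here is a minimal, hypothetical sketch (not the authors' actual pipeline) of how a synthetic USV-like call could be generated and rendered as a spectrogram image for inclusion in a CNN training set. The sampling rate, call contour, duration, and noise level are illustrative assumptions.

```python
# Minimal sketch: synthesise a frequency-modulated tone resembling a rat USV
# and render it as a spectrogram image, the kind of synthetic example that
# could augment a CNN training set. All parameters are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

FS = 250_000     # sampling rate (Hz); must exceed 2x the ~120 kHz band
DURATION = 0.05  # 50 ms call, a plausible USV length

t = np.linspace(0, DURATION, int(FS * DURATION), endpoint=False)

# Frequency trajectory: a gentle upward sweep around 50 kHz,
# loosely imitating a flat/upward-ramp call contour.
freq = np.linspace(45_000, 60_000, t.size)
phase = 2 * np.pi * np.cumsum(freq) / FS
call = np.sin(phase) * np.hanning(t.size)  # taper the onset and offset

# Embed the call in background noise so the synthetic example
# resembles a segment of a real recording.
audio = call + 0.05 * np.random.randn(t.size)

# Render a spectrogram image, as an image-based classifier would consume.
f, tt, Sxx = spectrogram(audio, fs=FS, nperseg=512, noverlap=384)
plt.pcolormesh(tt * 1e3, f / 1e3, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time (ms)")
plt.ylabel("Frequency (kHz)")
plt.savefig("synthetic_usv.png", dpi=150)
```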
Related papers
- DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs [12.234206036041218]
We use the deep neural network (DNN) DeepSpeech2 as a paradigm to investigate how natural input and cochlear implant-based inputs are processed over time.
We generate naturalistic and cochlear implant-like inputs from spoken sentences and test the similarity of model performance to human performance.
We find that dynamics over time in each layer are affected by context as well as input type.
arXiv Detail & Related papers (2024-07-30T04:32:27Z)
- Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification [8.578500152567164]
Label noise in medical image classification datasets hampers the training of supervised deep learning methods.
We show that pretraining is crucial for ensuring ViT's improved robustness against label noise in supervised training.
arXiv Detail & Related papers (2024-02-26T16:53:23Z)
- Self-Supervised Pretraining Improves Performance and Inference Efficiency in Multiple Lung Ultrasound Interpretation Tasks [65.23740556896654]
We investigated whether self-supervised pretraining could produce a neural network feature extractor applicable to multiple classification tasks in lung ultrasound analysis.
When fine-tuned on three lung ultrasound tasks, the pretrained models improved the average across-task area under the receiver operating characteristic curve (AUC) by 0.032 and 0.061 on local and external test sets, respectively.
arXiv Detail & Related papers (2023-09-05T21:36:42Z)
- Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention, and, in contrast to CNNs, no prior knowledge of local connectivity is built in.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
- Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation; a generic masking-style sketch appears after this list.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Deep Learning-based Cattle Activity Classification Using Joint Time-frequency Data Representation [2.472770436480857]
In this paper, a sequential deep neural network is used to develop a behavioural model and to classify cattle behaviour and activities.
The key focus of this paper is the exploration of a joint time-frequency domain representation of the sensor data; a stacked-spectrogram sketch appears after this list.
Our exploration is based on a real-world data set with over 3 million samples, collected from sensors with a tri-axial accelerometer, magnetometer and gyroscope.
arXiv Detail & Related papers (2020-11-06T14:24:55Z)
- Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms [8.747840760772268]
We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice.
Results show that most of the baselines given by ComParE are outperformed.
arXiv Detail & Related papers (2020-08-11T09:02:47Z)
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis [64.82680813427054]
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to transform this problem into a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embeddings of a CNN using anti-aliasing or low-pass filters; a minimal sketch appears after this list.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- CURE Dataset: Ladder Networks for Audio Event Classification [15.850545634216484]
There are approximately 3 million people with hearing loss who cannot perceive events happening around them.
This paper establishes the CURE dataset, which contains a curated set of specific audio events most relevant to people with hearing loss.
arXiv Detail & Related papers (2020-01-12T09:35:30Z)
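For the spectrogram-augmentation entry above, a generic masking-style sketch (in the spirit of SpecAugment; the cited paper's exact augmentation policy is not specified here) might look like the following. The mask counts and widths are assumed values.

```python
# Hedged sketch of spectrogram augmentation via random time/frequency
# masking. Parameters are illustrative, not the cited paper's policy.
import numpy as np

def mask_spectrogram(spec, n_freq_masks=1, n_time_masks=1, max_width=8, rng=None):
    """Zero out random frequency bands and time spans of a (freq, time) array."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_freq, n_time = out.shape
    for _ in range(n_freq_masks):
        w = int(rng.integers(1, max_width + 1))
        f0 = int(rng.integers(0, max(1, n_freq - w)))
        out[f0:f0 + w, :] = 0.0   # mask a horizontal frequency band
    for _ in range(n_time_masks):
        w = int(rng.integers(1, max_width + 1))
        t0 = int(rng.integers(0, max(1, n_time - w)))
        out[:, t0:t0 + w] = 0.0   # mask a vertical time span
    return out

augmented = mask_spectrogram(np.random.rand(128, 200))
```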
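For the joint time-frequency entry above, a rough sketch of one common construction, stacking per-axis spectrograms of tri-axial accelerometer data as image-like channels for a deep network, is shown below. The sampling rate and window parameters are assumptions, not the paper's settings.

```python
# Rough sketch: per-axis spectrograms of tri-axial accelerometer data,
# stacked as channels. Sample rate and windows are assumed values.
import numpy as np
from scipy.signal import spectrogram

FS = 25                                # Hz; a typical wearable-sensor rate
accel = np.random.randn(3, FS * 60)    # 60 s of x/y/z acceleration (dummy data)

channels = []
for axis in accel:
    f, t, Sxx = spectrogram(axis, fs=FS, nperseg=64, noverlap=32)
    channels.append(np.log(Sxx + 1e-9))  # log power for dynamic-range compression
features = np.stack(channels)             # shape: (3, n_freq, n_time)
```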
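For the Curriculum By Smoothing entry above, a minimal sketch of the idea, blurring a CNN's feature maps with a low-pass (here Gaussian) filter whose strength anneals to zero over training, follows. The kernel choice and schedule are assumptions; the paper's exact filter and schedule may differ.

```python
# Illustrative sketch of curriculum by smoothing: low-pass filter feature
# maps early in training and anneal the filter away. Schedule is assumed.
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_feature_maps(features, epoch, total_epochs, sigma_start=2.0):
    """Blur (C, H, W) feature maps with a sigma that decays linearly to zero."""
    sigma = sigma_start * max(0.0, 1.0 - epoch / total_epochs)
    if sigma == 0.0:
        return features
    # Filter only the spatial axes, leaving the channel axis untouched.
    return gaussian_filter(features, sigma=(0.0, sigma, sigma))

# Early epochs see heavily smoothed maps; later epochs see the raw ones.
fmap = np.random.rand(8, 32, 32).astype(np.float32)
for epoch in (0, 5, 9):
    out = smooth_feature_maps(fmap, epoch, total_epochs=10)
```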
This list is automatically generated from the titles and abstracts of the papers on this site.