Efficient speech detection in environmental audio using acoustic
recognition and knowledge distillation
- URL: http://arxiv.org/abs/2312.09269v1
- Date: Thu, 14 Dec 2023 17:55:32 GMT
- Title: Efficient speech detection in environmental audio using acoustic
recognition and knowledge distillation
- Authors: Drew Priebe, Burooj Ghani, Dan Stowell
- Abstract summary: Acoustic monitoring of biodiversity has emerged as an important monitoring tool.
Despite significant strides in deep learning, the deployment of large neural networks on compact devices poses challenges due to memory and latency constraints.
Our approach focuses on leveraging knowledge distillation techniques to design efficient, lightweight student models for speech detection in bioacoustics.
- Score: 3.732312301223128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ongoing biodiversity crisis, driven by factors such as land-use change
and global warming, emphasizes the need for effective ecological monitoring
methods. Acoustic monitoring of biodiversity has emerged as an important
monitoring tool. Detecting human voices in soundscape monitoring projects is
useful both for analysing human disturbance and for privacy filtering. Despite
significant strides in deep learning in recent years, the deployment of large
neural networks on compact devices poses challenges due to memory and latency
constraints. Our approach focuses on leveraging knowledge distillation
techniques to design efficient, lightweight student models for speech detection
in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to
create compact yet effective student architectures to compare against the
larger EcoVAD teacher model, a well-regarded voice detection architecture in
eco-acoustic monitoring. The comparative analysis included examining various
configurations of the MobileNetV3-Small-Pi derived student models to identify
optimal performance. Additionally, a thorough evaluation of different
distillation techniques was conducted to ascertain the most effective method
for model selection. Our findings revealed that the distilled models exhibited
comparable performance to the EcoVAD teacher model, indicating a promising
approach to overcoming computational barriers for real-time ecological
monitoring.
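To make the distillation approach concrete, below is a minimal sketch of response-based (soft-label) knowledge distillation for a binary speech/no-speech detector. It assumes a standard Hinton-style objective; the temperature, loss weighting, and the exact EcoVAD teacher and MobileNetV3-Small-Pi student configurations compared in the paper are not reproduced here, and the models are simply assumed to be available as PyTorch modules.

```python
# Sketch only: standard soft-label distillation, not the paper's exact setup.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with hard-label CE."""
    # Soften both output distributions with the temperature, then match them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * (temperature ** 2)
    # Ordinary cross-entropy against the ground-truth speech/non-speech labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

def train_step(student, teacher, optimizer, mel_batch, labels):
    """One distillation step: the large teacher is frozen, the compact
    student (e.g. a MobileNetV3-style network) is updated."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(mel_batch)
    student_logits = student(mel_batch)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The alpha parameter trades off imitating the teacher's soft predictions against fitting the hard labels; the paper evaluates several distillation variants, of which this is only one.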
Related papers
- Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms [17.802456388479616]
We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia.
This dataset presents a challenging task due to the overlap and distribution of grass species.
The dataset and code will be made publicly available, aiming to drive research in computer vision, machine learning, and ecological studies.
arXiv Detail & Related papers (2024-07-25T18:27:27Z)
- ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z)
- A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to balance the demands of high-performance machine learning models with environmental sustainability.
Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance.
However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Automated Detection of Dolphin Whistles with Convolutional Networks and Transfer Learning [7.52108936537426]
We show that convolutional neural networks can significantly outperform traditional automatic methods in a challenging detection task.
The proposed system can detect signals even in the presence of ambient noise, while consistently reducing the likelihood of false positives and false negatives.
arXiv Detail & Related papers (2022-11-28T15:06:46Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Parsing Birdsong with Deep Audio Embeddings [0.5599792629509227]
We present a semi-supervised approach to identify characteristic calls and environmental noise.
We utilize several methods to learn a latent representation of audio samples, including a convolutional autoencoder and two pre-trained networks.
arXiv Detail & Related papers (2021-08-20T14:45:44Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach to modelling animal biodiversity using state-of-the-art machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using the mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP); a minimal sketch of this pipeline appears after this list.
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
arXiv Detail & Related papers (2021-03-12T13:50:31Z)
- From Sound Representation to Model Robustness [82.21746840893658]
We investigate the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
Averaged over various experiments on three environmental sound datasets, we found the ResNet-18 model outperforms other deep learning architectures.
arXiv Detail & Related papers (2020-07-27T17:30:49Z)
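As referenced above, here is a minimal sketch of the MFC-feature-plus-MLP pipeline described in the "Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning" entry, using librosa and scikit-learn. The feature settings, clip-level pooling, and classifier configuration are illustrative assumptions rather than that paper's exact setup, and the file names in the usage comment are hypothetical.

```python
# Sketch only: MFCC features summarised per clip, classified with a small MLP.
import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

def mfcc_features(path, sr=22050, n_mfcc=13):
    """Load a recording and summarise it as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # clip-level vector

def train_mlp(paths, labels):
    """Fit a small MLP on clip-level MFCC summaries (illustrative settings)."""
    X = np.stack([mfcc_features(p) for p in paths])
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    return clf.fit(X, np.asarray(labels))

# Usage (hypothetical labelled clips, 1 = target call present, 0 = background):
# clf = train_mlp(["call_0001.wav", "noise_0001.wav"], [1, 0])
```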