HumBugDB: A Large-scale Acoustic Mosquito Dataset
- URL: http://arxiv.org/abs/2110.07607v1
- Date: Thu, 14 Oct 2021 14:18:17 GMT
- Title: HumBugDB: A Large-scale Acoustic Mosquito Dataset
- Authors: Ivan Kiskin, Marianne Sinka, Adam D. Cobb, Waqas Rafique, Lawrence
Wang, Davide Zilli, Benjamin Gutteridge, Rinita Dam, Theodoros Marinos,
Yunpeng Li, Dickson Msaky, Emmanuel Kaindoa, Gerard Killeen, Eva
Herreros-Moya, Kathy J. Willis, Stephen J. Roberts
- Abstract summary: This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight.
We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time.
18 hours of recordings contain annotations from 36 different species.
- Score: 15.108701811353097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the first large-scale multi-species dataset of acoustic
recordings of mosquitoes tracked continuously in free flight. We present 20
hours of audio recordings that we have expertly labelled and tagged precisely
in time. Significantly, 18 hours of recordings contain annotations from 36
different species. Mosquitoes are well-known carriers of diseases such as
malaria, dengue and yellow fever. Collecting this dataset is motivated by the
need to assist applications which utilise mosquito acoustics to conduct surveys
to help predict outbreaks and inform intervention policy. The task of detecting
mosquitoes from the sound of their wingbeats is challenging due to the
difficulty in collecting recordings from realistic scenarios. To address this,
as part of the HumBug project, we conducted global experiments to record
mosquitoes ranging from those bred in culture cages to mosquitoes captured in
the wild. Consequently, the audio recordings vary in signal-to-noise ratio and
contain a broad range of indoor and outdoor background environments from
Tanzania, Thailand, Kenya, the USA and the UK. In this paper we describe in
detail how we collected, labelled and curated the data. The data is provided
from a PostgreSQL database, which contains important metadata such as the
capture method, age, feeding status and gender of the mosquitoes. Additionally,
we provide code to extract features and train Bayesian convolutional neural
networks for two key tasks: the identification of mosquitoes from their
corresponding background environments, and the classification of detected
mosquitoes into species. Our extensive dataset is both challenging to machine
learning researchers focusing on acoustic identification, and critical to
entomologists, geo-spatial modellers and other domain experts to understand
mosquito behaviour, model their distribution, and manage the threat they pose
to humans.
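As a rough illustration of the feature-extraction step mentioned in the abstract, the sketch below computes a log-power spectrogram from a synthetic tone using only NumPy. It is not the authors' released code: the function name `log_spectrogram`, the frame and hop sizes, the 8 kHz sample rate, and the 600 Hz test tone are all assumptions chosen for illustration, and the abstract does not say which features the dataset code actually extracts.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=256, eps=1e-10):
    """Return a (frames, n_fft // 2 + 1) log-power spectrogram.

    Hypothetical helper: frames the signal, applies a Hann window,
    takes the real FFT of each frame, and returns log power.
    """
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)


# Synthetic stand-in for a recording: a 600 Hz tone in light noise.
# Both the tone frequency and the sample rate are arbitrary assumptions.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of audio
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 600.0 * t) + 0.1 * rng.standard_normal(t.size)

features = log_spectrogram(tone)

# Locate the frequency bin with the highest average log power.
freqs = np.fft.rfftfreq(512, d=1.0 / sample_rate)
peak_hz = freqs[features.mean(axis=0).argmax()]
print(features.shape, peak_hz)
```

A time-frequency array of this kind is a typical input to a convolutional network, with the frame axis playing the role of time.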
Related papers
- SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird.
We also provide a dataset in Kenya representing low-data regimes.
We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z)
- AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring [0.0]
This paper introduces a large-scale dataset of anuran calls recorded by passive acoustic monitoring (PAM).
We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem.
We highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy.
arXiv Detail & Related papers (2023-07-11T22:25:21Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
arXiv Detail & Related papers (2023-06-05T03:36:01Z)
- Do Orcas Have Semantic Language? Machine Learning to Predict Orca Behaviors Using Partially Labeled Vocalization Data [50.02992288349178]
We study whether machine learning can predict behavior from vocalizations.
We work with recent recordings of McMurdo Sound orcas.
With a careful combination of recent machine learning techniques, we achieve 96.4% classification accuracy.
arXiv Detail & Related papers (2023-01-28T06:04:22Z)
- Persistent Animal Identification Leveraging Non-Visual Markers [71.14999745312626]
We aim to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time.
This is a very challenging problem due to (i) the lack of distinguishing visual features for each mouse, and (ii) the close confines of the scene with constant occlusion.
Our approach achieves 77% accuracy on this animal identification problem, and is able to reject spurious detections when the animals are hidden.
arXiv Detail & Related papers (2021-12-13T17:11:32Z)
- A deep convolutional neural network for classification of Aedes albopictus mosquitoes [1.6758573326215689]
We introduce the application of two Deep Convolutional Neural Networks in a comparative study to automate the classification task.
We use the transfer learning principle to train two state-of-the-art architectures on the data provided by the Mosquito Alert project.
In addition, we applied explainable models based on the Grad-CAM algorithm to visualise the most discriminant regions of the classified images.
arXiv Detail & Related papers (2021-10-29T17:58:32Z)
- On the use of uncertainty in classifying Aedes Albopictus mosquitoes [1.6758573326215689]
Convolutional neural networks (CNNs) have been used by several studies to recognise mosquitoes in images.
This paper proposes using the Monte Carlo Dropout method to estimate the uncertainty scores in order to rank the classified samples.
arXiv Detail & Related papers (2021-10-29T16:58:25Z)
- Project Achoo: A Practical Model and Application for COVID-19 Detection from Recordings of Breath, Voice, and Cough [55.45063681652457]
We propose a machine learning method to quickly triage COVID-19 using recordings made on consumer devices.
The approach combines signal processing methods with fine-tuned deep learning networks and provides methods for signal denoising, cough detection and classification.
We have also developed and deployed a mobile application that uses a symptom checker together with voice, breath and cough signals to detect COVID-19 infection.
arXiv Detail & Related papers (2021-07-12T08:07:56Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Automatic Detection of Aedes aegypti Breeding Grounds Based on Deep Networks with Spatio-Temporal Consistency [2.4858193569899907]
The Aedes aegypti mosquito infects millions of people with diseases such as dengue, Zika, chikungunya, and urban yellow fever.
The main way to combat these diseases is to prevent mosquito reproduction by searching for and eliminating potential mosquito breeding grounds.
In this work, we introduce a comprehensive dataset of aerial videos, acquired with an unmanned aerial vehicle, containing possible mosquito breeding sites.
arXiv Detail & Related papers (2020-07-29T14:30:54Z)
- HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset [5.3909333359654275]
We release a new dataset of mosquito audio recordings.
With over a thousand contributors, we obtained 195,434 labels, each of two-second duration, of which approximately 10 percent signify mosquito events.
We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels.
arXiv Detail & Related papers (2020-01-14T12:06:17Z)