Synthetic Swarm Mosquito Dataset for Acoustic Classification: A Proof of Concept
- URL: http://arxiv.org/abs/2512.12365v1
- Date: Sat, 13 Dec 2025 15:23:12 GMT
- Title: Synthetic Swarm Mosquito Dataset for Acoustic Classification: A Proof of Concept
- Authors: Thai-Duy Dinh, Minh-Luan Vo, Cuong Tuan Nguyen, Bich-Hien Vo
- Abstract summary: Mosquito-borne diseases pose a serious global health threat, causing over 700,000 deaths annually. This work introduces a proof-of-concept Synthetic Swarm Mosquito Dataset for Acoustic Classification.
- Score: 1.1123754733827187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mosquito-borne diseases pose a serious global health threat, causing over 700,000 deaths annually. This work introduces a proof-of-concept Synthetic Swarm Mosquito Dataset for Acoustic Classification, created to simulate realistic multi-species and noisy swarm conditions. Unlike conventional datasets that require labor-intensive recording of individual mosquitoes, the synthetic approach enables scalable data generation while reducing human resource demands. Using log-mel spectrograms, we evaluated lightweight deep learning architectures for the classification of mosquito species. Experiments show that these models can effectively identify six major mosquito vectors and are suitable for deployment on embedded low-power devices. The study demonstrates the potential of synthetic swarm audio datasets to accelerate acoustic mosquito research and enable scalable real-time surveillance solutions.
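The abstract does not describe how the swarm audio is synthesized, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline: sum several single-source signals and add background noise at a chosen signal-to-noise ratio. The sine tones stand in for individual mosquito recordings, and the frequencies, amplitudes, sample rate, and the helper names `tone`, `power`, and `mix_swarm` are all assumptions made for illustration.

```python
import math
import random

def tone(freq_hz, duration_s, sample_rate, amplitude=1.0):
    """Return a sine tone as a list of floats (a crude stand-in for one recording)."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_swarm(sources, snr_db, rng):
    """Sum equal-length sources, then add Gaussian noise scaled to the target SNR in dB."""
    swarm = [sum(samples) for samples in zip(*sources)]
    noise = [rng.gauss(0.0, 1.0) for _ in swarm]
    # Scale the noise so that 10 * log10(P_signal / P_noise) equals snr_db.
    target_noise_power = power(swarm) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    noise = [scale * x for x in noise]
    mixed = [s + n for s, n in zip(swarm, noise)]
    return mixed, swarm, noise

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sample_rate = 8000      # assumed sample rate in Hz
# Hypothetical fundamental frequencies (Hz) and amplitudes, chosen only for illustration.
sources = [tone(f, 0.5, sample_rate, amplitude=a) for f, a in [(450.0, 1.0), (600.0, 0.6)]]
mixed, swarm, noise = mix_swarm(sources, snr_db=10.0, rng=rng)
measured_snr_db = 10 * math.log10(power(swarm) / power(noise))
print(f"samples={len(mixed)} measured SNR={measured_snr_db:.2f} dB")
```

In a real setting the tones would be replaced by labelled single-species recordings, and the mixed signal would then be converted to log-mel spectrograms before classification, as the abstract states.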
Related papers
- Investigation into respiratory sound classification for an imbalanced data set using hybrid LSTM-KAN architectures [0.0]
This study investigates respiratory sound classification with a focus on mitigating pronounced class imbalance. We propose a hybrid deep learning model that combines a Long Short-Term Memory (LSTM) network for sequential feature encoding with a Kolmogorov-Arnold Network (KAN) for classification.
arXiv Detail & Related papers (2026-01-07T05:37:57Z)
- Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data [1.3506669466260703]
The framework automatically extracted labeled data from available platforms for selected avian species.
The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models.
The Adapted SED-CRNN model reached an F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions.
arXiv Detail & Related papers (2024-06-19T14:14:24Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis [50.236929707024245]
The SOMOS dataset is the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples.
It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset.
arXiv Detail & Related papers (2022-04-06T18:45:20Z)
- A deep convolutional neural network for classification of Aedes albopictus mosquitoes [1.6758573326215689]
We introduce the application of two Deep Convolutional Neural Networks in a comparative study to automate the classification task.
We use the transfer learning principle to train two state-of-the-art architectures on the data provided by the Mosquito Alert project.
In addition, we applied explainable models based on the Grad-CAM algorithm to visualise the most discriminant regions of the classified images.
arXiv Detail & Related papers (2021-10-29T17:58:32Z)
- On the use of uncertainty in classifying Aedes Albopictus mosquitoes [1.6758573326215689]
Convolutional neural networks (CNNs) have been used by several studies to recognise mosquitoes in images.
This paper proposes using the Monte Carlo Dropout method to estimate the uncertainty scores in order to rank the classified samples.
arXiv Detail & Related papers (2021-10-29T16:58:25Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- HumBugDB: A Large-scale Acoustic Mosquito Dataset [15.108701811353097]
This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight.
We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time.
18 hours of recordings contain annotations from 36 different species.
arXiv Detail & Related papers (2021-10-14T14:18:17Z)
- A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification [78.04177357888284]
We propose a novel neural model compression strategy combining data augmentation, knowledge transfer, pruning, and quantization for device-robust acoustic scene classification (ASC).
We report an efficient joint framework for low-complexity multi-device ASC, called Acoustic Lottery.
arXiv Detail & Related papers (2021-07-03T16:25:24Z)
- Discriminative Singular Spectrum Classifier with Applications on Bioacoustic Signal Recognition [67.4171845020675]
We present a bioacoustic signal classifier equipped with a discriminative mechanism to extract useful features for analysis and classification efficiently.
Unlike current bioacoustic recognition methods, which are task-oriented, the proposed model relies on transforming the input signals into vector subspaces.
The validity of the proposed method is verified using three challenging bioacoustic datasets containing anuran, bee, and mosquito species.
arXiv Detail & Related papers (2021-03-18T11:01:21Z)
- Adaptive Multi-View ICA: Estimation of noise levels for optimal inference [65.94843987207445]
Adaptive multiView ICA (AVICA) is a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
On synthetic data, AVICA yields better source estimates than other group ICA methods thanks to its explicit MMSE estimator.
On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible.
arXiv Detail & Related papers (2021-02-22T13:10:12Z)
- HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset [5.3909333359654275]
We release a new dataset of mosquito audio recordings.
With over a thousand contributors, we obtained 195,434 labels, each of two-second duration, of which approximately 10 percent signify mosquito events.
We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels.
arXiv Detail & Related papers (2020-01-14T12:06:17Z)