ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe
- URL: http://arxiv.org/abs/2504.20776v1
- Date: Tue, 29 Apr 2025 13:53:33 GMT
- Authors: David Funosas, Elodie Massol, Yves Bas, Svenja Schmidt, Dominik Arend, Alexander Gebhard, Luc Barbaro, Sebastian König, Rafael Carbonell Font, David Sannier, Fernand Deroussen, Jérôme Sueur, Christian Roesti, Tomi Trilar, Wolfgang Forstmeier, Lucas Roger, Eloïsa Matheu, Piotr Guzik, Julien Barataud, Laurent Pelozuelo, Stéphane Puissant, Sandra Mueller, Björn Schuller, Jose M. Montoya, Andreas Triantafyllopoulos, Maxime Cauchoix
- Abstract summary: We present ECOSoundSet (European Cicadidae and Orthoptera Sound dataSet), a dataset containing 10,653 recordings of 200 orthopteran and 24 cicada species (217 and 26 respective taxa when including subspecies) present in North, Central, and temperate Western Europe. This dataset could serve as a meaningful complement to recordings already available online for the training of deep learning algorithms for the acoustic classification of orthopterans and cicadas in North, Central, and temperate Western Europe.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently available tools for the automated acoustic recognition of European insects in natural soundscapes are limited in scope. Large and ecologically heterogeneous acoustic datasets are needed for these algorithms to cross-contextually recognize the subtle and complex acoustic signatures produced by each species, making the availability of such datasets a key requisite for their development. Here we present ECOSoundSet (European Cicadidae and Orthoptera Sound dataSet), a dataset containing 10,653 recordings of 200 orthopteran and 24 cicada species (217 and 26 respective taxa when including subspecies) present in North, Central, and temperate Western Europe (Andorra, Belgium, Denmark, mainland France and Corsica, Germany, Ireland, Luxembourg, Monaco, Netherlands, United Kingdom, Switzerland), collected partly through targeted fieldwork in southern France and Catalonia and partly through contributions from various European entomologists. The dataset combines coarsely labeled recordings, for which we can only infer the presence, at some point, of their target species (weak labeling), and finely annotated recordings, for which we know the specific time and frequency range of each insect sound present in the recording (strong labeling). We also provide a train/validation/test split of the strongly labeled recordings, with respective approximate proportions of 0.8, 0.1 and 0.1, in order to facilitate their incorporation in the training and evaluation of deep learning algorithms. This dataset could serve as a meaningful complement to recordings already available online for the training of deep learning algorithms for the acoustic classification of orthopterans and cicadas in North, Central, and temperate Western Europe.
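The abstract describes an approximate 0.8/0.1/0.1 train/validation/test split over the strongly labeled recordings. As a minimal sketch of how such a split could be reproduced for one's own experiments (the function name and the use of recording IDs are illustrative assumptions, not part of the dataset's published tooling):

```python
import random

def split_recordings(recording_ids, seed=42):
    """Hypothetical sketch: shuffle recording IDs and cut them into
    approximate 0.8/0.1/0.1 train/validation/test partitions, as
    described in the ECOSoundSet abstract. The actual published split
    may use different criteria (e.g. stratification by species)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    ids = list(recording_ids)
    rng.shuffle(ids)
    n = len(ids)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

# Example with the dataset's overall recording count (10,653);
# the real split applies only to the strongly labeled subset.
splits = split_recordings(range(10653))
print({k: len(v) for k, v in splits.items()})
```

Note that the authors distribute a fixed split precisely so that results remain comparable across studies; a home-made shuffle like the one above should only be used when that official split is unavailable.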
Related papers
- Semi-supervised classification of bird vocalizations [0.0]
Changes in bird populations can indicate broader changes in ecosystems. We propose a semi-supervised acoustic bird detector to allow the detection of time-overlapping calls. It achieves a mean F0.5 score of 0.701 across 315 classes from 110 bird species on a hold-out test set.
arXiv Detail & Related papers (2025-02-19T05:31:13Z)
- NBM: an Open Dataset for the Acoustic Monitoring of Nocturnal Migratory Birds in Europe [0.0]
This work presents the Nocturnal Bird Migration dataset, a collection of 13,359 annotated vocalizations from 117 species of the Western Palearctic. The dataset includes precise time and frequency annotations, gathered by dozens of bird enthusiasts across France. In particular, we prove the utility of this database by training an original two-stage deep object detection model tailored for the processing of audio data.
arXiv Detail & Related papers (2024-12-04T18:55:45Z)
- Automated Bioacoustic Monitoring for South African Bird Species on Unlabeled Data [1.3506669466260703]
The framework automatically extracted labeled data from available platforms for selected avian species.
The labeled data were embedded into recordings, including environmental sounds and noise, and were used to train convolutional recurrent neural network (CRNN) models.
The Adapted SED-CRNN model reached a F1 score of 0.73, demonstrating its efficiency under noisy, real-world conditions.
arXiv Detail & Related papers (2024-06-19T14:14:24Z)
- BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics [2.2399415927517414]
BirdSet is a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. BirdSet surpasses AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000 classes (↑18×) for training and more than 400 hours (↑7×) across eight strongly labeled evaluation datasets.
arXiv Detail & Related papers (2024-03-15T15:10:40Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
arXiv Detail & Related papers (2023-06-05T03:36:01Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- HumBugDB: A Large-scale Acoustic Mosquito Dataset [15.108701811353097]
This paper presents the first large-scale multi-species dataset of acoustic recordings of mosquitoes tracked continuously in free flight.
We present 20 hours of audio recordings that we have expertly labelled and tagged precisely in time.
18 hours of recordings contain annotations from 36 different species.
arXiv Detail & Related papers (2021-10-14T14:18:17Z)
- Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.