The iNaturalist Sounds Dataset
- URL: http://arxiv.org/abs/2506.00343v1
- Date: Sat, 31 May 2025 02:07:37 GMT
- Title: The iNaturalist Sounds Dataset
- Authors: Mustafa Chasmai, Alexander Shepard, Subhransu Maji, Grant Van Horn
- Abstract summary: iNatSounds is a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections.
- Score: 60.157076990024606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the iNaturalist Sounds Dataset (iNatSounds), a collection of 230,000 audio files capturing sounds from over 5,500 species, contributed by more than 27,000 recordists worldwide. The dataset encompasses sounds from birds, mammals, insects, reptiles, and amphibians, with audio and species labels derived from observations submitted to iNaturalist, a global citizen science platform. Each recording in the dataset varies in length and includes a single species annotation. We benchmark multiple backbone architectures, comparing multiclass classification objectives with multilabel objectives. Despite weak labeling, we demonstrate that iNatSounds serves as a useful pretraining resource by benchmarking it on strongly labeled downstream evaluation datasets. The dataset is available as a single, freely accessible archive, promoting accessibility and research in this important domain. We envision models trained on this data powering next-generation public engagement applications, and assisting biologists, ecologists, and land use managers in processing large audio collections, thereby contributing to the understanding of species compositions in diverse soundscapes.
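The multiclass versus multilabel comparison mentioned in the abstract comes down to the choice of loss applied to the species logits. Below is a minimal sketch of the two objectives in PyTorch, with hypothetical batch shapes; this is an illustration of the general technique, not the paper's training code:

```python
import torch
import torch.nn as nn

# Hypothetical batch: model outputs (logits) over the ~5,500 species for 8 clips.
num_species = 5500
logits = torch.randn(8, num_species)

# Multiclass objective: each clip carries exactly one species annotation.
target_idx = torch.randint(0, num_species, (8,))
multiclass_loss = nn.CrossEntropyLoss()(logits, target_idx)

# Multilabel objective: treat the single annotation as a one-hot target so each
# species gets an independent probability (other species may also be audible).
target_onehot = nn.functional.one_hot(target_idx, num_species).float()
multilabel_loss = nn.BCEWithLogitsLoss()(logits, target_onehot)

print(multiclass_loss.item(), multilabel_loss.item())
```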
Related papers
- ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe [51.82780272068934]
We present ECOSoundSet (European Cicadidae and Orthoptera Sound dataSet), a dataset containing 10,653 recordings of 200 orthopteran and 24 cicada species (217 and 26 respective taxa when including subspecies) present in North, Central, and temperate Western Europe. This dataset could serve as a meaningful complement to recordings already available online for the training of deep learning algorithms for the acoustic classification of orthopterans and cicadas in North, Central, and temperate Western Europe.
arXiv Detail & Related papers (2025-04-29T13:53:33Z)
- NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics [22.64185462738092]
We present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics.
We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks.
To advance bioacoustics research, we also open-source the code for generating training and benchmark data, as well as for training the model.
arXiv Detail & Related papers (2024-11-11T18:01:45Z)
- animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics [2.1019401515721583]
animal2vec is an interpretable large transformer model that learns from unlabeled audio and refines its understanding with labeled data.
MeerKAT (Meerkat Kalahari Audio Transcripts) is the largest labeled dataset on non-human terrestrial mammals.
Our model outperforms existing methods on MeerKAT and the publicly available NIPS4Bplus birdsong dataset.
arXiv Detail & Related papers (2024-06-03T12:11:01Z)
- BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics [2.2399415927517414]
BirdSet is a large-scale benchmark dataset for audio classification focusing on avian bioacoustics. We benchmark six well-known DL models in multi-label classification across three distinct training scenarios. We host our dataset on Hugging Face for easy accessibility and offer an extensive codebase to reproduce our results.
arXiv Detail & Related papers (2024-03-15T15:10:40Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- Transferable Models for Bioacoustics with Human Language Supervision [0.0]
BioLingual is a new model for bioacoustics based on contrastive language-audio pretraining.
It can identify over a thousand species' calls across taxa, complete bioacoustic tasks zero-shot, and retrieve animal vocalization recordings from natural text queries.
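Contrastive language-audio pretraining pairs each recording with a text description and trains the encoders so that matching pairs score highest. A minimal sketch of the symmetric contrastive loss (PyTorch, with stand-in random embeddings rather than BioLingual's actual encoders):

```python
import torch
import torch.nn.functional as F

# Assume precomputed, L2-normalized embeddings for a batch of N matched audio/text pairs.
N, dim = 16, 512
audio_emb = F.normalize(torch.randn(N, dim), dim=-1)
text_emb = F.normalize(torch.randn(N, dim), dim=-1)

temperature = 0.07
logits = audio_emb @ text_emb.t() / temperature   # similarity of every clip to every caption
targets = torch.arange(N)                          # matching pairs lie on the diagonal

# Symmetric cross-entropy over the audio->text and text->audio directions.
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
print(loss.item())
```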
arXiv Detail & Related papers (2023-08-09T14:22:18Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- Spatial Implicit Neural Representations for Global-Scale Species Mapping [72.92028508757281]
Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location.
Traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets.
We use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously.
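A spatial implicit neural representation is, at its core, a small network that maps encoded coordinates to per-species presence probabilities. The sketch below shows only that forward pass (PyTorch, hypothetical layer sizes; the presence-only training losses used by SINR are omitted):

```python
import torch
import torch.nn as nn

class CoordinateRangeModel(nn.Module):
    """Map normalized (lon, lat) coordinates to presence probabilities for many species."""
    def __init__(self, num_species=47000, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_species),
        )

    def forward(self, lonlat):
        # Simple sinusoidal encoding of coordinates scaled to [-1, 1].
        enc = torch.cat([torch.sin(torch.pi * lonlat), torch.cos(torch.pi * lonlat)], dim=-1)
        return torch.sigmoid(self.net(enc))   # per-species presence probability

locs = torch.tensor([[0.12, -0.40], [-0.75, 0.33]])   # two normalized (lon, lat) points
probs = CoordinateRangeModel()(locs)                   # shape: (2, 47000)
print(probs.shape)
```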
arXiv Detail & Related papers (2023-06-05T03:36:01Z)
- Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales [97.41394631426678]
Recent research showed the promise of machine learning tools for analyzing acoustic communication in nonhuman species.
We outline the key elements required for the collection and processing of massive bioacoustic data of sperm whales.
The technological capabilities developed are likely to yield cross-applications and advancements in broader communities investigating non-human communication and animal behavioral research.
arXiv Detail & Related papers (2021-04-17T18:39:22Z)
- Modelling Animal Biodiversity Using Acoustic Monitoring and Deep Learning [0.0]
This paper outlines an approach for achieving this using state of the art in machine learning to automatically extract features from time-series audio signals.
The acquired bird songs are processed using mel-frequency cepstrum (MFC) to extract features, which are later classified using a multilayer perceptron (MLP).
Our proposed method achieved promising results with 0.74 sensitivity, 0.92 specificity and an accuracy of 0.74.
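A rough sketch of that MFC-plus-MLP pipeline using librosa and scikit-learn (placeholder file paths and labels, not the authors' code):

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def mfcc_features(path, sr=22050, n_mfcc=20):
    """Load a recording and summarize it as its mean MFCC vector."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # fixed-length clip descriptor

# Placeholder training data: paths to bird song recordings and their species labels.
paths = ["song_01.wav", "song_02.wav", "song_03.wav", "song_04.wav"]
labels = ["species_a", "species_a", "species_b", "species_b"]

X = np.stack([mfcc_features(p) for p in paths])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
print(clf.predict(X[:1]))
```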
arXiv Detail & Related papers (2021-03-12T13:50:31Z)