Machine Learning-based Classification of Birds through Birdsong
- URL: http://arxiv.org/abs/2212.04684v1
- Date: Fri, 9 Dec 2022 06:20:50 GMT
- Title: Machine Learning-based Classification of Birds through Birdsong
- Authors: Yueying Chang and Richard O. Sinnott
- Abstract summary: We apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify Australian birds.
We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study.
Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%.
- Score: 0.3908842679355254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio sound recognition and classification is used for many tasks and
applications including human voice recognition, music recognition and audio
tagging. In this paper we apply Mel Frequency Cepstral Coefficients (MFCC) in
combination with a range of machine learning models to identify (Australian)
birds from publicly available audio files of their birdsong. We present
approaches used for data processing and augmentation and compare the results of
various state-of-the-art machine learning models. We achieve an overall
accuracy of 91% for the top-5 birds from the 30 selected as the case study.
Applying the models to more challenging and diverse audio files comprising 152
bird species, we achieve an accuracy of 58%.
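The pipeline the abstract describes (MFCC features plus conventional machine learning models, with audio augmentation) can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' code: librosa for MFCC extraction and a scikit-learn random forest stand in for the unspecified models, and `paths`/`labels` are hypothetical placeholders for the public birdsong recordings and their species labels.

```python
# Minimal sketch of an MFCC-plus-classifier pipeline (illustrative only).
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path, sr=22050, n_mfcc=20):
    """Summarise one birdsong clip as a fixed-length MFCC statistics vector."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    # Mean and standard deviation over time yield one vector per recording.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def add_noise(audio, noise_level=0.005):
    """One common augmentation: low-level additive Gaussian noise.
    (Would be applied to training clips before feature extraction.)"""
    return audio + noise_level * np.random.randn(len(audio))

# `paths` and `labels` are hypothetical placeholders, not from the paper.
X = np.stack([mfcc_features(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```

Summarising each recording by per-coefficient mean and standard deviation is one simple way to get fixed-length inputs; frame-level sequence models are an alternative (see the Transformer sketch after the related-papers list).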
Related papers
- AudioProtoPNet: An interpretable deep learning model for bird sound classification [1.49199020343864]
This study introduces AudioProtoPNet, an adaptation of the Prototypical Part Network (ProtoPNet) for multi-label bird sound classification.
It is an inherently interpretable model that uses a ConvNeXt backbone to extract embeddings.
The model was trained on the BirdSet training dataset, which consists of 9,734 bird species and over 6,800 hours of recordings.
arXiv Detail & Related papers (2024-04-16T09:37:41Z)
- Whole-body Detection, Recognition and Identification at Altitude and Range [57.445372305202405]
We propose an end-to-end system evaluated on diverse datasets.
Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images.
We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios.
arXiv Detail & Related papers (2023-11-09T20:20:23Z)
- AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models [92.92233932921741]
We propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual and bimodal fusion representations.
We evaluate 5 recent self-supervised models and show that none of these models generalize to all tasks.
We show that representations may be improved with intermediate-task fine-tuning, and that audio event classification on AudioSet serves as a strong intermediate task.
arXiv Detail & Related papers (2023-09-19T17:35:16Z)
- Exploring Meta Information for Audio-based Zero-shot Bird Classification [113.17261694996051]
This study investigates how meta-information can improve zero-shot audio classification.
We use bird species as an example case study due to the availability of rich and diverse meta-data.
arXiv Detail & Related papers (2023-09-15T13:50:16Z)
- MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.
Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z)
- Transformer-based Sequence Labeling for Audio Classification based on MFCCs [0.0]
This paper proposes a Transformer-encoder-based model for audio classification using MFCCs.
The model was benchmarked on the ESC-50, Speech Commands v0.02 and UrbanSound8k datasets and showed strong performance.
The model consists of a mere 127,544 total parameters, making it lightweight yet highly efficient at the audio classification task (an illustrative sketch of this approach appears after the list below).
arXiv Detail & Related papers (2023-04-30T07:25:43Z)
- Few-shot Long-Tailed Bird Audio Recognition [3.8073142980733]
We propose a sound detection and classification pipeline to analyze soundscape recordings.
Our solution achieved 18th place out of 807 teams at the BirdCLEF 2022 Challenge hosted on Kaggle.
arXiv Detail & Related papers (2022-06-22T04:14:25Z)
- Low-complexity deep learning frameworks for acoustic scene classification [64.22762153453175]
We present low-complexity deep learning frameworks for acoustic scene classification (ASC).
The proposed frameworks can be separated into four main steps: Front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities.
Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved a best classification accuracy of 60.1%.
arXiv Detail & Related papers (2022-06-13T11:41:39Z)
- PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation [19.09439093130855]
We present PSLA, a collection of training techniques that can noticeably boost the model accuracy.
We obtain a model that achieves a new state-of-the-art mean average precision (mAP) of 0.474 on AudioSet, outperforming the previous best system of 0.439.
arXiv Detail & Related papers (2021-02-02T01:00:38Z)
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [51.37980448183019]
We propose Audio ALBERT, a lite version of the self-supervised speech representation model.
We show that Audio ALBERT is capable of achieving competitive performance with those huge models in the downstream tasks.
In probing experiments, we find that the latent representations encode richer phoneme and speaker information than the last layer does.
arXiv Detail & Related papers (2020-05-18T10:42:44Z)
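As referenced in the Transformer-based Sequence Labeling entry above, the following is a minimal, hypothetical sketch of a small Transformer encoder classifying sequences of MFCC frames. The dimensions, layer counts, and class count are illustrative assumptions, not that paper's reported architecture.

```python
# Illustrative sketch: a compact Transformer encoder over MFCC frame sequences.
import torch
import torch.nn as nn

class MFCCTransformer(nn.Module):
    # All hyperparameters below are assumptions chosen to keep the model small.
    def __init__(self, n_mfcc=20, d_model=64, n_heads=4, n_layers=2, n_classes=50):
        super().__init__()
        self.proj = nn.Linear(n_mfcc, d_model)            # frame-wise embedding
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=128,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)         # class logits

    def forward(self, x):                                 # x: (batch, frames, n_mfcc)
        h = self.encoder(self.proj(x))                    # (batch, frames, d_model)
        return self.head(h.mean(dim=1))                   # mean-pool over time

model = MFCCTransformer()
logits = model(torch.randn(8, 100, 20))                   # 8 clips, 100 MFCC frames each
print(logits.shape)                                       # torch.Size([8, 50])
```

Mean-pooling over time is one simple readout; sequence-labeling variants instead emit a prediction per frame.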
This list is automatically generated from the titles and abstracts of the papers on this site.