Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
- URL: http://arxiv.org/abs/2505.20956v2
- Date: Wed, 28 May 2025 20:06:51 GMT
- Title: Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
- Authors: Shiqi Zhang, Tuomas Virtanen
- Abstract summary: We introduce the mismatch-first farthest-traversal (MFFT), an active learning method integrating committee voting disagreement and diversity analysis. MFFT achieves a mAP of 68% when cold-starting and 71% when warm-starting while using only 2.3% of the annotations. Notably, MFFT excels in cold-start scenarios and with rare species, which are critical for monitoring endangered species, demonstrating its practical value.
- Score: 9.16288808621826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bioacoustic sound event detection (BioSED) is crucial for biodiversity conservation but faces practical challenges during model development and training: limited amounts of annotated data, sparse events, species diversity, and class imbalance. To address these challenges efficiently with a limited labeling budget, we apply the mismatch-first farthest-traversal (MFFT), an active learning method integrating committee voting disagreement and diversity analysis. We also refine an existing BioSED dataset specifically for evaluating active learning algorithms. Experimental results demonstrate that MFFT achieves a mAP of 68% when cold-starting and 71% when warm-starting (which is close to the fully-supervised mAP of 75%) while using only 2.3% of the annotations. Notably, MFFT excels in cold-start scenarios and with rare species, which are critical for monitoring endangered species, demonstrating its practical value.
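To make the query strategy concrete, here is a minimal sketch of one mismatch-first farthest-traversal selection round, assuming a committee of binary event detectors and clip-level embeddings; the function names, array shapes, threshold, and tie-breaking below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def farthest_traversal(candidates, embeddings, reference, k):
    """Greedy max-min (k-center style) selection: repeatedly pick the candidate
    whose nearest already-chosen/labeled embedding is farthest away."""
    chosen, reference = [], list(reference)
    candidates = list(candidates)
    for _ in range(min(k, len(candidates))):
        if reference:
            dists = np.linalg.norm(
                embeddings[candidates][:, None, :] - embeddings[reference][None, :, :],
                axis=-1,
            ).min(axis=1)
            pick = candidates[int(np.argmax(dists))]
        else:
            pick = candidates[0]  # arbitrary seed when nothing has been labeled yet
        chosen.append(pick)
        reference.append(pick)
        candidates.remove(pick)
    return chosen

def mfft_query(committee_probs, embeddings, labeled, budget, thr=0.5):
    """committee_probs: (n_models, n_clips) event probabilities from a committee.
    Clips whose thresholded committee votes disagree ("mismatch") are queried first;
    farthest traversal over the embeddings enforces diversity within each group."""
    votes = committee_probs >= thr                          # (n_models, n_clips)
    mismatch = votes.any(axis=0) & ~votes.all(axis=0)       # committee disagreement
    pool = [i for i in range(committee_probs.shape[1]) if i not in set(labeled)]
    disagreed = [i for i in pool if mismatch[i]]
    agreed = [i for i in pool if not mismatch[i]]
    picked = farthest_traversal(disagreed, embeddings, labeled, budget)
    if len(picked) < budget:                                # fall back to agreed clips
        picked += farthest_traversal(agreed, embeddings, list(labeled) + picked,
                                     budget - len(picked))
    return picked
```

In use, the returned indices would be sent to the annotator, added to the labeled set, and the committee retrained before the next round.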
Related papers
- Audio-Visual Class-Incremental Learning for Fish Feeding Intensity Assessment in Aquaculture [29.42598968673262]
Fish Feeding Intensity Assessment (FFIA) is crucial in industrial aquaculture management.
Recent multi-modal approaches have shown promise in improving FFIA robustness and efficiency.
We first introduce AV-CIL-FFIA, a new dataset comprising 81,932 labelled audio-visual clips capturing feeding intensities across six different fish species in real aquaculture environments.
Then, we pioneer audio-visual class-incremental learning (CIL) for FFIA and demonstrate through benchmarking on AV-CIL-FFIA that it significantly outperforms single-modality methods.
arXiv Detail & Related papers (2025-04-21T15:24:34Z)
- Towards Deep Active Learning in Avian Bioacoustics [1.7522552085069194]
Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling.
This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.
arXiv Detail & Related papers (2024-06-26T08:43:05Z)
- Multitask frame-level learning for few-shot sound event detection [46.32294691870714]
This paper focuses on few-shot Sound Event Detection (SED), which aims to automatically recognize and classify sound events with limited samples.
We introduce an innovative multitask frame-level SED framework and TimeFilterAug, a linear timing mask for data augmentation.
The proposed method achieves an F-score of 63.8%, securing the 1st rank in the few-shot bioacoustic event detection category.
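As a rough illustration of a linear timing mask in the spirit of TimeFilterAug, the sketch below applies a linearly ramped attenuation over a random time span of a log-mel spectrogram; the ramp shape, mask width, and function name are assumptions, not the paper's implementation.

```python
import numpy as np

def linear_time_mask(spec, max_frac=0.2, rng=None):
    """Apply a time mask whose attenuation ramps linearly from no suppression to
    full suppression across the masked span. `spec`: (n_mels, n_frames) array."""
    rng = rng or np.random.default_rng()
    n_frames = spec.shape[1]
    width = int(rng.integers(1, max(2, int(max_frac * n_frames))))
    start = int(rng.integers(0, n_frames - width + 1))
    ramp = np.linspace(1.0, 0.0, width)          # linear fade-out over the span
    out = spec.copy()
    out[:, start:start + width] *= ramp          # broadcast over mel bins
    return out
```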
arXiv Detail & Related papers (2024-03-17T05:00:40Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation [15.571923343398657]
Class imbalance severely impacts machine learning performance on minority classes in real-world applications.
We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation.
Our work presents the first comprehensive study of active learning under both class imbalance and label noise.
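The DIRECT algorithm itself is more involved, but a minimal sketch of the underlying idea, ranking unlabeled examples by how close their predicted scores sit to an assumed class-separation boundary, could look like this; the fixed boundary and scoring are illustrative assumptions.

```python
import numpy as np

def select_near_boundary(scores, labeled_mask, budget, boundary=0.5):
    """Pick the unlabeled examples whose scores lie closest to the (assumed)
    class-separation boundary, i.e. the most uncertain ones near it.
    scores: (n_samples,) predicted probabilities for the minority class."""
    distance = np.abs(scores - boundary)           # proximity to the boundary
    distance = np.where(labeled_mask, np.inf, distance)  # never re-query labeled items
    return np.argsort(distance)[:budget]
```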
arXiv Detail & Related papers (2023-12-14T18:18:34Z)
- When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning [83.8758881342346]
A novel loss function is devised to generate adversarial perturbations that achieve both visual and measure imperceptibility.
Experiments on large-scale benchmark datasets demonstrate the superiority of our proposed method in attacking the top-$k$ multi-label systems.
arXiv Detail & Related papers (2023-07-27T13:18:47Z)
- Active Learning with Contrastive Pre-training for Facial Expression Recognition [19.442685015494316]
We study 8 recent active learning methods on three public FER datasets.
Our findings show that existing active learning methods do not perform well in the context of FER.
We propose contrastive self-supervised pre-training, which first learns the underlying representations based on the entire unlabelled dataset.
arXiv Detail & Related papers (2023-07-06T03:08:03Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme that dynamically alters the score thresholds of positive and negative pseudo-labels for each class during training.
We achieve strong performance on the Pascal VOC2007 and MS-COCO datasets compared to recent SSL methods.
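A minimal sketch of per-class percentile thresholding over a batch of sigmoid scores, assuming one percentile for positive pseudo-labels and one for negatives; the actual PercentMatch schedule and hyperparameters may differ.

```python
import numpy as np

def percentile_pseudo_labels(scores, pos_pct=90.0, neg_pct=10.0):
    """scores: (n_samples, n_classes) sigmoid outputs on unlabeled data.
    For each class, scores above its pos_pct percentile become positive
    pseudo-labels, scores below its neg_pct percentile become negatives,
    and everything in between stays unlabeled (masked out)."""
    pos_thr = np.percentile(scores, pos_pct, axis=0)   # one threshold per class
    neg_thr = np.percentile(scores, neg_pct, axis=0)
    pseudo = np.full(scores.shape, -1, dtype=int)      # -1 means "ignore"
    pseudo[scores >= pos_thr] = 1
    pseudo[scores <= neg_thr] = 0
    return pseudo
```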
arXiv Detail & Related papers (2022-08-30T01:27:48Z)
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods face through empirical experiments.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation [48.294903659573585]
In this paper, we propose to embed affinity learning of multi-stage approaches in a single-stage model.
A deep neural network is used to deliver comprehensive semantic information in the training phase.
Experiments are conducted on the PASCAL VOC 2012 dataset to evaluate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2021-08-03T07:48:33Z)
- Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method can pseudo-label highly confident predictions, suppressing potential distribution drift.
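As a loose illustration (not the paper's actual framework), a sketch of how a pool might be split between samples sent to the annotator and highly confident samples that are pseudo-labeled, under assumed confidence thresholds:

```python
import numpy as np

def split_pool(confidences, query_budget, pseudo_thr=0.95):
    """confidences: (n_samples,) max predicted class confidence per unlabeled sample.
    The least confident samples are sent to the human annotator; highly confident
    ones are pseudo-labeled automatically. Thresholds are illustrative."""
    order = np.argsort(confidences)                    # least confident first
    to_annotate = order[:query_budget]
    pseudo_labeled = np.where(confidences >= pseudo_thr)[0]
    pseudo_labeled = np.setdiff1d(pseudo_labeled, to_annotate)
    return to_annotate, pseudo_labeled
```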
arXiv Detail & Related papers (2021-06-22T16:53:09Z)