Balanced Deep CCA for Bird Vocalization Detection
- URL: http://arxiv.org/abs/2211.09376v1
- Date: Thu, 17 Nov 2022 07:09:07 GMT
- Title: Balanced Deep CCA for Bird Vocalization Detection
- Authors: Sumit Kumar, B. Anshuman, Linus Ruettimann, Richard H.R. Hahnloser,
Vipul Arora
- Abstract summary: We develop a novel self-supervised learning technique for multi-modal data.
We learn (hidden) correlations between simultaneously recorded microphone (sound) signals and accelerometer (body vibration) signals.
- Score: 5.635374645175903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event detection improves when events are captured by two different modalities
rather than just one. But training detection systems on multiple modalities is
challenging, in particular when there is an abundance of unlabelled data but
only limited amounts of labelled data. We develop a novel self-supervised learning
technique for multi-modal data that learns (hidden) correlations between
simultaneously recorded microphone (sound) signals and accelerometer (body
vibration) signals. The key objective of this work is to learn useful
embeddings associated with high performance in downstream event detection tasks
when labelled data is scarce and the audio events of interest (songbird
vocalizations) are sparse. We base our approach on deep canonical correlation
analysis (DCCA), which suffers from event sparseness. We overcome the sparseness
of positive labels by first learning a data sampling model from the labelled
data and then applying DCCA to the output it produces. This method, which we term
balanced DCCA (b-DCCA), improves the performance of the unsupervised embeddings
on the downstream supervised audio detection task compared to classical DCCA.
Because data labels are frequently imbalanced, our method might be of broad
utility in low-resource scenarios.
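The correlation objective at the heart of DCCA can be written in closed form: given batches of embeddings from the two modality networks, the total canonical correlation to be maximized is the sum of singular values of the whitened cross-covariance matrix. The sketch below is a minimal NumPy illustration of that standard objective, not the authors' implementation; the `eps` regularizer and all variable names are assumptions for the example.

```python
import numpy as np

def cca_correlation(H1, H2, eps=1e-6):
    """Sum of canonical correlations between two views.

    H1, H2: (n_samples, dim) embeddings from the two modality networks
    (e.g. sound and body-vibration encoders). DCCA trains the encoders
    to maximize this quantity. This is a simplified sketch of the
    standard CCA objective, not the paper's exact implementation.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)  # center each view
    H2c = H2 - H2.mean(axis=0)
    # regularized covariance and cross-covariance estimates
    S11 = H1c.T @ H1c / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + eps * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # S^{-1/2} via eigendecomposition of the symmetric matrix S
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # whitened cross-covariance; its singular values are the
    # canonical correlations
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()
```

In b-DCCA, the batches fed to this objective would come from the learned data sampling model rather than uniformly from the recordings, so that sparse positive (vocalization) frames are adequately represented in each batch.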
Related papers
- Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis [14.922065513695294]
Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA). To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
arXiv Detail & Related papers (2026-02-16T14:48:24Z)
- Detecting and Rectifying Noisy Labels: A Similarity-based Approach [4.686586017523293]
Label noise in datasets could significantly damage the performance and robustness of deep neural networks (DNNs) trained on these datasets. We propose post-hoc, model-agnostic noise detection and rectification methods utilizing the penultimate feature from a DNN. Our idea is based on the observation that the similarity between the penultimate feature of a mislabeled data point and its true class data points is higher than that for data points from other classes.
arXiv Detail & Related papers (2025-09-28T16:41:56Z)
- An accurate detection is not all you need to combat label noise in web-noisy datasets [23.020126612431746]
We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples.
We propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach.
arXiv Detail & Related papers (2024-07-08T00:21:42Z)
- Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning [10.395255631261458]
In bioacoustic applications, most tasks come with few labelled training data, because annotating long recordings is time consuming and costly.
We show that learning a rich feature extractor from scratch can be achieved by leveraging data augmentation using a supervised contrastive learning framework.
We obtain an F-score of 63.46% on the validation set and 42.7% on the test set, ranking second in the DCASE challenge.
arXiv Detail & Related papers (2023-09-02T09:38:55Z)
- Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise based on some inherent properties of multi-label classification and long-tailed learning under noisy cases.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z)
- Cross-Validation Is All You Need: A Statistical Approach To Label Noise Estimation [0.6612255136183889]
Machine learning models experience deteriorated performance when trained in the presence of noisy labels.
This is particularly problematic for medical tasks, such as survival prediction.
We propose two novel and straightforward label noise detection algorithms.
arXiv Detail & Related papers (2023-06-24T14:50:20Z)
- CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation [60.28924281991539]
We propose a novel Class-agnostic Semi-supervised Pretraining (CaSP) framework to achieve a more favorable task-specificity balance.
Using 3.6M unlabeled data, we achieve a remarkable performance gain of 4.7% over ImageNet-pretrained baseline on object detection.
arXiv Detail & Related papers (2021-12-09T14:54:59Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Audio Tagging by Cross Filtering Noisy Labels [26.14064793686316]
We present a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging.
Our method achieves state-of-the-art performance and even surpasses the ensemble models.
arXiv Detail & Related papers (2020-07-16T07:55:04Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
- Learning from Noisy Similar and Dissimilar Data [84.76686918337134]
We show how to learn a classifier from noisy S and D labeled data.
We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data.
arXiv Detail & Related papers (2020-02-03T19:59:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.